From bugzilla-daemon at mcs.anl.gov Sun Jul 1 00:09:12 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 00:09:12 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701050912.2A2BC16505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #7 from iraicu at cs.uchicago.edu 2007-07-01 00:09 -------
(In reply to comment #6)
> (In reply to comment #4)
> > Hi again,
> > Here is an update of yesterday's 244 molecule run. The experiment ran further
> > than before, but it still did not complete. There were 240 molecules that
> > completed successfully (in the previous run, no molecule finished), but 4
> > molecules still did not finish.
> >
>
> Actually it looks tasks worked fine:
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ubmitted"|wc
> 24309 243090 2806214
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ailed"|wc
> 3614 36140 405816
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ompleted"|wc
> 20695 206950 2389556
>
> All tasks are accounted for. It may be that some jobs failed 3 times in a row.
> From the logs it looks like the workflow almost finished and it got to the
> point where the error reporting was to be done. Perhaps the stack overflow that
> you saw occurred there, and perhaps the impossible size of the workflow might
> have something to do with it.
>

The same machine (tg-v024) that we had trouble with before acted up again, I
should have removed it before we started the experiment. If this is the
consensus, we can certainly try it again, and make sure this machine is not in
the resource pool. Another idea is to increase the retry # from 3 to something
higher, maybe 10, 30, etc? Jobs can be resubmitted relatively fast with Falkon,
so retrying many times is not a big overhead... except that it takes longer for
Swift to give up!
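The three counts quoted above balance exactly: 3614 failed + 20695 completed = 24309 submitted, which is what "All tasks are accounted for" rests on. Below is a minimal Java sketch of the same check. It reuses the regular expressions from the grep commands; the class name `TaskTally` is hypothetical, and the sample log lines in `main` are made up for illustration, since the real Swift log format may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Tallies task-status lines the way the `grep "type=1.*..." | wc` pipelines do.
class TaskTally {
    // Same expressions as the grep commands; dropping the first letter
    // ("ubmitted", "ailed", "ompleted") matches upper- and lower-case forms.
    static final Pattern SUBMITTED = Pattern.compile("type=1.*ubmitted");
    static final Pattern FAILED = Pattern.compile("type=1.*ailed");
    static final Pattern COMPLETED = Pattern.compile("type=1.*ompleted");

    final long submitted;
    final long failed;
    final long completed;

    TaskTally(long submitted, long failed, long completed) {
        this.submitted = submitted;
        this.failed = failed;
        this.completed = completed;
    }

    // Like three separate grep runs, each pattern is tested independently on every line.
    static TaskTally of(List<String> lines) {
        long s = 0, f = 0, c = 0;
        for (String line : lines) {
            if (SUBMITTED.matcher(line).find()) s++;
            if (FAILED.matcher(line).find()) f++;
            if (COMPLETED.matcher(line).find()) c++;
        }
        return new TaskTally(s, f, c);
    }

    // Every submission should end as exactly one failure or one completion.
    boolean accountedFor() {
        return submitted == failed + completed;
    }

    public static void main(String[] args) throws IOException {
        // Pass a log file path, or fall back to hypothetical sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "Task(type=1, id=a) status: Submitted",
                        "Task(type=1, id=a) status: Completed",
                        "Task(type=1, id=b) status: Submitted",
                        "Task(type=1, id=b) status: Failed");
        TaskTally t = of(lines);
        System.out.println(t.submitted + " submitted, " + t.failed + " failed, "
                + t.completed + " completed, accounted for: " + t.accountedFor());
    }
}
```

Balanced counts only show that no attempt was lost; they do not say whether a given job used up all of its retries, which would need per-task failure counts.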
Ioan

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at mcs.anl.gov Sun Jul 1 00:47:49 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 00:47:49 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701054749.8E478164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #8 from iraicu at cs.uchicago.edu 2007-07-01 00:47 -------
(In reply to comment #5)
> First of all, can you commit the changes to SVN?
>
Yong made the changes, I am sure he will commit them the first chance he gets!

> (In reply to comment #4)
> > We fixed the potential synchronization issue
> > Mihael pointed out.
>
> There were two.
>
I meant to say "issues"... from the discussion I had with Yong, I believe he
addressed both of them.

> > We also fixed a badly handled exception we had in the
> > Falkon provider, that would give up very easily and exit the Falkon provider
> > thread in case of an exception, even if it wasn't a fatal one. This time
> > around, we changed the logic to simply print the exception, if there were any,
> > and not exit the Falkon provider, just continue. Personally, I think this
> > logic on handling exceptions in the Falkon provider was causing the Falkon
> > provider to exit prematurely, and hence not send any more tasks to Falkon...
>
> I can't seem to find anything that would fit that profile in the provider code.
> Can you be more specific? If the provider was setting the status of the task to
> failed, then it doesn't matter. Swift retries failed things.
>
Sure. Double check file SubmissionThread.java, notice that the thread will
live as long as exit is not set...
Line 54:
    public void run() {
        while(!exit) {

exit is initially set to false, but if anything sets it to true, the
submission thread will exit. Notice the end of the file with the
setStatus(Executable) function:

Line 98:
    public void setStatus (Executable execs[]) {
        try {
            for (int i=0; i
> > note that Swift was setting the set status of submitted tasks to the Falkon
> > provider in a separate thread,
>
> Swift does not set status of tasks. That's what the provider is supposed to do.
>
OK, there are several separate threads, one that sets the status of the task
for Swift, another that performs the submit, another that receives
notifications, etc. The common data structure between the set status thread
and the submit thread is a queue; if the submission thread dies, the queue is
still valid, and the set status thread could still insert tasks into the queue
and set the status to submitted, although there would be no submission thread
alive to perform the submission itself to Falkon.

> > which was not necessarily exiting when the Falkon
> > provider was, and hence we had the scenario in which Swift thought it sent out
> > more tasks than Falkon really saw.
>
> Can you be more specific? If there is a problem in Swift, we need to fix it,
> but your comment is too vague.
>
> >
> > Now, the issue that I think stopped this experiment. On the console of Swift,
> > the last thing that it printed was a "stack overflow error"; I don't think this
> > was printed in the logs, just on the console.
>
> Without the stack trace, the information is not very useful.
>
Nika said it was simply a message printed on the console. This was the same as
the case we saw on Thursday. This was not a regular exception that Swift or the
Falkon provider controlled (which would have had a stack trace printed along
with it). As far as I could tell, it was an error from the JVM, and was not
accompanied by any stack trace.
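The failure mode described above (a submission thread that ends after an exception while the status side keeps queueing tasks and marking them as submitted) can be sketched in Java. This is a hypothetical illustration, not the Falkon provider's code: `Submitter`, `enqueue` and `drainAndStop` are made-up names, and a `Consumer<String>` stands in for the real call to Falkon. It shows two guards: a per-task catch, so one bad task does not end the loop, and a liveness check, so a task is never reported as queued when no thread is left to submit it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: one thread drains a shared queue and submits each task;
// the status side enqueues tasks and learns whether they will really be submitted.
class Submitter implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<String> send;   // stands in for the real submit call
    private final List<String> failed = Collections.synchronizedList(new ArrayList<>());
    private final Thread thread = new Thread(this, "submission-thread");
    private volatile boolean stopWhenEmpty = false;

    Submitter(Consumer<String> send) {
        this.send = send;
    }

    void start() {
        thread.start();
    }

    // Status side: only report a task as queued while the submission thread is alive.
    // Not started yet and already dead both mean "nobody will submit this task".
    boolean enqueue(String task) {
        if (!thread.isAlive()) {
            failed.add(task);
            return false;
        }
        queue.add(task);
        return true;
    }

    @Override
    public void run() {
        while (!(stopWhenEmpty && queue.isEmpty())) {
            String task;
            try {
                task = queue.poll(50, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;   // interruption is the only early exit
            }
            if (task == null) {
                continue;
            }
            try {
                send.accept(task);
            } catch (RuntimeException e) {
                // Non-fatal: print it, fail only this task (so it can be retried), keep looping.
                System.err.println("submit failed for " + task + ": " + e);
                failed.add(task);
            }
        }
    }

    // Lets the thread finish everything already queued, then waits for it to end.
    void drainAndStop() {
        stopWhenEmpty = true;
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> failedTasks() {
        synchronized (failed) {
            return new ArrayList<>(failed);
        }
    }

    public static void main(String[] args) {
        Submitter s = new Submitter(task -> {
            if (task.startsWith("bad")) throw new IllegalStateException("transient node error");
            System.out.println("submitted " + task);
        });
        s.start();
        s.enqueue("a");
        s.enqueue("bad-1");
        s.enqueue("b");
        s.drainAndStop();
        System.out.println("failed: " + s.failedTasks());
        System.out.println("accepted after stop? " + s.enqueue("c"));
    }
}
```

Errors such as `StackOverflowError` are deliberately not caught inside the loop; if one ends the thread anyway, the liveness check in `enqueue` turns later tasks into visible failures instead of silently queued ones.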
If you don't know where to even start looking, let's run some quick synthetic
runs of 20K jobs on Monday together, and hopefully we can reproduce the stack
overflow error, and you can see it in person!

Ioan

> >
> > Ioan
> >
>

From hategan at mcs.anl.gov Sun Jul 1 01:53:43 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 01 Jul 2007 01:53:43 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry>
References: <20070630225207.B70D916506@foxtrot.mcs.anl.gov> <1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry>
Message-ID: <1183272823.21185.5.camel@blabla.mcs.anl.gov>

On Sun, 2007-07-01 at 02:10 +0000, Ian Foster wrote:
> Why do you say the workflow's size was "impossible"? It doesn't seem that large to me. We'd like to run larger ones!

Most certainly so. However, we want to make use of loops rather than
generating large swift files.

>
>
> Sent via BlackBerry from T-Mobile
>
> -----Original Message-----
> From: bugzilla-daemon at mcs.anl.gov
>
> Date: Sat, 30 Jun 2007 17:52:07
> To:swift-devel at ci.uchicago.edu
> Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
>
>
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>
>
> ------- Comment #6 from hategan at mcs.anl.gov 2007-06-30 17:52 -------
> (In reply to comment #4)
> > Hi again,
> > Here is an update of yesterday's 244 molecule run. The experiment ran further
> > than before, but it still did not complete. There were 240 molecules that
> > completed successfully (in the previous run, no molecule finished), but 4
> > molecules still did not finish.
> >
>
> Actually it looks tasks worked fine:
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ubmitted"|wc
> 24309 243090 2806214
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ailed"|wc
> 3614 36140 405816
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ompleted"|wc
> 20695 206950 2389556
>
> All tasks are accounted for. It may be that some jobs failed 3 times in a row.
> From the logs it looks like the workflow almost finished and it got to the
> point where the error reporting was to be done. Perhaps the stack overflow that
> you saw occurred there, and perhaps the impossible size of the workflow might
> have something to do with it.
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From bugzilla-daemon at mcs.anl.gov Sun Jul 1 01:56:28 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 01:56:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701065628.0C73216506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #9 from hategan at mcs.anl.gov 2007-07-01 01:56 -------
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> The same machine (tg-v024) that we had trouble with before acted up again, I
> should have removed it before we started the experiment. If this is the
> consensus, we can certainly try it again, and make sure this machine is not in
> the resource pool. Another idea is to increase the retry # from 3 to something
> higher, maybe 10, 30, etc?

Not a good idea in the general case, since many times the error may not be
something temporary. The swift scheduler takes bad machines into account and
attempts to avoid submitting to them.
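One way a scheduler can "take bad machines into account" is to keep a score per host, lower it sharply on each failure, and stop picking hosts whose score drops below a floor. The Java sketch below illustrates only that general idea. The weights (+1 per success, -5 per failure), the floor, the class name `HostScores` and the host `node-2` are arbitrary assumptions, not Swift's actual scheduling policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical failure-aware host picker; weights and floor are arbitrary choices.
class HostScores {
    private final Map<String, Double> score = new LinkedHashMap<>();
    private final double floor;   // hosts scoring below this are not picked

    HostScores(double floor, String... hosts) {
        this.floor = floor;
        for (String h : hosts) {
            score.put(h, 0.0);
        }
    }

    void success(String host) {
        score.merge(host, 1.0, Double::sum);
    }

    // A failure costs more than a success earns, so a host that keeps failing
    // loses ground quickly, however often it is tried.
    void failure(String host) {
        score.merge(host, -5.0, Double::sum);
    }

    // Highest-scoring host at or above the floor (the first one wins ties), or null.
    String pick() {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : score.entrySet()) {
            double s = e.getValue();
            if (s >= floor && s > bestScore) {
                best = e.getKey();
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        HostScores hosts = new HostScores(-10.0, "tg-v024", "node-2");
        for (int i = 0; i < 3; i++) {
            hosts.failure("tg-v024");   // 0 - 3 * 5 = -15, below the floor
        }
        System.out.println(hosts.pick());
    }
}
```

When every host falls below the floor, `pick()` returns null, which is the point where a real scheduler has to choose between waiting, relaxing the floor, or failing the job.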
>
> Ioan
>

From hategan at mcs.anl.gov Sun Jul 1 02:11:49 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 01 Jul 2007 02:11:49 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1183272823.21185.5.camel@blabla.mcs.anl.gov>
References: <20070630225207.B70D916506@foxtrot.mcs.anl.gov> <1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry> <1183272823.21185.5.camel@blabla.mcs.anl.gov>
Message-ID: <1183273909.21185.11.camel@blabla.mcs.anl.gov>

On Sun, 2007-07-01 at 01:53 -0500, Mihael Hategan wrote:
> On Sun, 2007-07-01 at 02:10 +0000, Ian Foster wrote:
> > Why do you say the workflow's size was "impossible"? It doesn't seem that large to me. We'd like to run larger ones!
>
> Most certainly so. However, we want to make use of loops rather than
> generating large swift files.

Ok. I see. I meant the impossible size of the source file. We clearly want to
be running workflows with that many jobs smoothly. I just don't think large
source files (whether Swift or Karajan) are a good way to do it. I'm quite
(pleasantly) surprised that Swift/Karajan can load and run XML files with 1M+
lines. Of course, that doesn't mean we shouldn't try to fix the problems that
might arise with large source files if possible.
From bugzilla-daemon at mcs.anl.gov Sun Jul 1 02:15:35 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 02:15:35 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701071535.1AE3416506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #10 from hategan at mcs.anl.gov 2007-07-01 02:15 -------
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> >
> > There were two.
> >
> I meant to say "issues"... from the discussion I had with Yong, I believe he
> addressed both of them.

Ok. Got confused.

> > > We also fixed a badly handled exception we had [...]
> > Can you be more specific? [...]
> >
> Sure. Double check file SubmissionThread.java, notice that the thread will
> live as long as exit is not set...
> Also, check the StatusThread.java,

Right. Missed that.

>
> > > note that Swift was setting the set status of submitted tasks to the Falkon
> > > provider in a separate thread,
> >
> > Swift does not set status of tasks. That's what the provider is supposed to do.
> >
> OK, there are several separate threads, one that sets the status of the task
> for Swift, another that performs the submit, another that receives
> notifications, etc.
> The common data structure between the set status thread
> and the submit thread is a queue; if the submission thread dies, the queue is
> still valid, and the set status thread could still insert tasks into the queue
> and set the status to submitted, although there would be no submission thread
> alive to perform the submission itself to Falkon.

That sounds like the provider, not Swift. Maybe I misunderstood something?

From bugzilla-daemon at mcs.anl.gov Sun Jul 1 02:18:46 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 02:18:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To:
Message-ID: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76

------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 -------
This would require a data file pointer store (a VDC-like thing) that can
record where intermediate files are, instead of assuming they are always
available on the submit host.
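The "data file pointer store" from Bug 76 can be pictured as a map from a logical file name to the set of sites that currently hold a copy. The Java sketch below is hypothetical: `FileLocationStore`, its methods, and the file and site names are illustrative and do not describe an existing Swift or VDC component. It shows the one decision such a store enables: whether a job needs a transfer at all, and from where, without assuming the file is on the submit host.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pointer store: remembers which sites hold a copy of each
// intermediate file, so files need not be staged back to the submit host.
class FileLocationStore {
    private final Map<String, Set<String>> copies = new LinkedHashMap<>();

    // Record that `site` now holds a copy of `logicalName` (e.g. a job wrote it there).
    synchronized void recordCopy(String logicalName, String site) {
        copies.computeIfAbsent(logicalName, k -> new LinkedHashSet<>()).add(site);
    }

    synchronized Set<String> sitesOf(String logicalName) {
        Set<String> s = copies.get(logicalName);
        return s == null ? Collections.<String>emptySet()
                         : Collections.unmodifiableSet(new LinkedHashSet<>(s));
    }

    // Where a job running on `site` should fetch `logicalName` from:
    // null if a copy is already there, otherwise the first recorded copy.
    synchronized String sourceFor(String logicalName, String site) {
        Set<String> s = copies.get(logicalName);
        if (s == null || s.isEmpty()) {
            throw new IllegalStateException("no known copy of " + logicalName);
        }
        return s.contains(site) ? null : s.iterator().next();
    }

    public static void main(String[] args) {
        FileLocationStore store = new FileLocationStore();
        store.recordCopy("mol001.out", "siteA");   // produced by a job on siteA
        System.out.println(store.sourceFor("mol001.out", "siteA"));   // copy is local
        System.out.println(store.sourceFor("mol001.out", "siteB"));   // fetch from siteA
    }
}
```

A real store would also need to survive restarts and forget copies that are deleted or lost, which this in-memory sketch ignores.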
From bugzilla-daemon at mcs.anl.gov Sun Jul 1 10:48:09 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 10:48:09 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701154809.1AFB5164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #11 from iraicu at cs.uchicago.edu 2007-07-01 10:48 -------
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > The same machine (tg-v024) that we had trouble with before acted up again, I
> > should have removed it before we started the experiment. If this is the
> > consensus, we can certainly try it again, and make sure this machine is not in
> > the resource pool. Another idea is to increase the retry # from 3 to something
> > higher, maybe 10, 30, etc?
>
> Not a good idea in the general case, since many times the error may not be
> something temporary. The swift scheduler takes bad machines into account and
> attempts to avoid submitting to them.
>
Yes, but in this case, Falkon was the only set of resources available to
Swift, so giving up early means giving up on the entire workflow. If the
number of failures did indeed reach the maximum of 3, and that is why the
workflow didn't complete, I would argue that it would be worthwhile to
increase this upper ceiling... at least when running solely with Falkon, or at
the very least for this experiment, to see the 244-molecule run succeed.
Remember that Falkon is much faster than GRAM/PBS, so if errors happen
quickly, as on this tg-v024 node, where they happen in <50 ms, then 1000s of
errors can happen in a matter of seconds to minutes... I am not sure what the
correct solution is, but it is something to consider, as the dynamics of the
problem are now different than they were before Falkon.
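Whether a higher retry ceiling helps depends on where the retries land. The toy model below makes that concrete under stated assumptions, which are not measurements: worker 0 fails every task in 50 ms (the tg-v024 behaviour described above), every other worker stays busy until `busyUntilMs`, and each attempt goes to whichever worker becomes free first. The class `RetrySim`, the 10-minute busy period, and the 999/1 split (the one used later in this thread) are illustrative choices.

```java
import java.util.PriorityQueue;

// Toy model: worker 0 fails every task in 50 ms, the other workers are busy until
// busyUntilMs, and each attempt of a job goes to whichever worker frees up first.
class RetrySim {
    static final long BAD_FAIL_MS = 50;

    // Returns the attempt number on which a healthy worker took the job,
    // or -1 if all maxAttempts attempts landed on the failing worker.
    static int run(int goodWorkers, long busyUntilMs, int maxAttempts) {
        // Entries are {freeAtMs, workerId}; ties go to the lower id (worker 0).
        PriorityQueue<long[]> free = new PriorityQueue<>(
                (a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
        free.add(new long[] {0, 0});
        for (int w = 1; w <= goodWorkers; w++) {
            free.add(new long[] {busyUntilMs, w});
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long[] next = free.poll();
            if (next[1] != 0) {
                return attempt;          // a healthy worker ran the job
            }
            next[0] += BAD_FAIL_MS;      // the bad worker fails fast and is free again
            free.add(next);
        }
        return -1;
    }

    public static void main(String[] args) {
        // 999 busy workers plus 1 bad one; "busy for 10 minutes" is an assumption.
        for (int retries : new int[] {3, 10, 30, 100}) {
            System.out.println(retries + " attempts -> " + run(999, 600_000, retries));
        }
        System.out.println("failures per minute from one such node: " + 60_000 / BAD_FAIL_MS);
    }
}
```

Under these assumptions, raising the ceiling from 3 to 100 changes nothing, because all 100 attempts finish (100 × 50 ms = 5 s) long before any healthy worker frees up; a retry only lands elsewhere once healthy workers free up within roughly `maxAttempts` × 50 ms. At 50 ms per failure, a single such node can produce 60,000 / 50 = 1,200 failures per minute.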
Ioan

> >
> > Ioan
> >
>

From bugzilla-daemon at mcs.anl.gov Sun Jul 1 10:49:46 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 10:49:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701154946.60D4C164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #12 from iraicu at cs.uchicago.edu 2007-07-01 10:49 -------
(In reply to comment #10)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > >
> > > There were two.
> > >
> > I meant to say "issues"... from the discussion I had with Yong, I believe he
> > addressed both of them.
>
> Ok. Got confused.
>
> > > > We also fixed a badly handled exception we had [...]
> > > Can you be more specific? [...]
> > >
> > Sure. Double check file SubmissionThread.java, notice that the thread will
> > live as long as exit is not set...
> > Also, check the StatusThread.java,
>
> Right. Missed that.
>
> >
> > > > note that Swift was setting the set status of submitted tasks to the Falkon
> > > > provider in a separate thread,
> > >
> > > Swift does not set status of tasks. That's what the provider is supposed to do.
> > >
> > OK, there are several separate threads, one that sets the status of the task
> > for Swift, another that performs the submit, another that receives
> > notifications, etc.
> > The common data structure between the set status thread
> > and the submit thread is a queue; if the submission thread dies, the queue is
> > still valid, and the set status thread could still insert tasks into the queue
> > and set the status to submitted, although there would be no submission thread
> > alive to perform the submission itself to Falkon.
>
> That sounds like the provider, not Swift. Maybe I misunderstood something?
>
Right, the provider has multiple threads, and if any one of them exits
prematurely, then it cannot function correctly.

Ioan

From bugzilla-daemon at mcs.anl.gov Sun Jul 1 11:36:30 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 1 Jul 2007 11:36:30 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070701163630.E977916506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #13 from hategan at mcs.anl.gov 2007-07-01 11:36 -------
(In reply to comment #11)
> (In reply to comment #9)
> > (In reply to comment #7)
> > > (In reply to comment #6)
> > > > (In reply to comment #4)
> > > The same machine (tg-v024) that we had trouble with before acted up again, I
> > > should have removed it before we started the experiment. If this is the
> > > consensus, we can certainly try it again, and make sure this machine is not in
> > > the resource pool. Another idea is to increase the retry # from 3 to something
> > > higher, maybe 10, 30, etc?
> >
> > Not a good idea in the general case, since many times the error may not be
> > something temporary. The swift scheduler takes bad machines into account and
> > attempts to avoid submitting to them.
> >
> Yes, but in this case, Falkon was the only set of resources available to
> Swift, so giving up early means giving up on the entire workflow. If the
> number of failures did indeed reach the maximum of 3, and that is why the
> workflow didn't complete, I would argue that it would be worthwhile to
> increase this upper ceiling... at least when running solely with Falkon, or at
> the very least for this experiment, to see the 244-molecule run succeed.
> Remember that Falkon is much faster than GRAM/PBS, so if errors happen
> quickly, as on this tg-v024 node, where they happen in <50 ms, then 1000s of
> errors can happen in a matter of seconds to minutes... I am not sure what the
> correct solution is, but it is something to consider, as the dynamics of the
> problem are now different than they were before Falkon.

By themselves, retries don't solve the problem. There must be a reasonable
chance that a job will finish. If you have 999 busy workers and 1 bad worker,
restarting 100 times will still cause the workflow to fail, and the fact that
restarts will happen fast is not exactly helping.

While a bit reluctant to add more options, I guess the number of restarts could
be one in the future.

>
> Ioan
>
> > >
> > > Ioan
> > >

From bugzilla-daemon at mcs.anl.gov Mon Jul 2 07:58:28 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 2 Jul 2007 07:58:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070702125828.05B7B16506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #14 from nefedova at mcs.anl.gov 2007-07-02 07:58 -------
This is what I had on stdout (stack overflow error).
The last line was printed over and over again 100s of times.

*****************************SUPER_DEBUG: waiting for notification...
chrm_long completed
Exception in thread "Worker 3" java.lang.StackOverflowError
    at java.util.ArrayList.addAll(ArrayList.java:472)
    at org.globus.cog.karajan.arguments.VariableArgumentsImpl.appendAll(VariableArgumentsImpl.java:79)
    at org.globus.cog.karajan.workflow.futures.FutureVariableArguments.appendAll(FutureVariableArguments.java:40)
    at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.flushBuffer(OrderedParallelVariableArguments.java:67)
    at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:73)
    at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
    at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)

From bugzilla-daemon at mcs.anl.gov Mon Jul 2 09:17:11 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 2 Jul 2007 09:17:11 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070702141711.DC3B516506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #15 from hategan at mcs.anl.gov 2007-07-02 09:17 -------
(In reply to comment #14)
> Exception in thread "Worker 3" java.lang.StackOverflowError
> [...]
> org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
> at
> org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
>
> (repeat ad nauseam)

Fix to Karajan committed. Needs testing since it's in a delicate place.

From benc at hawaga.org.uk Mon Jul 2 13:42:30 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 00:12:30 +0530 (IST)
Subject: [Swift-devel] @strcut
Message-ID:

r881 makes a quick-and-dirty regexp function, @strcut, available. It
doesn't handle errors nicely (or at all), but I've put it in so Nika can
experiment with it a bit in an attempt to reduce her SwiftScript code
size.

If it's useful, I'll tidy it up, otherwise I'll back it out.

From foster at mcs.anl.gov Mon Jul 2 14:05:54 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 02 Jul 2007 14:05:54 -0500
Subject: [Swift-devel] @strcut
In-Reply-To:
References:
Message-ID: <46894C92.6090602@mcs.anl.gov>

Hi,

I am curious--is this the only reason why the MolDyn program is so large, or
are there other things that can be done to reduce code size?

Ian.

Ben Clifford wrote:
> r881 makes a quick-and-dirty regexp function, @strcut, available. It
> doesn't handle errors nicely (or at all), but I've put it in so Nika can
> experiment with it a bit in an attempt to reduce her SwiftScript code
> size.
>
> If it's useful, I'll tidy it up, otherwise I'll back it out.
>
>

--

Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
Globus Alliance: www.globus.org.
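The kind of helper Ben describes can be sketched in Java with `java.util.regex`, assuming the semantics "return the text matched by the first parenthesised group". That assumption, the class name `StrCut`, and the sample file name `m017.pdb` are illustrative; this is not the r881 code. Unlike the quick-and-dirty version, it reports a missing capture group or a non-matching input as an error.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a strcut-style helper, ASSUMING it returns the text matched by the
// first parenthesised group; not the r881 implementation.
class StrCut {
    static String strcut(String input, String regex) {
        // An invalid regex throws PatternSyntaxException (an IllegalArgumentException).
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.groupCount() < 1) {
            throw new IllegalArgumentException("no capture group in pattern: " + regex);
        }
        if (!m.find()) {
            throw new IllegalArgumentException("pattern " + regex + " does not match: " + input);
        }
        return m.group(1);   // may be null if group 1 is optional and did not take part
    }

    public static void main(String[] args) {
        // e.g. derive a per-molecule id from a file name inside a loop body
        System.out.println(strcut("m017.pdb", "m([0-9]+)\\.pdb"));
    }
}
```

Deriving names like this inside a loop body is what lets one loop replace many hand-written per-molecule blocks, which is the code-size reduction discussed in this thread.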
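The repeated `prevClosed` frames at OrderedParallelVariableArguments.java:78 in the stack trace from comment #14 suggest a notification that recurses once per buffered element, so stack depth grows with the amount of ordered output waiting to be flushed. Karajan's actual fix is not shown in this thread. The toy model below (`OrderedSlots` and its methods are hypothetical names) only illustrates the general technique of replacing such a propagation chain with a loop.

```java
// Toy model (hypothetical, not Karajan's classes): results of parallel branches must be
// published in order, so slot i publishes only after slots 0..i-1 have published.
class OrderedSlots {
    private final int[] buffered;
    private final boolean[] closed;
    final int[] output;
    int published = 0;   // slots [0, published) have been copied to output

    OrderedSlots(int n) {
        buffered = new int[n];
        closed = new boolean[n];
        output = new int[n];
    }

    void set(int slot, int value) {
        buffered[slot] = value;
    }

    // Recursive propagation: one extra stack frame for every slot published in a row.
    void closeRecursive(int slot) {
        closed[slot] = true;
        publishRecursive();
    }

    private void publishRecursive() {
        if (published < closed.length && closed[published]) {
            output[published] = buffered[published];
            published++;
            publishRecursive();   // the next slot may now be ready
        }
    }

    // Iterative propagation: same result, constant stack depth.
    void closeIterative(int slot) {
        closed[slot] = true;
        while (published < closed.length && closed[published]) {
            output[published] = buffered[published];
            published++;
        }
    }

    // Worst case: every later slot closes first, so closing slot 0 releases all n at once.
    static OrderedSlots backlog(int n) {
        OrderedSlots s = new OrderedSlots(n);
        for (int i = n - 1; i >= 1; i--) {
            s.set(i, i);
            s.closeIterative(i);   // publishes nothing yet: slot 0 is still open
        }
        s.set(0, 0);
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        OrderedSlots it = backlog(n);
        it.closeIterative(0);
        System.out.println("iterative: published " + it.published + " of " + n);
        OrderedSlots rec = backlog(n);
        try {
            rec.closeRecursive(0);
            System.out.println("recursive: published " + rec.published + " of " + n);
        } catch (StackOverflowError e) {
            System.out.println("recursive: StackOverflowError after " + rec.published + " slots");
        }
    }
}
```

Both versions produce the same ordering; only the recursive one needs stack space proportional to the backlog, which is why a long workflow can hit the limit while a short one never does.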
From nefedova at mcs.anl.gov Mon Jul 2 14:16:44 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Mon, 2 Jul 2007 14:16:44 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <46894C92.6090602@mcs.anl.gov>
References: <46894C92.6090602@mcs.anl.gov>
Message-ID: <33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>

this is the main thing that prevented me from using loops. Once I rewrite it
with loops, the size of the code would be reduced dramatically.

On Jul 2, 2007, at 2:05 PM, Ian Foster wrote:

> Hi,
>
> I am curious--is this the only reason why the MolDyn program is so
> large, or are there other things that can be done to reduce code size?
>
> Ian.
>
> Ben Clifford wrote:
>> r881 makes a quick-and-dirty regexp function, @strcut, available.
>> It doesn't handle errors nicely (or at all), but I've put it in so
>> Nika can experiment with it a bit in an attempt to reduce her
>> SwiftScript code size.
>>
>> If it's useful, I'll tidy it up, otherwise I'll back it out.
>>

From foster at mcs.anl.gov Mon Jul 2 14:17:11 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 02 Jul 2007 14:17:11 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>
References: <46894C92.6090602@mcs.anl.gov> <33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>
Message-ID: <46894F37.5000506@mcs.anl.gov>

cool ...

Veronika Nefedova wrote:
> this is the main thing that prevented me from using loops.
> Once I rewrite it with loops, the size of the code would be reduced
> dramatically.
>
> On Jul 2, 2007, at 2:05 PM, Ian Foster wrote:
>
>> Hi,
>>
>> I am curious--is this the only reason why the MolDyn program is so
>> large, or are there other things that can be done to reduce code size?
>>
>> Ian.
>>
>> Ben Clifford wrote:
>>> r881 makes a quick-and-dirty regexp function, @strcut, available. It
>>> doesn't handle errors nicely (or at all), but I've put it in so Nika
>>> can experiment with it a bit in an attempt to reduce her SwiftScript
>>> code size.
>>>
>>> If it's useful, I'll tidy it up, otherwise I'll back it out.
>>>

From hategan at mcs.anl.gov Mon Jul 2 14:47:26 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 02 Jul 2007 14:47:26 -0500
Subject: [Swift-devel] @strcut
In-Reply-To:
References:
Message-ID: <1183405646.21420.1.camel@blabla.mcs.anl.gov>

Something like that should be added to the Karajan system library too.

On Tue, 2007-07-03 at 00:12 +0530, Ben Clifford wrote:
> r881 makes a quick-and-dirty regexp function, @strcut, available. It
It > doesn't handle errors nicely (or at all), but I've put it in so Nika can > experiment with it a bit in an attempt to reduce her SwiftScript code > size. > > If its useful, I'll tidy it up, otherwise I'll back it out. > From benc at hawaga.org.uk Mon Jul 2 20:08:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 3 Jul 2007 06:38:56 +0530 (IST) Subject: [Swift-devel] recent karajan changes causing trouble Message-ID: I get the below when I try to run a hello world workflow (examples/tutorial/q1.swift). I think Nika also saw something that looks similar, with a different workflow. This is with cog r1655. I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello world runs ok (r1650 being before the most recent set of cog commits). $ swift -debug q1.swift Recompilation suppressed. null kernel:cache @ sys.xml, line: 3 Caused by: java.lang.UnsupportedOperationException at java.util.AbstractMap.put(AbstractMap.java:228) at org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153) at org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) at 
org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) -- From hategan at mcs.anl.gov Mon Jul 2 21:23:37 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 02 Jul 2007 21:23:37 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: Message-ID: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Yup. Try now. On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote: > I get the below when I try to run a hello world workflow > (examples/tutorial/q1.swift). > > I think Nika also saw something that looks similar, with a different > workflow. > > This is with cog r1655. > > I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello > world runs ok (r1650 being before the most recent set of cog commits). > > > $ swift -debug q1.swift > Recompilation suppressed. 
> > null > kernel:cache @ sys.xml, line: 3 > Caused by: java.lang.UnsupportedOperationException > at java.util.AbstractMap.put(AbstractMap.java:228) > at > org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153) > at > org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) > at > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) > > From benc at hawaga.org.uk Mon Jul 2 22:40:11 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 3 Jul 2007 03:40:11 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: 
<1183429417.16404.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: On Mon, 2 Jul 2007, Mihael Hategan wrote: > Yup. Try now. > works -- From benc at hawaga.org.uk Tue Jul 3 12:53:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 3 Jul 2007 23:23:30 +0530 (IST) Subject: [Swift-devel] mapper syntax Message-ID: The syntax: imagefiles if[] ; is rather noisy all on one line. A syntax change could be to express the above as: imagefiles if[] map my_mapper { foo = @strcat(filename,blah); otherparam = true; moreparams = false; }; -- From benc at hawaga.org.uk Tue Jul 3 12:50:35 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 3 Jul 2007 23:20:35 +0530 (IST) Subject: [Swift-devel] xml tc.data format Message-ID: I'd like to make tc.data be formatted as XML: i) the present tab-deliminated format has usability issues (pretty much the same as Makefile has). tabs are used for a reason (I think because some fields in the file can have spaces in them, or something like that). ii) it would be more consistent with the sites.xml format. -- From hategan at mcs.anl.gov Tue Jul 3 13:01:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 03 Jul 2007 13:01:47 -0500 Subject: [Swift-devel] mapper syntax In-Reply-To: References: Message-ID: <1183485707.17547.2.camel@blabla.mcs.anl.gov> On Tue, 2007-07-03 at 23:23 +0530, Ben Clifford wrote: > The syntax: > > imagefiles if[] > ; > > is rather noisy all on one line. > > A syntax change could be to express the above as: > > imagefiles if[] map my_mapper { What if "map" be replaced by some operator (":", "~", "#")? > foo = @strcat(filename,blah); > otherparam = true; > moreparams = false; > }; The semicolon should not be required after a '}'. 
> > From hategan at mcs.anl.gov Tue Jul 3 13:04:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 03 Jul 2007 13:04:09 -0500 Subject: [Swift-devel] xml tc.data format In-Reply-To: References: Message-ID: <1183485849.17547.5.camel@blabla.mcs.anl.gov> On Tue, 2007-07-03 at 23:20 +0530, Ben Clifford wrote: > I'd like to make tc.data be formatted as XML: > > i) the present tab-deliminated format has usability issues (pretty much > the same as Makefile has). tabs are used for a reason (I think because > some fields in the file can have spaces in them, or something like that). 1. Having named args (attributes) would make it easier to skip some of them instead of writing NULL (or was it null?). 2. The code for parsing it would be much simpler, and we could probably remove the dependency on vds. > > ii) it would be more consistent with the sites.xml format. > From benc at hawaga.org.uk Tue Jul 3 14:33:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 3 Jul 2007 19:33:51 +0000 (GMT) Subject: [Swift-devel] xml tc.data format In-Reply-To: <1183485849.17547.5.camel@blabla.mcs.anl.gov> References: <1183485849.17547.5.camel@blabla.mcs.anl.gov> Message-ID: another thing i was thinking about for the config file formats, is to change profile specification from: 5 which is how profiles are represented in the VDS1-style sites.xml to a more document-like(?) form such as: 5 This makes better use of XML structure, but I don't know how it would fit (perhaps quite badly) into the present way in which the swift code reads in sites.xml. 
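Mihael's two points about tc.data (named attributes let you leave a field out instead of writing NULL, and the parsing code gets simpler) can be illustrated with a small Java sketch. It reads the same entry once from a tab-delimited line and once from an XML element. The column order (site, transformation, path, type, platform, profile), the `<tc .../>` element and attribute names, and the `TcEntry` class are all assumptions for illustration. This is neither the VDS parser nor a proposed schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical model of one tc.data entry; the column order is assumed.
public final class TcEntry {
    static final String[] COLUMNS =
        {"site", "transformation", "path", "type", "platform", "profile"};

    private final String[] values = new String[COLUMNS.length];

    /** Positional form: every column must be present, and "null" marks a missing value. */
    static TcEntry fromTabLine(String line) {
        // Split on tabs only (limit -1 keeps trailing empty fields). A line typed
        // with spaces instead of tabs is rejected, much like a Makefile rule.
        String[] parts = line.split("\t", -1);
        if (parts.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " tab-separated fields, got " + parts.length);
        }
        TcEntry e = new TcEntry();
        for (int i = 0; i < parts.length; i++) {
            String v = parts[i].trim();
            e.values[i] = v.equalsIgnoreCase("null") ? null : v;
        }
        return e;
    }

    /** Named form: attributes that are not needed are simply left out. */
    static TcEntry fromXmlElement(String xml) throws Exception {
        Element el = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        TcEntry e = new TcEntry();
        for (int i = 0; i < COLUMNS.length; i++) {
            e.values[i] = el.hasAttribute(COLUMNS[i]) ? el.getAttribute(COLUMNS[i]) : null;
        }
        return e;
    }

    Optional<String> get(String column) {
        int i = Arrays.asList(COLUMNS).indexOf(column);
        if (i < 0) throw new IllegalArgumentException("unknown column: " + column);
        return Optional.ofNullable(values[i]);
    }

    public static void main(String[] args) throws Exception {
        TcEntry a = fromTabLine("localhost\techo\t/bin/echo\tINSTALLED\tINTEL32::LINUX\tnull");
        TcEntry b = fromXmlElement(
            "<tc site=\"localhost\" transformation=\"echo\" path=\"/bin/echo\"/>");
        System.out.println(a.get("path").orElse("-") + " " + b.get("path").orElse("-"));
        System.out.println(a.get("profile").isPresent() + " " + b.get("type").isPresent());
    }
}
```

The positional reader has to reserve a sentinel for "no value" and cannot tell tabs from spaces by eye. The attribute reader lets an absent field stay absent.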
-- From hategan at mcs.anl.gov Tue Jul 3 14:37:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 03 Jul 2007 14:37:20 -0500 Subject: [Swift-devel] xml tc.data format In-Reply-To: References: <1183485849.17547.5.camel@blabla.mcs.anl.gov> Message-ID: <1183491440.24728.1.camel@blabla.mcs.anl.gov> On Tue, 2007-07-03 at 19:33 +0000, Ben Clifford wrote: > another thing i was thinking about for the config file formats, is to > change profile specification from: > > 5 > > which is how profiles are represented in the VDS1-style sites.xml > > to a more document-like(?) form such as: > > 5 > > This makes better use of XML structure, but I don't know how it would fit > (perhaps quite badly) into the present way in which the swift code reads > in sites.xml. You'd have to pre-define things, so you won't get the flexibility of dynamic properties. > From benc at hawaga.org.uk Tue Jul 3 23:34:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 4 Jul 2007 10:04:25 +0530 (IST) Subject: [Swift-devel] xml tc.data format In-Reply-To: <1183485849.17547.5.camel@blabla.mcs.anl.gov> References: <1183485849.17547.5.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 3 Jul 2007, Mihael Hategan wrote: > 2. The code for parsing it would be much simpler, and we could probably > remove the dependency on vds. VDSScheduler uses the RoundRobin site selector from VDS1. But people aren't using VDSScheduler so that can probably go away too. -- From benc at hawaga.org.uk Wed Jul 4 02:02:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 4 Jul 2007 12:32:23 +0530 (IST) Subject: [Swift-devel] license Message-ID: There's a jar file in lib/ called: jug-lgpl-2.0.0.jar The filename might suggest that this is subject to the LGPL. Does anyone know? 
-- From benc at hawaga.org.uk Wed Jul 4 00:37:39 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 4 Jul 2007 11:07:39 +0530 (IST) Subject: [Swift-devel] dot files by default Message-ID: does anyone have preference about whether .dot graphviz files are generated by default or not? I find them a bit annoying in as much as they double the number of run files in my working directories to no immediate benefit. -- From hategan at mcs.anl.gov Wed Jul 4 22:57:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 04 Jul 2007 22:57:28 -0500 Subject: [Swift-devel] license In-Reply-To: References: Message-ID: <1183607848.3638.1.camel@blabla.mcs.anl.gov> Yes. It's actually available in 2 licenses. http://jug.safehaus.org/Download On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote: > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar > > The filename might suggest that this is subject to the LGPL. > > Does anyone know? > From benc at hawaga.org.uk Wed Jul 4 23:02:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 04:02:14 +0000 (GMT) Subject: [Swift-devel] license In-Reply-To: <1183607848.3638.1.camel@blabla.mcs.anl.gov> References: <1183607848.3638.1.camel@blabla.mcs.anl.gov> Message-ID: ok cool. I suspect the dev.globus incubatorgods will be happier with the ASL one. Funny that they have separate jar files for each license. On Wed, 4 Jul 2007, Mihael Hategan wrote: > Yes. It's actually available in 2 licenses. > http://jug.safehaus.org/Download > > On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote: > > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar > > > > The filename might suggest that this is subject to the LGPL. > > > > Does anyone know? 
> > > > From hategan at mcs.anl.gov Wed Jul 4 23:09:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 04 Jul 2007 23:09:46 -0500 Subject: [Swift-devel] license In-Reply-To: References: <1183607848.3638.1.camel@blabla.mcs.anl.gov> Message-ID: <1183608586.4172.0.camel@blabla.mcs.anl.gov> What's wrong with LGPL now? On Thu, 2007-07-05 at 04:02 +0000, Ben Clifford wrote: > ok cool. I suspect the dev.globus incubatorgods will be happier with the > ASL one. Funny that they have separate jar files for each license. > > On Wed, 4 Jul 2007, Mihael Hategan wrote: > > > Yes. It's actually available in 2 licenses. > > http://jug.safehaus.org/Download > > > > On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote: > > > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar > > > > > > The filename might suggest that this is subject to the LGPL. > > > > > > Does anyone know? > > > > > > > > From benc at hawaga.org.uk Wed Jul 4 23:13:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 04:13:29 +0000 (GMT) Subject: [Swift-devel] license In-Reply-To: <1183608586.4172.0.camel@blabla.mcs.anl.gov> References: <1183607848.3638.1.camel@blabla.mcs.anl.gov> <1183608586.4172.0.camel@blabla.mcs.anl.gov> Message-ID: paranoid lawyers? On Wed, 4 Jul 2007, Mihael Hategan wrote: > What's wrong with LGPL now? > > On Thu, 2007-07-05 at 04:02 +0000, Ben Clifford wrote: > > ok cool. I suspect the dev.globus incubatorgods will be happier with the > > ASL one. Funny that they have separate jar files for each license. > > > > On Wed, 4 Jul 2007, Mihael Hategan wrote: > > > > > Yes. It's actually available in 2 licenses. > > > http://jug.safehaus.org/Download > > > > > > On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote: > > > > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar > > > > > > > > The filename might suggest that this is subject to the LGPL. > > > > > > > > Does anyone know? 
> > > > > > > > > > > > > > From benc at hawaga.org.uk Wed Jul 4 23:41:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 04:41:38 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> Message-ID: I don't think that's true. If data files are labelled with URIs rather than paths-relative-to-submit-directory, then those URIs are understandable without a VDC-as-entity. You don't need a separate VDC to tell you how to get at myfile here: file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; The 'data file pointer store' exists already - its the hierarchical namespace that is rooted in IANA's management of the URI and DNS space, continues to UC's management of DNS space and then down to my management of terminable's filesystem space and then down to whoever owns the foo directory. On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > This would require a data file pointer store (VDC like thing) which can record > where intermediate files are instead of assuming they are always available on > the submit host. > > > From benc at hawaga.org.uk Thu Jul 5 01:18:10 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 11:48:10 +0530 (IST) Subject: [Swift-devel] language behaviour tests Message-ID: In r891 I put in some language behaviour tests in tests/language-behaviour/ These run a bunch of small SwiftScript programs locally and check that they output expected text - for example, checking that @strcat really does concatenate, that + really does add, and other such things. I built them for testing various changes I've been playing with at the language parsing and compilation layer. 
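The check these behaviour tests perform can be sketched as follows: a small program writes its output to a file, and that file is compared against stored expected text, without looking at the .xml or .kml intermediate forms. This is a hypothetical Java illustration. The real tests in tests/language-behaviour/ may be driven quite differently, and `OutputCheck`, `matchesExpected`, and the `.expected` suffix are assumptions. Running the SwiftScript program itself is omitted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical checker: compares what a test program wrote with stored expected text.
public final class OutputCheck {

    /** True if both files contain the same lines, ignoring trailing whitespace on each line. */
    static boolean matchesExpected(Path actual, Path expected) throws IOException {
        return normalize(Files.readAllLines(actual, StandardCharsets.UTF_8))
            .equals(normalize(Files.readAllLines(expected, StandardCharsets.UTF_8)));
    }

    private static List<String> normalize(List<String> lines) {
        return lines.stream().map(String::stripTrailing).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a program's output file and its checked-in expectation.
        Path out = Files.createTempFile("strcat", ".out");
        Path exp = Files.createTempFile("strcat", ".expected");
        Files.write(out, List.of("foobar  "), StandardCharsets.UTF_8);
        Files.write(exp, List.of("foobar"), StandardCharsets.UTF_8);
        System.out.println(matchesExpected(out, exp)); // true
    }
}
```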
Previously I was using the tests in tests/language/ for testing parser changes. The language/ tests check that input SwiftScript always produces the same .xml intermediate form, whilst these new tests check that the input SwiftScript always produces the same output (in a file) on execution, without regard to whether the .xml and .kml intermediate files take a different form or not. -- From nefedova at mcs.anl.gov Thu Jul 5 08:34:39 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 08:34:39 -0500 Subject: [Swift-devel] dot files by default In-Reply-To: References: Message-ID: <69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov> It would've been even better if these dot files were generated correctly. There is Bug #35 about it... Nika On Jul 4, 2007, at 12:37 AM, Ben Clifford wrote: > does anyone have preference about whether .dot graphviz files are > generated by default or not? > > I find them a bit annoying in as much as they double the number of run > files in my working directories to no immediate benefit. > > -- > From hategan at mcs.anl.gov Thu Jul 5 08:55:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 08:55:31 -0500 Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> Message-ID: <1183643731.5084.3.camel@blabla.mcs.anl.gov> I think you're missing something. You need to remember where the files are. The mapping information becomes insufficient. It tells you where some initial files were, but it won't contain any site information. And that's good, because the decision of where something is done is made at run-time. But you still need some store (even though probably memory-based and only persistent through one swift run).
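Mihael's point is that skipping intermediate stage-out means the site holding each file must be remembered for the duration of a run. That can be sketched as a small in-memory table keyed by logical file name. An entry defaults to localhost, which matches the current behaviour where outputs are always staged back. It is overwritten with a remote URI when a file is left on the site that produced it. `FileLocationStore`, its methods, and the `/scratch/run1/` path are hypothetical. This is not Swift or Karajan code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-run record of where each logical file's current copy lives.
// Memory-only: it is discarded when the run ends.
public final class FileLocationStore {
    private final Map<String, URI> locations = new HashMap<>();

    /** Files with no entry are assumed to have been staged back to the submit host. */
    URI locate(String logicalName) {
        return locations.getOrDefault(logicalName, URI.create("file://localhost/" + logicalName));
    }

    /** Called when an output is left on the executing site instead of being staged out. */
    void recordProducedAt(String logicalName, String host, String absolutePath)
            throws URISyntaxException {
        locations.put(logicalName, new URI("gsiftp", host, absolutePath, null));
    }

    /** A job on targetHost can use the file in place only if its copy is already there. */
    boolean needsTransfer(String logicalName, String targetHost) {
        return !targetHost.equals(locate(logicalName).getHost());
    }

    public static void main(String[] args) throws URISyntaxException {
        FileLocationStore store = new FileLocationStore();
        String site = "terminable.ci.uchicago.edu";
        store.recordProducedAt("b.out", site, "/scratch/run1/b.out");
        System.out.println(store.locate("a.out"));              // file://localhost/a.out
        System.out.println(store.locate("b.out"));              // gsiftp://terminable.ci.uchicago.edu/scratch/run1/b.out
        System.out.println(store.needsTransfer("b.out", site)); // false
        System.out.println(store.needsTransfer("a.out", site)); // true
    }
}
```

Ben's URI argument and Mihael's store are compatible in this sketch. The store's values are URIs, but something still has to update them at run time, because the executing site is only chosen then.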
On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote: > I don't think that's true. > > If data files are labelled with URIs rather than > paths-relative-to-submit-directory, then those URIs are understandable > without a VDC-as-entity. > > You don't need a separate VDC to tell you how to get at myfile here: > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > The 'data file pointer store' exists already - its the hierarchical > namespace that is rooted in IANA's management of the URI and DNS space, > continues to UC's management of DNS space and then down to my management > of terminable's filesystem space and then down to whoever owns the foo > directory. > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > This would require a data file pointer store (VDC like thing) which can record > > where intermediate files are instead of assuming they are always available on > > the submit host. > > > > > > > From hategan at mcs.anl.gov Thu Jul 5 08:59:16 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 08:59:16 -0500 Subject: [Swift-devel] dot files by default In-Reply-To: <69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov> References: <69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov> Message-ID: <1183643956.5084.7.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 08:34 -0500, Veronika Nefedova wrote: > It would've been even better if these dot files were generated > correctly. There is Bug #35 about it... That's helpful ;) > > Nika > > On Jul 4, 2007, at 12:37 AM, Ben Clifford wrote: > > > does anyone have preference about whether .dot graphviz files are > > generated by default or not?
> > > > I find them a bit annoying in as much as they double the number of run > > files in my working directories to no immediate benefit. > > > > -- > > > > From benc at hawaga.org.uk Thu Jul 5 09:05:17 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 14:05:17 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: <1183643731.5084.3.camel@blabla.mcs.anl.gov> References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> Message-ID: how does it know where files are now, between jobs? On Thu, 5 Jul 2007, Mihael Hategan wrote: > I think you're missing something. You need to remember where the files > are. The mapping information becomes insufficient. It tells you where > some initial files were, but it won't contain any site information.
> > > > You don't need a separate VDC to tell you how to get at myfile here: > > > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > > > The 'data file pointer store' exists already - its the hierarchical > > namespace that is rooted in IANA's management of the URI and DNS space, > > continues to UC's management of DNS space and then down to my management > > of terminable's filesystem space and then down to whoever owns the foo > > directory. > > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > > This would require a data file pointer store (VDC like thing) which can record > > > where intermediate files are instead of assuming they are always available on > > > the submit host. > > > > > > > > > > > > > > From hategan at mcs.anl.gov Thu Jul 5 09:10:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 09:10:07 -0500 Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> Message-ID: <1183644607.5084.9.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote: > how does it know where files are now, between jobs? That's the thing. They're always on localhost. > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > I think you're missing something. You need to remember where the files > > are. The mapping information becomes insufficient. It tells you where > > some initial files were, but it won't contain any site information.
And > > that's good, because the decision of where something is done is made at > > run-time. But you still need some store (even though probably > > memory-based and only persistent through one swift run). > > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote: > > > I don't think that's true. > > > > > > If data files are labelled with URIs rather than > > > paths-relative-to-submit-directory, then those URIs are understandable > > > without a VDC-as-entity. > > > > > > You don't need a separate VDC to tell you how to get at myfile here: > > > > > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > > > > > The 'data file pointer store' exists already - its the hierarchical > > > namespace that is rooted in IANA's management of the URI and DNS space, > > > continues to UC's management of DNS space and then down to my management > > > of terminable's filesystem space and then down to whoever owns the foo > > > directory. > > > > > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > > > This would require a data file pointer store (VDC like thing) which can record > > > > where intermediate files are instead of assuming they are always available on > > > > the submit host. 
> > > > > > > > > > > > > > > > > > > > > > > From benc at hawaga.org.uk Thu Jul 5 11:25:06 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 16:25:06 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: <1183644607.5084.9.camel@blabla.mcs.anl.gov> References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: they're always in the place that the path name says they are. whether its a URI or a local relative path. On Thu, 5 Jul 2007, Mihael Hategan wrote: > On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote: > > how does it know where files are now, between jobs? > > That's the thing. They're always on localhost. > > > > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > > > I think you're missing something. You need to remember where the files > > > are. The mapping information becomes insufficient. It tells you where > > > some initial files were, but it won't contain any site information. And > > > that's good, because the decision of where something is done is made at > > > run-time. But you still need some store (even though probably > > > memory-based and only persistent through one swift run). > > > > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote: > > > > I don't think that's true. > > > > > > > > If data files are labelled with URIs rather than > > > > paths-relative-to-submit-directory, then those URIs are understandable > > > > without a VDC-as-entity.
> > > > > > > > You don't need a separate VDC to tell you how to get at myfile here: > > > > > > > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > > > > > > > The 'data file pointer store' exists already - its the hierarchical > > > > namespace that is rooted in IANA's management of the URI and DNS space, > > > > continues to UC's management of DNS space and then down to my management > > > > of terminable's filesystem space and then down to whoever owns the foo > > > > directory. > > > > > > > > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > > > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > > > > This would require a data file pointer store (VDC like thing) which can record > > > > > where intermediate files are instead of assuming they are always available on > > > > > the submit host. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Thu Jul 5 11:41:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 11:41:30 -0500 Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: <1183653690.11132.2.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 16:25 +0000, Ben Clifford wrote: > they're always in the place that the path name says they are. whether its > a URI or a local relative path.
Right, but whereas in the current scheme you can assume the site is localhost, because files are always staged back to localhost, if you don't do the stage-out, that assumption goes away. In that case, the site information needs to be recorded. > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote: > > > how does it know where files are now, between jobs? > > > > That's the thing. They're always on localhost. > > > > > > > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > > > > > I think you're missing something. You need to remember where the files > > > > are. The mapping information becomes insufficient. It tells you where > > > > some initial files were, but it won't contain any site information. And > > > > that's good, because the decision of where something is done is made at > > > > run-time. But you still need some store (even though probably > > > > memory-based and only persistent through one swift run). > > > > > > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote: > > > > > I don't think that's true. > > > > > > > > > > If data files are labelled with URIs rather than > > > > > paths-relative-to-submit-directory, then those URIs are understandable > > > > > without a VDC-as-entity. > > > > > > > > > > You don't need a separate VDC to tell you how to get at myfile here: > > > > > > > > > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > > > > > > > > > The 'data file pointer store' exists already - its the hierarchical > > > > > namespace that is rooted in IANA's management of the URI and DNS space, > > > > > continues to UC's management of DNS space and then down to my management > > > > > of terminable's filesystem space and then down to whoever owns the foo > > > > > directory. 
> > > > > > > > > > > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > > > > > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > > > > > This would require a data file pointer store (VDC like thing) which can record > > > > > > where intermediate files are instead of assuming they are always available on > > > > > > the submit host. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From benc at hawaga.org.uk Thu Jul 5 11:47:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 16:47:45 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: <1183653690.11132.2.camel@blabla.mcs.anl.gov> References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> <1183653690.11132.2.camel@blabla.mcs.anl.gov> Message-ID: right. On Thu, 5 Jul 2007, Mihael Hategan wrote: > On Thu, 2007-07-05 at 16:25 +0000, Ben Clifford wrote: > > they're always in the place that the path name says they are. whether its > > a URI or a local relative path. > > Right, but whereas in the current scheme you can assume the site is > localhost, because files are always staged back to localhost, if you > don't do the stage-out, that assumption goes away. In that case, the > site information needs to be recorded. > > > > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > > > On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote: > > > > how does it know where files are now, between jobs? > > > > > > That's the thing. They're always on localhost.
> > > > > > > > > > > On Thu, 5 Jul 2007, Mihael Hategan wrote: > > > > > > > > > I think you're missing something. You need to remember where the files > > > > > are. The mapping information becomes insufficient. It tells you where > > > > > some initial files were, but it won't contain any site information. And > > > > > that's good, because the decision of where something is done is made at > > > > > run-time. But you still need some store (even though probably > > > > > memory-based and only persistent through one swift run). > > > > > > > > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote: > > > > > > I don't think that's true. > > > > > > > > > > > > If data files are labelled with URIs rather than > > > > > > paths-relative-to-submit-directory, then those URIs are understandable > > > > > > without a VDC-as-entity. > > > > > > > > > > > > You don't need a separate VDC to tell you how to get at myfile here: > > > > > > > > > > > > file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">; > > > > > > > > > > > > The 'data file pointer store' exists already - its the hierarchical > > > > > > namespace that is rooted in IANA's management of the URI and DNS space, > > > > > > continues to UC's management of DNS space and then down to my management > > > > > > of terminable's filesystem space and then down to whoever owns the foo > > > > > > directory. > > > > > > > > > > > > > > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > > > > > > > > > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-01 02:18 ------- > > > > > > > This would require a data file pointer store (VDC like thing) which can record > > > > > > > where intermediate files are instead of assuming they are always available on > > > > > > > the submit host. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From benc at hawaga.org.uk Thu Jul 5 11:55:26 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 16:55:26 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: <1183644607.5084.9.camel@blabla.mcs.anl.gov> References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: so I was thinking the other day while poking through code. 'data' in SwiftScript terms is mostly represented by DSHandle objects. such objects (which can have one of several implementing classes, and potentially more in future) have a number of properties, such as: . value - what the 'value' is, for adding to other values, using @strcat on, performing array/member access using [] and . . submit-side location - what is extracted with @filename and used when that 'data' is passed to an application rather than being operated on by submit-side functions. Neither of these are compulsory (and I think in practice at the moment it works out that you either have a filename or a value and never meaningfully both). So a different model of mapping (which might work better when we want data that doesn't necessarily exist as discrete files or as in-memory values - the two examples that I've seen talked about are 'data from an sql database' and 'constants in a csv file') might be that mappers generate DSHandle trees (specifically a mapper generates a DSHandle, which might have descendants). 
Those DSHandles might have values, might have filenames, might have other attributes, might have ongoing annotation (which could include keeping track of where within-this-run copies have been made). -- From yongzh at cs.uchicago.edu Thu Jul 5 12:14:32 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Thu, 5 Jul 2007 12:14:32 -0500 (CDT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: My original thinking about value/filename was that we don't distinguish those at the logical level; essentially they could all just be values. Then, when we need to call mapper functions (getFilename, for instance), we interpret the values differently. So in the case of getFilename, we can interpret the value either as 1) the filename itself, returning the value directly; 2) writing the value into a file and returning an automatically generated filename; or 3) some other possibility, e.g. a directory of files. The current DSHandle interface does allow nested trees, so a mapper could return a DSHandle tree as the implementation currently stands. Yong. On Thu, 5 Jul 2007, Ben Clifford wrote: > > so I was thinking the other day while poking through code. > > 'data' in SwiftScript terms is mostly represented by DSHandle objects. > > such objects (which can have one of several implementing classes, and > potentially more in future) have a number of properties, such as: > > . value - what the 'value' is, for adding to other values, using @strcat > on, performing array/member access using [] and . > > . submit-side location - what is extracted with @filename and used when > that 'data' is passed to an application rather than being operated on by > submit-side functions.
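In Ben's proposal, as described above, a DSHandle may carry a value, a filename, both, or neither, and a mapper returns a tree of them. The sketch below shows that shape using a hypothetical `Handle` class. This is not Swift's actual `DSHandle` interface. It also uses a hypothetical `CsvConstantsMapper` for his "constants in a csv file" example: each row becomes an array element whose members hold in-memory values and have no filename.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a DSHandle: value and filename are both optional. */
final class Handle {
    private Object value;      // in-memory value, if any
    private String filename;   // submit-side location, if any
    private final Map<String, Handle> children = new LinkedHashMap<>();
    private final Map<String, String> annotations = new LinkedHashMap<>();

    Optional<Object> value() { return Optional.ofNullable(value); }
    Optional<String> filename() { return Optional.ofNullable(filename); }
    void setValue(Object v) { this.value = v; }
    void setFilename(String f) { this.filename = f; }

    /** Array or member access: h[key] or h.key. */
    Handle child(String key) { return children.get(key); }

    Handle addChild(String key) {
        Handle h = new Handle();
        children.put(key, h);
        return h;
    }

    int size() { return children.size(); }

    /** Ongoing annotation, e.g. where within-this-run copies were made. */
    void annotate(String key, String val) { annotations.put(key, val); }
    Optional<String> annotation(String key) { return Optional.ofNullable(annotations.get(key)); }
}

/** Hypothetical mapper turning CSV constants into a Handle tree. */
final class CsvConstantsMapper {
    /** Each data row becomes an array element; each column a named member holding a value. */
    static Handle map(String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] header = lines[0].split(",");
        Handle root = new Handle();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",");
            Handle row = root.addChild(Integer.toString(i - 1));
            for (int c = 0; c < header.length && c < cells.length; c++) {
                row.addChild(header[c].trim()).setValue(cells[c].trim());
            }
        }
        return root;
    }

    public static void main(String[] args) {
        Handle params = map("name,temp\nmol1,300\nmol2,310\n");
        System.out.println(params.size());                                 // 2
        System.out.println(params.child("1").child("temp").value().get()); // 310
        System.out.println(params.child("0").filename().isPresent());      // false
    }
}
```

This mapper builds the whole tree eagerly, which is only reasonable for small inputs. As Mihael notes later in the thread, mapping is meant to be lazy, so a production mapper would more likely create children on first access.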
> > Neither of these are compulsory (and I think in practice at the moment it > works out that you either have a filename or a value and never > meaningfully both). > > So a different model of mapping (which might work better when we want data > that doesn't necessarily exist as discrete files or as in-memory values - > the two examples that I've seen talked about are 'data from an sql > database' and 'constants in a csv file') might be that mappers generate > DSHandle trees (specifically a mapper generates a DSHandle, which might > have descendants). Those DSHandles might have values, might have > filenames, might have other attributes, might have ongoing annotation > (which could include keeping track of where within-this-run copies have > been made). > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Thu Jul 5 12:22:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 17:22:25 +0000 (GMT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: On Thu, 5 Jul 2007, Yong Zhao wrote: > My original thinking about value/filename was that we don't distinguish > those at the logical level, essentially they could all just be values. > Then when we need to call mapper functions (getFilename, for instance), we > interprete the values differently. So in the case of getFilename, we can > interprete the value either as > 1) the filename itself, returning the value directly > 2) writing the value into a file, and returning an automatically generated > filename. > 3) some other possibilities, e.g. a directory of files. 
option 1 goes against the strongly typed model - if I have a brain image, I don't want access to that brain image to suddenly yield the string "brain.img" - that isn't of type 'brainimage', it's of type 'string'. but the other two options work, I think - that's what a mapper does - it expresses how swiftscript data is interpreted in various different ways - as a (set of) file(s), as an in-memory value, in some other form. But I don't think it will always be the case that each data object will be accessible in each form. For example, a brain scan doesn't make much sense being mapped into the karajan runtime at the moment - we have nothing there that does anything interesting with such data. > > The current DSHandle interface does allow nested trees, so a mapper > could return a dshandle tree as the implementation currently stands. > > Yong. > > > On Thu, 5 Jul 2007, Ben Clifford wrote: > > > > > so I was thinking the other day while poking through code. > > > > 'data' in SwiftScript terms is mostly represented by DSHandle objects. > > > > such objects (which can have one of several implementing classes, and > > potentially more in future) have a number of properties, such as: > > > > . value - what the 'value' is, for adding to other values, using @strcat > > on, performing array/member access using [] and . > > > > . submit-side location - what is extracted with @filename and used when > > that 'data' is passed to an application rather than being operated on by > > submit-side functions. > > > > Neither of these are compulsory (and I think in practice at the moment it > > works out that you either have a filename or a value and never > > meaningfully both).
> > > > So a different model of mapping (which might work better when we want data > > that doesn't necessarily exist as discrete files or as in-memory values - > > the two examples that I've seen talked about are 'data from an sql > > database' and 'constants in a csv file') might be that mappers generate > > DSHandle trees (specifically a mapper generates a DSHandle, which might > > have descendants). Those DSHandles might have values, might have > > filenames, might have other attributes, might have ongoing annotation > > (which could include keeping track of where within-this-run copies have > > been made). > > > > -- > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From yongzh at cs.uchicago.edu Thu Jul 5 12:28:36 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Thu, 5 Jul 2007 12:28:36 -0500 (CDT) Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: option 1) does not say it is of type string; it is of type any, which means it could be an opaque file whose content we are not interested in looking into. In that case, a file name could stand in place of the content; it all depends on how the mapper interprets the value. Yong. On Thu, 5 Jul 2007, Ben Clifford wrote: > > > On Thu, 5 Jul 2007, Yong Zhao wrote: > > > My original thinking about value/filename was that we don't distinguish > > those at the logical level, essentially they could all just be values. > > Then when we need to call mapper functions (getFilename, for instance), we > > interprete the values differently.
So in the case of getFilename, we can > > interprete the value either as > > 1) the filename itself, returning the value directly > > > 2) writing the value into a file, and returning an automatically generated > > filename. > > > 3) some other possibilities, e.g. a directory of files. > > option 1 goes against the strongly typed model - if I have a brain image, > I dont want an access to that brain image to suddenly be the string > "brain.img" - that isn't of type 'braingimage', its of type 'string'. > > but the other two options work, I think - that's what a mapper does - > expresses how swiftscript data is interpreted in various different ways > - as a (set of) file(s), as a in-memory value, in some other form. > > But I don't think it will always be the case that each data object will be > accessible in each form. For example, a brain scan doesn't make much sense > being mapepd into the karajan runtime at the moment - we have nothing to > do interesting things with such. > > > > > The current DSHandle interface does allow nested trees, so a mapper > > could return a dshandle tree as the implementation currently stands. > > > > Yong. > > > > > > On Thu, 5 Jul 2007, Ben Clifford wrote: > > > > > > > > so I was thinking the other day while poking through code. > > > > > > 'data' in SwiftScript terms is mostly represented by DSHandle objects. > > > > > > such objects (which can have one of several implementing classes, and > > > potentially more in future) have a number of properties, such as: > > > > > > . value - what the 'value' is, for adding to other values, using @strcat > > > on, performing array/member access using [] and . > > > > > > . submit-side location - what is extracted with @filename and used when > > > that 'data' is passed to an application rather than being operated on by > > > submit-side functions. 
> > > > > > Neither of these are compulsory (and I think in practice at the moment it > > > works out that you either have a filename or a value and never > > > meaningfully both). > > > > > > So a different model of mapping (which might work better when we want data > > > that doesn't necessarily exist as discrete files or as in-memory values - > > > the two examples that I've seen talked about are 'data from an sql > > > database' and 'constants in a csv file') might be that mappers generate > > > DSHandle trees (specifically a mapper generates a DSHandle, which might > > > have descendants). Those DSHandles might have values, might have > > > filenames, might have other attributes, might have ongoing annotation > > > (which could include keeping track of where within-this-run copies have > > > been made). > > > > > > -- > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Thu Jul 5 13:02:03 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 13:02:03 -0500 Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: <1183658523.13928.3.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 16:55 +0000, Ben Clifford wrote: > So a different model of mapping (which might work better when we want data > that doesn't necessarily exist as discrete files or as in-memory values - > the two examples that I've seen talked about are 'data from an sql > database' and 'constants in a csv file') might be that mappers generate > DSHandle trees (specifically a mapper generates a DSHandle, which might > have descendants). 
Those DSHandles might have values, might have > filenames, might have other attributes, might have ongoing annotation > (which could include keeping track of where within-this-run copies have > been made). However, we should keep in mind that mapping is lazy. We want that to achieve scalability, and at least in theory, infinite arrays (for that we would need some form of garbage collection). On the other hand, data itself is future-like. The difference being that everything is computed as soon as possible, but access is delayed until data is available. > > -- > From hategan at mcs.anl.gov Thu Jul 5 13:05:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 13:05:45 -0500 Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data In-Reply-To: References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> <1183643731.5084.3.camel@blabla.mcs.anl.gov> <1183644607.5084.9.camel@blabla.mcs.anl.gov> Message-ID: <1183658745.13928.8.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 12:14 -0500, Yong Zhao wrote: > My original thinking about value/filename was that we don't distinguish > those at the logical level, essentially they could all just be values. > Then when we need to call mapper functions (getFilename, for instance), we > interprete the values differently. So in the case of getFilename, we can > interprete the value either as > 1) the filename itself, returning the value directly > 2) writing the value into a file, and returning an automatically generated > filename. > 3) some other possibilities, e.g. a directory of files. This clearly conflicts with the ability to apply swift functions to data in files or databases, as one would need, in the case of files, both a file pointer and actual data. I would rather follow a known model for this: pointers. There are addresses (files, uris, db/table/column/row) and values, which are stored at those addresses. 
What's missing from the scheme right now is the ability of a mapper to fetch actual data from such locations when needed. > > The current DSHandle interface does allow nested trees, so a mapper > could return a dshandle tree as the implementation currently stands. > > Yong. > > > On Thu, 5 Jul 2007, Ben Clifford wrote: > > > > > so I was thinking the other day while poking through code. > > > > 'data' in SwiftScript terms is mostly represented by DSHandle objects. > > > > such objects (which can have one of several implementing classes, and > > potentially more in future) have a number of properties, such as: > > > > . value - what the 'value' is, for adding to other values, using @strcat > > on, performing array/member access using [] and . > > > > . submit-side location - what is extracted with @filename and used when > > that 'data' is passed to an application rather than being operated on by > > submit-side functions. > > > > Neither of these are compulsory (and I think in practice at the moment it > > works out that you either have a filename or a value and never > > meaningfully both). > > > > So a different model of mapping (which might work better when we want data > > that doesn't necessarily exist as discrete files or as in-memory values - > > the two examples that I've seen talked about are 'data from an sql > > database' and 'constants in a csv file') might be that mappers generate > > DSHandle trees (specifically a mapper generates a DSHandle, which might > > have descendants). Those DSHandles might have values, might have > > filenames, might have other attributes, might have ongoing annotation > > (which could include keeping track of where within-this-run copies have > > been made). 
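Mihael's pointer model has three parts. Addresses (files, URIs, db/table/column/row) are kept separate from the values stored at them. Data is future-like. And mappers currently cannot fetch values from those addresses. Here is a minimal sketch of one way these could fit together, assuming a hypothetical `Ref` class and `Loader` interface and using `java.util.concurrent.CompletableFuture` for the future-like behaviour. The address is always available. The value is fetched from it at most once, and only after the producer has finished. The `gsiftp://example.org/...` address is a placeholder.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical loader for the value stored at an address (file path, URI, db row, ...). */
@FunctionalInterface
interface Loader<T> {
    T load(String address);
}

/**
 * Hypothetical future-like datum in the pointer model: the address is always known,
 * and the value is fetched from it lazily, at most once, after the producer finishes.
 */
final class Ref<T> {
    private final String address;
    private final Loader<T> loader;
    private final CompletableFuture<Void> written = new CompletableFuture<>();
    private volatile T cached;

    Ref(String address, Loader<T> loader) {
        this.address = address;
        this.loader = loader;
    }

    /** The location, e.g. what would be handed to an application. */
    String address() { return address; }

    /** Called once the producing job has stored data at the address. */
    void markWritten() { written.complete(null); }

    /** Consumers compose on this future; nothing is loaded before the write completes. */
    CompletableFuture<T> value() {
        return written.thenApply(ignored -> {
            T v = cached;
            if (v == null) {
                synchronized (this) {
                    v = cached;
                    if (v == null) {
                        v = loader.load(address); // fetch from the address on first access
                        cached = v;
                    }
                }
            }
            return v;
        });
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        Ref<String> out = new Ref<>("gsiftp://example.org/scratch/out.txt",
                addr -> { loads.incrementAndGet(); return "contents of " + addr; });

        CompletableFuture<Integer> length = out.value().thenApply(String::length);
        System.out.println(length.isDone());   // false: producer has not finished yet
        out.markWritten();
        System.out.println(length.join() > 0); // true
        out.value().join();
        System.out.println(loads.get());       // 1: fetched once, then cached
    }
}
```

Because the address is kept separate, an application call can be handed the location without the contents ever being loaded. Only submit-side operations that need the actual data would go through `value()`.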
> > > > -- > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From nefedova at mcs.anl.gov Thu Jul 5 13:55:51 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 13:55:51 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183429417.16404.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: my workflow doesn't work with recent changes. It worked fine for 1 molecule, but fails for 244 (right after compilation step, before submitting it to the grid). These are the errors: 2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) CHARMM3 @ MolDyn-244.kml, line: 209 vdl:mains @ MolDyn-244.kml, line: 583910 at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepareInstanceArguments(UserDefinedElement.java:196) at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBody(UserDefinedElement.java:170) at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.startBody(SequentialImplicitExecutionUDE.java:55) at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:82) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) at
org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent(Parallel.java:90) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) a complete log is on terminable in ~nefedova/MolDyn-244-zvhy3me4scm61.log; the MolDyn-244.* files are also there. Please note that this is exactly the same file (dtm) that worked before. Nika On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote: > Yup. Try now. > > On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote: >> I get the below when I try to run a hello world workflow >> (examples/tutorial/q1.swift). >> >> I think Nika also saw something that looks similar, with a different >> workflow. >> >> This is with cog r1655. >> >> I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and >> hello >> world runs ok (r1650 being before the most recent set of cog >> commits). >> >> >> $ swift -debug q1.swift >> Recompilation suppressed.
>> >> null >> kernel:cache @ sys.xml, line: 3 >> Caused by: java.lang.UnsupportedOperationException >> at java.util.AbstractMap.put(AbstractMap.java:228) >> at >> org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments( >> CacheNode.java:153) >> at >> org.globus.cog.karajan.workflow.nodes.CacheNode.post >> (CacheNode.java:77) >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments >> .childCompleted(AbstractSequentialWithArguments.java:192) >> at >> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonAr >> gChildCompleted(PartialArgumentsContainer.java:90) >> at >> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.child >> Completed(PartialArgumentsContainer.java:85) >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent >> (Sequential.java:33) >> at >> org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent >> (CacheNode.java:111) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java: >> 334) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java: >> 123) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked >> (EventBus.java:97) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent( >> FlowNode.java:172) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete >> (FlowNode.java:298) >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post >> (FlowContainer.java:58) >> at >> org.globus.cog.karajan.workflow.nodes.Namespace.post >> (Namespace.java:40) >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments >> .childCompleted(AbstractSequentialWithArguments.java:192) >> at >> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonAr >> gChildCompleted(PartialArgumentsContainer.java:90) >> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > 
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Thu Jul 5 14:57:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 19:57:25 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: ou have your heap set on the 244 molecule workflow? I run out at the compile stage with default. -- From nefedova at mcs.anl.gov Thu Jul 5 15:01:06 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 15:01:06 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: yep, it's set to the max: OPTIONS="-Xms1536m -Xmx1536m" (in bin/swift) On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > ou have your heap set on the 244 molecule workflow? I run out at the > compile stage with default. > -- > > From benc at hawaga.org.uk Thu Jul 5 14:39:32 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 19:39:32 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: did you touch both the 1 molecule and 244 molecule .swift files to cause recompilation? also, do you have the 1-molecule .swift, .xml and .kml files around? On Thu, 5 Jul 2007, Veronika Nefedova wrote: > my workflow doesn't work with recent changes. It worked fine for 1 molecule, > but fails for 244 (right after compilation step, before submitting it to the > grid).
These are the errors: > > 2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing argument s11 for > sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8, > s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) > Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3, > in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) > CHARMM3 @ MolDyn-244.kml, line: 209 > vdl:mains @ MolDyn-244.kml, line: 583910 > > at > org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepareInstanceArguments(UserDefinedElement.java:196) > at > org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBody(UserDefinedElement.java:170) > at > org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.startBody(SequentialImplicitExecutionUDE.java:55) > at > org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:82) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > at > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > at > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > at > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > at > 
org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent(Parallel.java:90) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > > > a complete log is on terminable in ~nefedova/MolDyn-244-zvhy3me4scm61.log > the MolDyn-244.* files are also there. Please note that this is exactly the > same file (dtm) that worked before. > > Nika > > > On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote: > > > Yup. Try now. > > > > On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote: > > > I get the below when I try to run a hello world workflow > > > (examples/tutorial/q1.swift). > > > > > > I think Nika also saw something that looks similar, with a different > > > workflow. > > > > > > This is with cog r1655. > > > > > > I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello > > > world runs ok (r1650 being before the most recent set of cog commits). > > > > > > > > > $ swift -debug q1.swift > > > Recompilation suppressed. 
> > > > > > null > > > kernel:cache @ sys.xml, line: 3 > > > Caused by: java.lang.UnsupportedOperationException > > > at java.util.AbstractMap.put(AbstractMap.java:228) > > > at > > > org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153) > > > at > > > org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77) > > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > > > at > > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) > > > at > > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85) > > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33) > > > at > > > org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334) > > > at > > > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123) > > > at > > > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298) > > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > at > > > org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40) > > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192) > > > at > > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90) > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel 
at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From nefedova at mcs.anl.gov Thu Jul 5 15:03:28 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 15:03:28 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: you can use my kml file that I compiled today with the latest karajan (it's on terminable). On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > ou have your heap set on the 244 molecule workflow? I run out at the > compile stage with default. > -- > > From nefedova at mcs.anl.gov Thu Jul 5 15:06:07 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 15:06:07 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: <3EF6AA08-FA79-430C-99BE-9CB8EF8CEF70@mcs.anl.gov> yep, I "touched" them both. I also put the MolDyn-1.* files in ~nefedova on terminable. MolDyn-1.dtm ran successfully today. Nika On Jul 5, 2007, at 2:39 PM, Ben Clifford wrote: > did you touch both the 1 molecule and 244 moleule .swift files to > cause > recompilation? > > also, do you have the 1-molecule .swift, .xml and .kml files around? > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > >> my workflow doesn't work with recent changes. It worked fine for 1 >> molecule, >> but fails for 244 (right after compilation step, before submitting >> it to the >> grid).
These are the errors: >> >> 2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing >> argument s11 for >> sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, >> in7, in8, >> s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) >> Missing argument s11 for sys:element(out1, out2, out3, out4, in1, >> in2, in3, >> in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, >> s11) >> CHARMM3 @ MolDyn-244.kml, line: 209 >> vdl:mains @ MolDyn-244.kml, line: 583910 >> >> at >> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepare >> InstanceArguments(UserDefinedElement.java:196) >> at >> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBo >> dy(UserDefinedElement.java:170) >> at >> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutio >> nUDE.startBody(SequentialImplicitExecutionUDE.java:55) >> at >> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutio >> nUDE.childCompleted(SequentialImplicitExecutionUDE.java:82) >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent >> (Sequential.java:33) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java: >> 334) >> at >> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java: >> 123) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked >> (EventBus.java:97) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent( >> FlowNode.java:172) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete >> (FlowNode.java:298) >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post >> (FlowContainer.java:58) >> at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments >> .childCompleted(AbstractSequentialWithArguments.java:192) >> at >> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent >> (Sequential.java:33) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java: >> 334) >> at >> 
org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java: >> 123) >> at >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked >> (EventBus.java:97) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent( >> FlowNode.java:172) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete >> (FlowNode.java:298) >> at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post >> (FlowContainer.java:58) >> at >> org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent >> (Parallel.java:90) >> at >> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java: >> 334) >> >> >> a complete log is on terminable in ~nefedova/MolDyn-244- >> zvhy3me4scm61.log >> the MolDyn-244.* files are also there. Please note that this is >> exactly the >> same file (dtm) that worked before. >> >> Nika >> >> >> On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote: >> >>> Yup. Try now. >>> >>> On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote: >>>> I get the below when I try to run a hello world workflow >>>> (examples/tutorial/q1.swift). >>>> >>>> I think Nika also saw something that looks similar, with a >>>> different >>>> workflow. >>>> >>>> This is with cog r1655. >>>> >>>> I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) >>>> and hello >>>> world runs ok (r1650 being before the most recent set of cog >>>> commits). >>>> >>>> >>>> $ swift -debug q1.swift >>>> Recompilation suppressed. 
>>>> >>>> null >>>> kernel:cache @ sys.xml, line: 3 >>>> Caused by: java.lang.UnsupportedOperationException >>>> at java.util.AbstractMap.put(AbstractMap.java:228) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArgument >>>> s(CacheNode.java:153) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.CacheNode.post >>>> (CacheNode.java:77) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArgumen >>>> ts.childCompleted(AbstractSequentialWithArguments.java:192) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.non >>>> ArgChildCompleted(PartialArgumentsContainer.java:90) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.chi >>>> ldCompleted(PartialArgumentsContainer.java:85) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent( >>>> Sequential.java:33) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent >>>> (CacheNode.java:111) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.FlowNode.event >>>> (FlowNode.java:334) >>>> at >>>> org.globus.cog.karajan.workflow.events.EventBus.send >>>> (EventBus.java:123) >>>> at >>>> org.globus.cog.karajan.workflow.events.EventBus.sendHooked >>>> (EventBus.java:97) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEven >>>> t(FlowNode.java:172) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.FlowNode.complete >>>> (FlowNode.java:298) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.FlowContainer.post >>>> (FlowContainer.java:58) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.Namespace.post >>>> (Namespace.java:40) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArgumen >>>> ts.childCompleted(AbstractSequentialWithArguments.java:192) >>>> at >>>> org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.non >>>> ArgChildCompleted(PartialArgumentsContainer.java:90) >>>> >>>> >>> >>> 
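The hello-world failure quoted above bottoms out in java.util.AbstractMap.put. In the JDK, AbstractMap.put is a default implementation that always throws UnsupportedOperationException, so a subclass that only implements entrySet() behaves as a read-only map. The thread does not say which map class CacheNode.getTrackingArguments received in cog r1655, so what follows is only a sketch under that assumption. ReadOnlyArgs and tryPut are hypothetical names, not Karajan classes. The sketch reproduces the exception and shows the usual workaround of copying into a HashMap before adding entries.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class AbstractMapPutDemo {
    // Hypothetical argument map: it implements entrySet() (the only abstract
    // method of AbstractMap) but does not override put(), so it inherits
    // AbstractMap.put, which always throws UnsupportedOperationException.
    static final class ReadOnlyArgs extends AbstractMap<String, Object> {
        private final Map<String, Object> backing;

        ReadOnlyArgs(Map<String, Object> backing) {
            this.backing = backing;
        }

        @Override
        public Set<Map.Entry<String, Object>> entrySet() {
            return Collections.unmodifiableMap(backing).entrySet();
        }
    }

    // Returns true if the map accepted the entry, false if put() is unsupported.
    static boolean tryPut(Map<String, Object> map, String key, Object value) {
        try {
            map.put(key, value);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> backing = new HashMap<>();
        backing.put("file", "sys.xml");

        Map<String, Object> readOnly = new ReadOnlyArgs(backing);
        System.out.println(tryPut(readOnly, "line", 3)); // false

        // Copying into a HashMap before adding entries avoids the exception.
        Map<String, Object> copy = new HashMap<>(readOnly);
        System.out.println(tryPut(copy, "line", 3));     // true
        System.out.println(copy.size());                 // 2
    }
}
```

Whether the better fix is to override put() or to copy first depends on whether the caller may mutate the original arguments; the thread itself only records that r1650 did not show the error.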
_______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > From benc at hawaga.org.uk Thu Jul 5 15:13:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 20:13:02 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: what karajan revision? and what swift revision? (type svn info in the cog and dsk directories...) On Thu, 5 Jul 2007, Veronika Nefedova wrote: > you can use my kml file that I compiled today with the latest karajan (its on > terminable). > > On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > > > ou have your heap set on the 244 molecule workflow? I run out at the > > compile stage with default. > > -- > > > > > From nefedova at mcs.anl.gov Thu Jul 5 15:45:49 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 15:45:49 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> 1657 for Karajan and 887 for vdsk On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote: > > what karajan revision? and what swift revision? > > (type svn info in the cog and dsk directories...) > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > >> you can use my kml file that I compiled today with the latest >> karajan (its on >> terminable). >> >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: >> >>> ou have your heap set on the 244 molecule workflow? I run out at the >>> compile stage with default. 
>>> -- >>> >>> >> > From hategan at mcs.anl.gov Thu Jul 5 16:20:37 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 16:20:37 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> Message-ID: <1183670437.31476.0.camel@blabla.mcs.anl.gov> I might know what it is. Stay tuned. On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote: > 1657 for Karajan and 887 for vdsk > > On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote: > > > > > what karajan revision? and what swift revision? > > > > (type svn info in the cog and dsk directories...) > > > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > > > >> you can use my kml file that I compiled today with the latest > >> karajan (its on > >> terminable). > >> > >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > >> > >>> ou have your heap set on the 244 molecule workflow? I run out at the > >>> compile stage with default. > >>> -- > >>> > >>> > >> > > > From hategan at mcs.anl.gov Thu Jul 5 16:33:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 16:33:42 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183670437.31476.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> <1183670437.31476.0.camel@blabla.mcs.anl.gov> Message-ID: <1183671222.31476.4.camel@blabla.mcs.anl.gov> In iteratizing the recursive thing that caused the stack overflow, I ignored the fact that there was a lock on every object in the recursion steps. Tentative fix in SVN. I'm running tests to see if things hold. On Thu, 2007-07-05 at 16:20 -0500, Mihael Hategan wrote: > I might know what it is. Stay tuned. 
> > On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote: > > 1657 for Karajan and 887 for vdsk > > > > On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote: > > > > > > > > what karajan revision? and what swift revision? > > > > > > (type svn info in the cog and dsk directories...) > > > > > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > > > > > >> you can use my kml file that I compiled today with the latest > > >> karajan (its on > > >> terminable). > > >> > > >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > > >> > > >>> ou have your heap set on the 244 molecule workflow? I run out at the > > >>> compile stage with default. > > >>> -- > > >>> > > >>> > > >> > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Thu Jul 5 16:17:54 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 5 Jul 2007 21:17:54 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> Message-ID: try r1650 - that's the version of karajan that we've had for ages, before this week. 
-- From nefedova at mcs.anl.gov Thu Jul 5 17:05:43 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 5 Jul 2007 17:05:43 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> Message-ID: <8DF643EF-54CD-4900-B209-C3C0210D8E8E@mcs.anl.gov> I know that r1650 works - but I need to use Mihael's fix to see if my workflow could run successfully w/falcon (thats what his karajan update is about) On Jul 5, 2007, at 4:17 PM, Ben Clifford wrote: > > try r1650 - that's the version of karajan that we've had for ages, > before > this week. > > -- > From hategan at mcs.anl.gov Thu Jul 5 17:10:36 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 05 Jul 2007 17:10:36 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183671222.31476.4.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov> <1183670437.31476.0.camel@blabla.mcs.anl.gov> <1183671222.31476.4.camel@blabla.mcs.anl.gov> Message-ID: <1183673436.9192.0.camel@blabla.mcs.anl.gov> On Thu, 2007-07-05 at 16:33 -0500, Mihael Hategan wrote: > Tentative fix in SVN. I'm running tests to see if things hold. Seems to work, as far as the karajan tests can tell. > > On Thu, 2007-07-05 at 16:20 -0500, Mihael Hategan wrote: > > I might know what it is. Stay tuned. > > > > On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote: > > > 1657 for Karajan and 887 for vdsk > > > > > > On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote: > > > > > > > > > > > what karajan revision? and what swift revision? > > > > > > > > (type svn info in the cog and dsk directories...) > > > > > > > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > > > > > > > >> you can use my kml file that I compiled today with the latest > > > >> karajan (its on > > > >> terminable). 
> > > >> > > > >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > > > >> > > > >>> ou have your heap set on the 244 molecule workflow? I run out at the > > > >>> compile stage with default. > > > >>> -- > > > >>> > > > >>> > > > >> > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From bugzilla-daemon at mcs.anl.gov Fri Jul 6 09:16:51 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 6 Jul 2007 09:16:51 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070706141651.D911516502@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #16 from nefedova at mcs.anl.gov 2007-07-06 09:16 ------- The latest Karajan fix seems to work (i.e. Workflow compiles). Falcon experiences some problems. Ioan, please post the details of the current problems here. Nika -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
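Mihael's explanation of the stack-overflow fix (turning a recursive walk into an iterative one, and losing the lock that each recursion step held on its object) describes a general pitfall. The sketch below is hypothetical and is not the Karajan code. It assumes a singly linked chain of Node objects, each guarded by a java.util.concurrent.locks.ReentrantLock. The recursive version holds every ancestor's lock while it descends. The iterative version keeps the same discipline by pushing each acquired lock onto an explicit stack and releasing the locks in reverse order, so a long chain no longer consumes Java stack frames.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

public class IterativeLockedWalk {
    // Hypothetical node type: each step of the walk must hold this node's lock.
    static final class Node {
        final ReentrantLock lock = new ReentrantLock();
        final Node next;
        int visits;

        Node(Node next) {
            this.next = next;
        }
    }

    // Recursive form: every ancestor's lock stays held while descendants run,
    // but a long chain can overflow the call stack.
    static int walkRecursive(Node n) {
        if (n == null) return 0;
        n.lock.lock();
        try {
            n.visits++;
            return 1 + walkRecursive(n.next);
        } finally {
            n.lock.unlock();
        }
    }

    // Iterative form: same locking discipline, with an explicit stack of held
    // locks instead of Java stack frames. Locks are released in LIFO order,
    // mirroring how the recursive version unwinds.
    static int walkIterative(Node head) {
        Deque<Node> held = new ArrayDeque<>();
        int count = 0;
        try {
            for (Node n = head; n != null; n = n.next) {
                n.lock.lock();
                held.push(n);
                n.visits++;
                count++;
            }
        } finally {
            while (!held.isEmpty()) {
                held.pop().lock.unlock();
            }
        }
        return count;
    }

    // Builds a chain of the given length and returns its head.
    static Node chain(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) head = new Node(head);
        return head;
    }

    public static void main(String[] args) {
        System.out.println(walkRecursive(chain(1_000)));   // 1000
        System.out.println(walkIterative(chain(200_000))); // 200000, no StackOverflowError
    }
}
```

ReentrantLock is used here rather than synchronized blocks because a synchronized block cannot be entered in one loop iteration and exited in a later one, while explicit lock objects can.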
From benc at hawaga.org.uk Fri Jul 6 09:49:33 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 6 Jul 2007 20:19:33 +0530 (IST) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: On my machine that seems to take i) forever to compile (as in I gave up after it generated the 25mb intermediate xml file but before it had made a kml file), and ii) forever to get to the stage where it tries to execute anything (as in I gave up before it gave me an error about not being able to find the transformations to run). What sort of times does it usually take for you to: i) compile ii) run the first executable ? On Thu, 5 Jul 2007, Ben Clifford wrote: > > what karajan revision? and what swift revision? > > (type svn info in the cog and dsk directories...) > > On Thu, 5 Jul 2007, Veronika Nefedova wrote: > > > you can use my kml file that I compiled today with the latest karajan (its on > > terminable). > > > > On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > > > > > ou have your heap set on the 244 molecule workflow? I run out at the > > > compile stage with default. > > > -- > > > > > > > > > > From nefedova at mcs.anl.gov Fri Jul 6 10:13:28 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 10:13:28 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: with the old version of the script (all loops unrolled) it would take about 1.5 hours to compile (244 molecules). Once compiled it would start the execution within a minute. A new swift code (with the main loop done in 'foreach' is under way (I am testing it right now). 
Nika On Jul 6, 2007, at 9:49 AM, Ben Clifford wrote: > > On my machine that seems to take i) forever to compile (as in I > gave up > after it generated the 25mb intermediate xml file but before it had > made a > kml file), and ii) forever to get to the stage where it tries to > execute > anything (as in I gave up before it gave me an error about not > being able > to find the transformations to run). > > What sort of times does it usually take for you to: > > i) compile > > ii) run the first executable > > ? > > On Thu, 5 Jul 2007, Ben Clifford wrote: > >> >> what karajan revision? and what swift revision? >> >> (type svn info in the cog and dsk directories...) >> >> On Thu, 5 Jul 2007, Veronika Nefedova wrote: >> >>> you can use my kml file that I compiled today with the latest >>> karajan (its on >>> terminable). >>> >>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: >>> >>>> ou have your heap set on the 244 molecule workflow? I run out at >>>> the >>>> compile stage with default. >>>> -- >>>> >>>> >>> >> >> > From hategan at mcs.anl.gov Fri Jul 6 10:16:43 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 10:16:43 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> Message-ID: <1183735003.9663.0.camel@blabla.mcs.anl.gov> On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote: > with the old version of the script (all loops unrolled) it would take > about 1.5 hours to compile (244 molecules). Once compiled it would > start the execution within a minute. How can you tell when it's done compiling? > A new swift code (with the main loop done in 'foreach' is under way > (I am testing it right now). 
> > Nika > > On Jul 6, 2007, at 9:49 AM, Ben Clifford wrote: > > > > > On my machine that seems to take i) forever to compile (as in I > > gave up > > after it generated the 25mb intermediate xml file but before it had > > made a > > kml file), and ii) forever to get to the stage where it tries to > > execute > > anything (as in I gave up before it gave me an error about not > > being able > > to find the transformations to run). > > > > What sort of times does it usually take for you to: > > > > i) compile > > > > ii) run the first executable > > > > ? > > > > On Thu, 5 Jul 2007, Ben Clifford wrote: > > > >> > >> what karajan revision? and what swift revision? > >> > >> (type svn info in the cog and dsk directories...) > >> > >> On Thu, 5 Jul 2007, Veronika Nefedova wrote: > >> > >>> you can use my kml file that I compiled today with the latest > >>> karajan (its on > >>> terminable). > >>> > >>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > >>> > >>>> ou have your heap set on the 244 molecule workflow? I run out at > >>>> the > >>>> compile stage with default. > >>>> -- > >>>> > >>>> > >>> > >> > >> > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Fri Jul 6 10:22:23 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 10:22:23 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183735003.9663.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> Message-ID: <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> On Jul 6, 2007, at 10:16 AM, Mihael Hategan wrote: > On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote: >> with the old version of the script (all loops unrolled) it would take >> about 1.5 hours to compile (244 molecules). 
Once compiled it would >> start the execution within a minute. > > How can you tell when it's done compiling? > When its done compiling, it starts execution - you right, its hard to tell when its all done in one step. But when you already have the compiled code and start execution - it takes less then a minute (30 seconds?) to send the first task out. Nika >> A new swift code (with the main loop done in 'foreach' is under way >> (I am testing it right now). >> >> Nika >> >> On Jul 6, 2007, at 9:49 AM, Ben Clifford wrote: >> >>> >>> On my machine that seems to take i) forever to compile (as in I >>> gave up >>> after it generated the 25mb intermediate xml file but before it had >>> made a >>> kml file), and ii) forever to get to the stage where it tries to >>> execute >>> anything (as in I gave up before it gave me an error about not >>> being able >>> to find the transformations to run). >>> >>> What sort of times does it usually take for you to: >>> >>> i) compile >>> >>> ii) run the first executable >>> >>> ? >>> >>> On Thu, 5 Jul 2007, Ben Clifford wrote: >>> >>>> >>>> what karajan revision? and what swift revision? >>>> >>>> (type svn info in the cog and dsk directories...) >>>> >>>> On Thu, 5 Jul 2007, Veronika Nefedova wrote: >>>> >>>>> you can use my kml file that I compiled today with the latest >>>>> karajan (its on >>>>> terminable). >>>>> >>>>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: >>>>> >>>>>> ou have your heap set on the 244 molecule workflow? I run out at >>>>>> the >>>>>> compile stage with default. 
>>>>>> -- >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > From hategan at mcs.anl.gov Fri Jul 6 10:25:32 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 10:25:32 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> Message-ID: <1183735532.10139.0.camel@blabla.mcs.anl.gov> On Fri, 2007-07-06 at 10:22 -0500, Veronika Nefedova wrote: > On Jul 6, 2007, at 10:16 AM, Mihael Hategan wrote: > > > On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote: > >> with the old version of the script (all loops unrolled) it would take > >> about 1.5 hours to compile (244 molecules). Once compiled it would > >> start the execution within a minute. > > > > How can you tell when it's done compiling? > > > > When its done compiling, it starts execution - you right, its hard > to tell when its all done in one step. But when you already have the > compiled code and start execution - it takes less then a minute (30 > seconds?) to send the first task out. That makes sense. We need to speed up compilation? > > Nika > > >> A new swift code (with the main loop done in 'foreach' is under way > >> (I am testing it right now). 
> >> > >> Nika > >> > >> On Jul 6, 2007, at 9:49 AM, Ben Clifford wrote: > >> > >>> > >>> On my machine that seems to take i) forever to compile (as in I > >>> gave up > >>> after it generated the 25mb intermediate xml file but before it had > >>> made a > >>> kml file), and ii) forever to get to the stage where it tries to > >>> execute > >>> anything (as in I gave up before it gave me an error about not > >>> being able > >>> to find the transformations to run). > >>> > >>> What sort of times does it usually take for you to: > >>> > >>> i) compile > >>> > >>> ii) run the first executable > >>> > >>> ? > >>> > >>> On Thu, 5 Jul 2007, Ben Clifford wrote: > >>> > >>>> > >>>> what karajan revision? and what swift revision? > >>>> > >>>> (type svn info in the cog and dsk directories...) > >>>> > >>>> On Thu, 5 Jul 2007, Veronika Nefedova wrote: > >>>> > >>>>> you can use my kml file that I compiled today with the latest > >>>>> karajan (its on > >>>>> terminable). > >>>>> > >>>>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote: > >>>>> > >>>>>> ou have your heap set on the 244 molecule workflow? I run out at > >>>>>> the > >>>>>> compile stage with default. > >>>>>> -- > >>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > From benc at hawaga.org.uk Fri Jul 6 11:02:01 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 6 Jul 2007 16:02:01 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183735532.10139.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 6 Jul 2007, Mihael Hategan wrote: > That makes sense. We need to speed up compilation? 
I think more important is concentrating on the langauge features necessary to have smaller source files. I'm working with Nika on that at the moment. -- From hategan at mcs.anl.gov Fri Jul 6 11:05:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 11:05:30 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> Message-ID: <1183737930.15085.0.camel@blabla.mcs.anl.gov> On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > That makes sense. We need to speed up compilation? > > I think more important is concentrating on the langauge features necessary > to have smaller source files. Yes, of course. But the compilation time is still ridiculous, considering that it doesn't do much fancy stuff? > > I'm working with Nika on that at the moment. > From yongzh at cs.uchicago.edu Fri Jul 6 11:14:18 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 6 Jul 2007 11:14:18 -0500 (CDT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183737930.15085.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> <1183737930.15085.0.camel@blabla.mcs.anl.gov> Message-ID: I don't think the compilation takes that much time. It is the starting time from loading the kml file to dispatching the first job that takes a long time (for 20k jobs). Yong. On Fri, 6 Jul 2007, Mihael Hategan wrote: > On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > That makes sense. We need to speed up compilation? 
> > > > I think more important is concentrating on the langauge features necessary > > to have smaller source files. > > Yes, of course. But the compilation time is still ridiculous, > considering that it doesn't do much fancy stuff? > > > > > I'm working with Nika on that at the moment. > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Jul 6 11:16:28 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 6 Jul 2007 16:16:28 +0000 (GMT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> <1183737930.15085.0.camel@blabla.mcs.anl.gov> Message-ID: the xml->kml conversion took long time when I tried to compile nika's .swift file. I can leave it running overnight and see if it ends... On Fri, 6 Jul 2007, Yong Zhao wrote: > I don't think the compilation takes that much time. It is the starting > time from loading the kml file to dispatching the first job that takes a > long time (for 20k jobs). > > Yong. > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > > > That makes sense. We need to speed up compilation? > > > > > > I think more important is concentrating on the langauge features necessary > > > to have smaller source files. > > > > Yes, of course. But the compilation time is still ridiculous, > > considering that it doesn't do much fancy stuff? > > > > > > > > I'm working with Nika on that at the moment. 
> > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From yongzh at cs.uchicago.edu Fri Jul 6 11:19:33 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 6 Jul 2007 11:19:33 -0500 (CDT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> <1183737930.15085.0.camel@blabla.mcs.anl.gov> Message-ID: Really? that is a bit strange. I only tried compiling the 100 molecule file on viper, and it went through quite fast. Is it always like this for different versions, and what is the config of your machine? Yong. On Fri, 6 Jul 2007, Ben Clifford wrote: > > the xml->kml conversion took long time when I tried to compile nika's > .swift file. > > I can leave it running overnight and see if it ends... > > On Fri, 6 Jul 2007, Yong Zhao wrote: > > > I don't think the compilation takes that much time. It is the starting > > time from loading the kml file to dispatching the first job that takes a > > long time (for 20k jobs). > > > > Yong. > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > > > > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > > > > > That makes sense. We need to speed up compilation? > > > > > > > > I think more important is concentrating on the langauge features necessary > > > > to have smaller source files. > > > > > > Yes, of course. But the compilation time is still ridiculous, > > > considering that it doesn't do much fancy stuff? > > > > > > > > > > > I'm working with Nika on that at the moment. 
> > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Fri Jul 6 11:30:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 11:30:31 -0500 Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> <1183737930.15085.0.camel@blabla.mcs.anl.gov> Message-ID: <1183739431.18292.0.camel@blabla.mcs.anl.gov> On Fri, 2007-07-06 at 11:14 -0500, Yong Zhao wrote: > I don't think the compilation takes that much time. It is the starting > time from loading the kml file to dispatching the first job that takes a > long time (for 20k jobs). Apparently Nika just mentioned that starting an already compiled .kml takes less than one minute. > > Yong. > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > > > That makes sense. We need to speed up compilation? > > > > > > I think more important is concentrating on the langauge features necessary > > > to have smaller source files. > > > > Yes, of course. But the compilation time is still ridiculous, > > considering that it doesn't do much fancy stuff? > > > > > > > > I'm working with Nika on that at the moment. 
> > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From bugzilla-daemon at mcs.anl.gov Fri Jul 6 11:43:40 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 6 Jul 2007 11:43:40 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070706164340.52EEE164DD@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #17 from iraicu at cs.uchicago.edu 2007-07-06 11:43 ------- (In reply to comment #16) > The latest Karajan fix seems to work (i.e. Workflow compiles). Falcon > experiences some problems. Ioan, please post the details of the current > problems here. > I made some changes in the last few days to fix some known issues I have had with Falkon, although none of these issues were relevant to the MolDyn runs we have been making recently. I made some small sanity checks after I made the changes, and everything seemed fine. Then, yesterday, when we tried the 244 mol run again, within the first 100 jobs, Falkon seemed to be having problems. It looked like notifications to the workers weren't always going through (which has never happened before). This would cause some number of CPUs to sit idle while Falkon recovered from this (its default is to clean up every 60 sec). I made some more synthetic tests from my command line client (independent of Swift), and the problem was reproducible about 3~4 times in a row that I tried. Then, I even managed to crash the GT4 container, as it locked up and it would not do anything. This was also a first; I had never before managed to get the GT4 container into a state where it would not answer any more WS calls, yet the CPU was idle on the machine. On the surface, it looked like all hell broke loose....
I added some more debugging statements and turned on all possible debugging... and a few hours later (last night), I tried again and everything was working perfectly! I ran some 100K jobs through it and it seemed to work perfectly. I even disabled all the debugging that I added just to see if that did anything, and things were still perfect. It blows my mind what could have happened, to go from something that was repeatable every time, to something that I can't reproduce, and this is all in the same environment, configuration, and hardware. I'll dig around some more to try to make sense of what happened, and perhaps we can try the 244 mol run again once I am convinced that I have not broken anything with my latest changes from earlier this week. Ioan PS: I could also try to revert to the earlier version before my changes, especially as the changes I made were not geared for the MolDyn app, but were more general in scope. > Nika > -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Fri Jul 6 11:49:50 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 6 Jul 2007 11:49:50 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070706164950.6AC3816502@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #18 from hategan at mcs.anl.gov 2007-07-06 11:49 ------- (In reply to comment #17) > (In reply to comment #16) > the problem was reproducible about 3~4 times in a row that I tried. > [...] > > It blows my mind what could have > happened, to go from something that was repeatable every time, to something > that I can't reproduce, and this is all in the same environment, configuration, > and hardware. Those are concurrency issues, most likely.
The fact that things work fine a number of times is not a guarantee that they will always do so. That's what makes this so difficult. From yongzh at cs.uchicago.edu Fri Jul 6 11:51:59 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 6 Jul 2007 11:51:59 -0500 (CDT) Subject: [Swift-devel] recent karajan changes causing trouble In-Reply-To: <1183739431.18292.0.camel@blabla.mcs.anl.gov> References: <1183429417.16404.0.camel@blabla.mcs.anl.gov> <1183735003.9663.0.camel@blabla.mcs.anl.gov> <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> <1183735532.10139.0.camel@blabla.mcs.anl.gov> <1183737930.15085.0.camel@blabla.mcs.anl.gov> <1183739431.18292.0.camel@blabla.mcs.anl.gov> Message-ID: Yeah, and I am arguing against that as I believe that is not the case. Yong. On Fri, 6 Jul 2007, Mihael Hategan wrote: > On Fri, 2007-07-06 at 11:14 -0500, Yong Zhao wrote: > > I don't think the compilation takes that much time. It is the starting > > time from loading the kml file to dispatching the first job that takes a > > long time (for 20k jobs). > > Apparently Nika just mentioned that starting an already compiled .kml > takes less than one minute. > > > > > Yong. > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote: > > > > > > > > On Fri, 6 Jul 2007, Mihael Hategan wrote: > > > > > > > > > That makes sense. We need to speed up compilation? > > > > > > > > I think more important is concentrating on the language features necessary > > > > to have smaller source files. > > > > > > Yes, of course. But the compilation time is still ridiculous, > > > considering that it doesn't do much fancy stuff? > > > > > > > > > > > I'm working with Nika on that at the moment.
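Mihael's point in comment #18, that a failure which reproduces several times and then vanishes in the same environment is most likely a concurrency issue, can be illustrated with a small, hypothetical Java sketch. It is not Falkon, Karajan, or GT4 code; the class and field names are made up for illustration. Several threads increment a shared counter without synchronization, so updates may be lost on one run and not on the next, while an `AtomicInteger` counter is always exact.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    // Plain field: "unsafeCount++" is a read-modify-write, so concurrent increments can be lost.
    static int unsafeCount = 0;
    // AtomicInteger performs each increment as one atomic operation.
    static final AtomicInteger safeCount = new AtomicInteger();

    static final int THREADS = 4;
    static final int INCREMENTS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < INCREMENTS; i++) {
                    unsafeCount++;               // racy: result depends on thread interleaving
                    safeCount.incrementAndGet(); // thread-safe
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            // join() waits for each worker and makes its writes visible to this thread
            w.join();
        }
        int expected = THREADS * INCREMENTS;
        // The racy total can match "expected" on one run and fall short on the next.
        System.out.println("expected=" + expected
                + " unsafe=" + unsafeCount
                + " safe=" + safeCount.get());
    }
}
```

Running it repeatedly is the "works a number of times" pattern in miniature: a correct `unsafe` total on some runs proves nothing about the next one. Atomic types, locks, or avoiding shared mutable state are the usual remedies.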
From nefedova at mcs.anl.gov Fri Jul 6 12:38:02 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 12:38:02 -0500 Subject: [Swift-devel] Karajan problem? Message-ID: Hi, Mihael: I am testing now my new code (with loops and various string operations!) but I am getting some Karajan errors. I am wondering if you could point me to a possible reason for these errors? I do not see any reference to my code so I am not sure where to start looking... org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file at org.griphyn.vdl.karajan.lib.GetField.function (GetField.java:33) at org.griphyn.vdl.karajan.lib.VDLFunction.post (VDLFunction.java:58) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext (Sequential.java:51) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren (Sequential.java:27) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute (FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart (FlowNode.java:239) at org.globus.cog.karajan.workflow.nodes.FlowNode.start (FlowNode.java:280) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent (FlowNode.java:392) at org.globus.cog.karajan.workflow.nodes.FlowNode.event (FlowNode.java:331) at org.globus.cog.karajan.workflow.FlowElementWrapper.event (FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.send (EventBus.java:123) at org.globus.cog.karajan.workflow.events.EventBus.sendHooked (EventBus.java:97) at org.globus.cog.karajan.workflow.events.EventWorker.run (EventWorker.java:69) Caused by:
org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file at org.griphyn.vdl.mapping.AbstractDataNode.getFields (AbstractDataNode.java:139) at org.griphyn.vdl.mapping.AbstractDataNode.getFields (AbstractDataNode.java:114) at org.griphyn.vdl.karajan.lib.GetField.function (GetField.java:25) ... 12 more Execution failed: org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file Thanks! Nika From hategan at mcs.anl.gov Fri Jul 6 13:18:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 13:18:46 -0500 Subject: [Swift-devel] Re: Karajan problem? In-Reply-To: References: Message-ID: <1183745926.22318.1.camel@blabla.mcs.anl.gov> Are you trying to access a file as an array? As in file x; x[1] On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote: > Hi, Mihael: > > I am testing now my new code (with loops and various string > operations!) but I am getting some Karajan errors. I am wondering if > you could point me to a possible reason for these errors? I do not > see any reference to my code so I am not sure where to start looking... 
> > [...] > > Thanks!
> > Nika > From nefedova at mcs.anl.gov Fri Jul 6 13:20:44 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 13:20:44 -0500 Subject: [Swift-devel] Re: Karajan problem? In-Reply-To: <1183745926.22318.1.camel@blabla.mcs.anl.gov> References: <1183745926.22318.1.camel@blabla.mcs.anl.gov> Message-ID: Yes, I am: file outfiles ; outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg"); On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote: > Are you trying to access a file as an array? As in > file x; > x[1] > > > On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote: >> Hi, Mihael: >> >> I am testing now my new code (with loops and various string >> operations!) but I am getting some Karajan errors. I am wondering if >> you could point me to a possible reason for these errors? I do not >> see any reference to my code so I am not sure where to start >> looking... >> >> [...] >> Thanks! >> >> Nika >> > From hategan at mcs.anl.gov Fri Jul 6 13:29:00 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 13:29:00 -0500 Subject: [Swift-devel] Re: Karajan problem? In-Reply-To: References: <1183745926.22318.1.camel@blabla.mcs.anl.gov> Message-ID: <1183746540.23000.1.camel@blabla.mcs.anl.gov> I'm assuming you see the problem with that. On Fri, 2007-07-06 at 13:20 -0500, Veronika Nefedova wrote: > Yes, I am: > > file outfiles solv_repu_0_0.2.out, solv_repu_0.2_0.3.out, solv_repu_0.3_0.4.out, > solv_repu_0.4_0.5.out, solv_repu_0.5_0.6.out, solv_repu_0.6_0.7.out, > solv_repu_0.7_0.8.out, solv_repu_0.8_0.9.out, solv_repu_0.9_1.out">; > outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg"); > > > > On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote: > > > Are you trying to access a file as an array? As in > > file x; > > x[1] > > > > > > On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote: > >> Hi, Mihael: > >> > >> I am testing now my new code (with loops and various string > >> operations!) but I am getting some Karajan errors. I am wondering if > >> you could point me to a possible reason for these errors?
I do not > >> see any reference to my code so I am not sure where to start > >> looking... > >> [...] > >> Thanks! > >> > >> Nika > >> > > > From nefedova at mcs.anl.gov Fri Jul 6 13:33:59 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 13:33:59 -0500 Subject: [Swift-devel] Re: Karajan problem? In-Reply-To: <1183746540.23000.1.camel@blabla.mcs.anl.gov> References: <1183745926.22318.1.camel@blabla.mcs.anl.gov> <1183746540.23000.1.camel@blabla.mcs.anl.gov> Message-ID: Yep! Thanks for the tip (; On Jul 6, 2007, at 1:29 PM, Mihael Hategan wrote: > I'm assuming you see the problem with that. > > On Fri, 2007-07-06 at 13:20 -0500, Veronika Nefedova wrote: >> Yes, I am: >> >> file outfiles > solv_repu_0_0.2.out, solv_repu_0.2_0.3.out, solv_repu_0.3_0.4.out, >> solv_repu_0.4_0.5.out, solv_repu_0.5_0.6.out, solv_repu_0.6_0.7.out, >> solv_repu_0.7_0.8.out, solv_repu_0.8_0.9.out, solv_repu_0.9_1.out">; >> outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg"); >> >> >> >> On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote: >> >>> Are you trying to access a file as an array? As in >>> file x; >>> x[1] >>> >>> >>> On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote: >>>> Hi, Mihael: >>>> >>>> I am testing now my new code (with loops and various string >>>> operations!) but I am getting some Karajan errors. I am >>>> wondering if >>>> you could point me to a possible reason for these errors? I do not >>>> see any reference to my code so I am not sure where to start >>>> looking...
>>>> >>>> [...] >>>> Thanks!
>>>> >>>> Nika >>>> >>> >> > From nefedova at mcs.anl.gov Fri Jul 6 16:03:31 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 16:03:31 -0500 Subject: [Swift-devel] wrong file staged in Message-ID: The wrong file was staged in during the 4th stage of the workflow... I have this inside my foreach loop: file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; file solv_repu_0DOT9_1_b1_crd <"solv_repu_0.9_1_b1.crd">; file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">; file solv_repu_0DOT9_1_b1_done <"solv_repu_0.9_1_b1_done">; (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out, solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9", "rcut2:1"); The first declared file (the prt file) is an input file for CHARMM3, and the last three declared files (out, crd and done) are output files. When I check my remote directory during execution, I see that the wrong files were staged in. In particular, the wrong prt file was staged in: solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt (aka solv_repu_0DOT9_1_b1_prt). The solv_repu_0.9_1_b1.prt file is not produced by a previous stage; it is (supposed to be) staged in from the submit host. The above declaration is the only place where the file solv_repu_0DOT9_1_b1_prt is declared in the swift file (I did grep to check it). The kml file also looks ok. I am not sure why it has happened -- this piece of code has not been changed from the previous version...
This is the work directory for this job (CHARMM3) on TG-UC: nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ chrm_long-p2v28ydi> ls m001_am1.prm solv.inp solv_m001_eq.crd stderr.txt m001_am1.rtf solv_disp_a3.out solv_repu_0.9_1_b1.rst parm03_gaff_all.rtf solv_disp_a3.prt solv_repu_0.9_1_b1.trj parm03_gaffnb_all.prm solv_m001.psf solv_repu_0.9_1_b1.wham nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ chrm_long-p2v28ydi> as you can see 2 files have the wrong names (solv_disp_a3 instead of solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong parameter file (prt) was staged in... I checked whether that file was even staged in to the remote host -- in fact it was: nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0> find */ -name solv_repu_0.9_1_b1.prt -print shared/solv_repu_0.9_1_b1.prt But it never went to the right working directory... Any idea what is going on here? Thanks, Nika From hategan at mcs.anl.gov Fri Jul 6 16:31:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 16:31:18 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: References: Message-ID: <1183757478.29416.1.camel@blabla.mcs.anl.gov> Wonder if there is another declaration of the same variable mapped to the wrong file. On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote: > The wrong file was staged in during the 4th stage of the workflow... > [...] From nefedova at mcs.anl.gov Fri Jul 6 16:37:19 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 16:37:19 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: <1183757478.29416.1.camel@blabla.mcs.anl.gov> References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> Message-ID: Nope...
I checked with grep: nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out, solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9", "rcut2:1"); nefedova at viper:~/alamines> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote: > Wonder if there is another declaration of the same variable mapped to > the wrong file. > > On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote: >> The wrong file was staged in during the 4th stage of the workflow... >> [...] From hategan at mcs.anl.gov Fri Jul 6 16:39:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 16:39:19 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> Message-ID: <1183757959.29798.0.camel@blabla.mcs.anl.gov> Consistent or intermittent behavior? Also, can you attach the swift source? On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote: > Nope...
I checked with grep: > [...] From nefedova at mcs.anl.gov Fri Jul 6 16:44:22 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 16:44:22 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: <1183757959.29798.0.camel@blabla.mcs.anl.gov> References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> <1183757959.29798.0.camel@blabla.mcs.anl.gov> Message-ID: <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov> I put the dtm file on terminable in ~nefedova/MolDyn.dtm I see a few more directories with wrong files staged in, but I didn't check them all (130+ of them). I saw at least one with the correct files staged in. Nika On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote: > Consistent or intermittent behavior? > > Also, can you attach the swift source? > > On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote: >> Nope... I checked with grep: >> [...] From hategan at mcs.anl.gov Fri Jul 6 16:49:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 06 Jul 2007 16:49:39 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov> References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> <1183757959.29798.0.camel@blabla.mcs.anl.gov> <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov> Message-ID: <1183758579.30227.1.camel@blabla.mcs.anl.gov> On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote: > I put the dtm file on terminable in ~nefedova/MolDyn.dtm > > I see a few more directories with wrong files staged in, but I > didn't > check them all (130+ of them). I saw at least one with the correct > files staged in. Across different runs that is. Do you get the exact same mess-up, or is it different? > > Nika > > On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote: > > > Consistent or intermittent behavior? > > > > Also, can you attach the swift source? > > > > On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote: > >> Nope... I checked with grep: > >> > >> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm > >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; > >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out, > >> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, > >> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt, > >> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9", > >> "rcut2:1"); > >> nefedova at viper:~/alamines> > >> > >> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote: > >> > >>> Wonder if there is another declaration of the same variable > >>> mapped to > >>> the wrong file.
> >>> > >>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote: > >>>> The wrong file was staged in during the 4th stage of the > >>>> workflow... > >>>> > >>>> I have this inside my foreach loop: > >>>> > >>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; > >>>> file solv_repu_0DOT9_1_b1_crd <"solv_repu_0.9_1_b1.crd">; > >>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">; > >>>> file solv_repu_0DOT9_1_b1_done <"solv_repu_0.9_1_b1_done">; > >>>> > >>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, > >>>> solv_repu_0DOT9_1_b1_out, > >>>> solv_repu_0DO\ > >>>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file, > >>>> prm_file, psf_file,\ > >>>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7, > >>>> "urandseed:59\ > >>>> 64163", sprt, "rcut1:0.9", "rcut2:1"); > >>>> > >>>> > >>>> > >>>> The first file (with DOT) is an input files for CHARMM3 and three > >>>> last declared files (out, crd and done) are output files. > >>>> > >>>> When I check my remote directory during execution, I see that the > >>>> wrong files were staged in. In particular, the wrong prt file was > >>>> staged in: > >>>> > >>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt (aka > >>>> solv_repu_0DOT9_1_b1_prt) > >>>> > >>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous > >>>> stage, > >>>> its being/supposed to be/ staged in from the submit host. > >>>> > >>>> The above declaration is the only place where the file > >>>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did > >>>> grep > >>>> to check it). kml file also looks ok. > >>>> > >>>> I am not sure why it has happened -- this piece of code has not > >>>> been > >>>> changed from the previous version... 
> >>>> > >>>> > >>>> This is the work directory for this job (CHARMM3) on TG-UC: > >>>> > >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ > >>>> chrm_long-p2v28ydi> ls > >>>> m001_am1.prm solv.inp solv_m001_eq.crd > >>>> stderr.txt > >>>> m001_am1.rtf solv_disp_a3.out solv_repu_0.9_1_b1.rst > >>>> parm03_gaff_all.rtf solv_disp_a3.prt solv_repu_0.9_1_b1.trj > >>>> parm03_gaffnb_all.prm solv_m001.psf solv_repu_0.9_1_b1.wham > >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ > >>>> chrm_long-p2v28ydi> > >>>> > >>>> as you can see 2 files have the wrong names (solv_disp_a3 > >>>> instead of > >>>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong > >>>> parameter file (prt) was staged in... > >>>> > >>>> > >>>> I checked whether that file was even staged in to the remote > >>>> host -- > >>>> in fact it was: > >>>> > >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0> > >>>> find */ -name solv_repu_0.9_1_b1.prt -print > >>>> shared/solv_repu_0.9_1_b1.prt > >>>> But it never went to the right working directory... > >>>> > >>>> Any idea what is going on here? > >>>> > >>>> Thanks, > >>>> > >>>> Nika > >>>> > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>> > >> > > > From nefedova at mcs.anl.gov Fri Jul 6 16:53:58 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 6 Jul 2007 16:53:58 -0500 Subject: [Swift-devel] wrong file staged in In-Reply-To: <1183758579.30227.1.camel@blabla.mcs.anl.gov> References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> <1183757959.29798.0.camel@blabla.mcs.anl.gov> <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov> <1183758579.30227.1.camel@blabla.mcs.anl.gov> Message-ID: <5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov> I didn't try another run. 
Something was really weird during that run. Some jobs just failed because the executable failed:

stderr.txt:
forrtl: No such file or directory
/home/ydeng/c34a2/exec/ia64/charmm: relocation error: /soft/intel-c-9.1.049-f-9.1.045/lib/libunwind.so.6: undefined symbol: ?1__serial_memmove

But the jobs with wrong files staged in were running (the same executable)... I can repeat the run again now. Nika On Jul 6, 2007, at 4:49 PM, Mihael Hategan wrote: > On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote: >> I put the dtm file on terminable in ~nefedova/MolDyn.dtm >> >> I see a few more directories with wrong files staged in, but I >> didn't >> check them all (130+ of them). I saw at least one with the correct >> files staged in. > > Across different runs that is. Do you get the exact same mess-up, > or is > it different? > >> >> Nika >> >> On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote: >> >>> Consistent or intermittent behavior? >>> >>> Also, can you attach the swift source? >>> >>> On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote: >>>> Nope... I checked with grep: >>>> >>>> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm >>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; >>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, >>>> solv_repu_0DOT9_1_b1_out, >>>> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, >>>> rtf_file, prm_file, psf_file, crd_eq_file, >>>> solv_repu_0DOT9_1_b1_prt, >>>> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, >>>> "rcut1:0.9", >>>> "rcut2:1"); >>>> nefedova at viper:~/alamines> >>>> >>>> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote: >>>> >>>>> Wonder if there is another declaration of the same variable >>>>> mapped to >>>>> the wrong file. >>>>> >>>>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote: >>>>>> The wrong file was staged in during the 4th stage of the >>>>>> workflow...
>>>>>> >>>>>> I have this inside my foreach loop: >>>>>> >>>>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">; >>>>>> file solv_repu_0DOT9_1_b1_crd <"solv_repu_0.9_1_b1.crd">; >>>>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">; >>>>>> file solv_repu_0DOT9_1_b1_done <"solv_repu_0.9_1_b1_done">; >>>>>> >>>>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, >>>>>> solv_repu_0DOT9_1_b1_out, >>>>>> solv_repu_0DO\ >>>>>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file, >>>>>> prm_file, psf_file,\ >>>>>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, >>>>>> s5, s7, >>>>>> "urandseed:59\ >>>>>> 64163", sprt, "rcut1:0.9", "rcut2:1"); >>>>>> >>>>>> >>>>>> >>>>>> The first file (with DOT) is an input files for CHARMM3 and >>>>>> three >>>>>> last declared files (out, crd and done) are output files. >>>>>> >>>>>> When I check my remote directory during execution, I see that the >>>>>> wrong files were staged in. In particular, the wrong prt file was >>>>>> staged in: >>>>>> >>>>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt (aka >>>>>> solv_repu_0DOT9_1_b1_prt) >>>>>> >>>>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous >>>>>> stage, >>>>>> its being/supposed to be/ staged in from the submit host. >>>>>> >>>>>> The above declaration is the only place where the file >>>>>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did >>>>>> grep >>>>>> to check it). kml file also looks ok. >>>>>> >>>>>> I am not sure why it has happened -- this piece of code has not >>>>>> been >>>>>> changed from the previous version... 
>>>>>> >>>>>> >>>>>> This is the work directory for this job (CHARMM3) on TG-UC: >>>>>> >>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- >>>>>> zvlc1f9c03pf0/ >>>>>> chrm_long-p2v28ydi> ls >>>>>> m001_am1.prm solv.inp solv_m001_eq.crd >>>>>> stderr.txt >>>>>> m001_am1.rtf solv_disp_a3.out solv_repu_0.9_1_b1.rst >>>>>> parm03_gaff_all.rtf solv_disp_a3.prt solv_repu_0.9_1_b1.trj >>>>>> parm03_gaffnb_all.prm solv_m001.psf solv_repu_0.9_1_b1.wham >>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- >>>>>> zvlc1f9c03pf0/ >>>>>> chrm_long-p2v28ydi> >>>>>> >>>>>> as you can see 2 files have the wrong names (solv_disp_a3 >>>>>> instead of >>>>>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong >>>>>> parameter file (prt) was staged in... >>>>>> >>>>>> >>>>>> I checked whether that file was even staged in to the remote >>>>>> host -- >>>>>> in fact it was: >>>>>> >>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- >>>>>> zvlc1f9c03pf0> >>>>>> find */ -name solv_repu_0.9_1_b1.prt -print >>>>>> shared/solv_repu_0.9_1_b1.prt >>>>>> But it never went to the right working directory... >>>>>> >>>>>> Any idea what is going on here? 
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Nika >>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>>> >>> >> > From benc at hawaga.org.uk Fri Jul 6 22:09:43 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 7 Jul 2007 03:09:43 +0000 (GMT) Subject: [Swift-devel] wrong file staged in In-Reply-To: <5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov> References: <1183757478.29416.1.camel@blabla.mcs.anl.gov> <1183757959.29798.0.camel@blabla.mcs.anl.gov> <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov> <1183758579.30227.1.camel@blabla.mcs.anl.gov> <5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov> Message-ID: On Fri, 6 Jul 2007, Veronika Nefedova wrote: > I can repeat the run again now. successfully? -- From benc at hawaga.org.uk Sat Jul 7 02:38:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 7 Jul 2007 13:08:14 +0530 (IST) Subject: [Swift-devel] Re: mapper syntax In-Reply-To: References: Message-ID: On Tue, 3 Jul 2007, Ben Clifford wrote:

> The syntax:
>
> imagefiles if[]
> ;
>
> is rather noisy all on one line.
>
> A syntax change could be to express the above as:
>
> imagefiles if[] map my_mapper {
>   foo = @strcat(filename,blah);
>   otherparam = true;
>   moreparams = false;
> };

I realised the present syntax admits enough whitespace for a multi-line representation, thusly:

foreach s in array {
  messagefile outfile <
    single_file_mapper;
    file=@strcat("051-foreach.",s,".out")
  >;
  outfile = greeting(s);
}

-- From tiberius at ci.uchicago.edu Sun Jul 8 23:35:39 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Sun, 8 Jul 2007 23:35:39 -0500 Subject: [Swift-devel] dot files by default In-Reply-To: References: Message-ID: Add a command line option: --gengraph=false by default. Probably it makes more sense in terms of the cleanness of the output.
On 7/4/07, Ben Clifford wrote: > does anyone have preference about whether .dot graphviz files are > generated by default or not? > > I find them a bit annoying in as much as they double the number of run > files in my working directories to no immediate benefit. > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Sun Jul 8 23:43:48 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 08 Jul 2007 23:43:48 -0500 Subject: [Swift-devel] dot files by default In-Reply-To: References: Message-ID: <1183956228.4067.2.camel@blabla.mcs.anl.gov> To quote swift -help: [-pgraph >] Whether to generate a provenance graph or not. If a 'true' is used, the file name for the graph will be chosen by swift. This can also be set in swift.properties. The default is 'true'. The issue is whether to switch the default to 'false'. On Sun, 2007-07-08 at 23:35 -0500, Tiberiu Stef-Praun wrote: > Add a command line option: --gengraph=false by default. > Probably it makes more sense in terms of the cleanness of the output. > > On 7/4/07, Ben Clifford wrote: > > does anyone have preference about whether .dot graphviz files are > > generated by default or not? > > > > I find them a bit annoying in as much as they double the number of run > > files in my working directories to no immediate benefit. 
> > > > -- > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From benc at hawaga.org.uk Sun Jul 8 00:56:21 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 8 Jul 2007 11:26:21 +0530 (IST) Subject: [Swift-devel] mapping and primitive types Message-ID: The runtime has the concept of 'primitive' types - these are types such as int, float, string. If a type is primitive, it is not staged in or out during procedure execution. This is (I think) the only difference in behaviour. However, this isn't implemented particularly nicely. If I run program A below, with the output mapped like this:

messagefile outfile <"055-pass-int.out">;

then I get output in a file called 055-pass-int.out. However, if I run program B below, which is similar but declares its output like this:

int outfile <"056-pass-int.out">;

then the output file is not staged back, but no error is given suggesting that it is unwise to map an integer to a file. I see why that is in the implementation, but it's not pleasing from a user perspective. Should it be possible to map a 'primitive' type? If yes, then the below two programs should work. If no, then program B should produce a sensible error message. I think the answer should be 'yes' - there seems to be a long-term desire to be able to access mapped data in the language (for example, to run a program to determine if an iterative process has converged, outputting a boolean, and use that boolean as a condition in a loop).
PROGRAM A
=========

type messagefile {}

(messagefile t) greeting(string m, int i) {
    app {
        echo i stdout=@filename(t);
    }
}

messagefile outfile <"055-pass-int.out">;

int luftballons;

luftballons = 99;

outfile = greeting("hi", luftballons);

PROGRAM B
=========

(int t) greeting(string m, int i) {
    app {
        echo i stdout=@filename(t);
    }
}

int outfile <"056-pass-int.out">;

int luftballons;

luftballons = 99;

outfile = greeting("hi", luftballons);

-- From benc at hawaga.org.uk Sun Jul 8 09:22:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 8 Jul 2007 19:52:29 +0530 (IST) Subject: [Swift-devel] arrays-of-arrays Message-ID: The present language syntax does not admit arrays-of-arrays, with expressions such as a[5][3]. However, I don't see anything in the implementation that particularly prevents us from allowing this. Does anyone have a preference? -- From bugzilla-daemon at mcs.anl.gov Mon Jul 9 08:56:06 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 9 Jul 2007 08:56:06 -0500 (CDT) Subject: [Swift-devel] [Bug 80] New: simple_mapper strange prefix behaviour Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80 Summary: simple_mapper strange prefix behaviour Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu The following program generates output files z.3.out and z.7.out. This is what I expected. Substituting prefix to be "99" instead of "z" produces files: 0099.3.out and 0099.7.out - the array index value is padded to four digits. This is slightly surprising. And substituting prefix to be "99-" causes an execution failure like this:

Swift v0.1-dev RunID: spqficzyd1ey1
Execution failed:
        For input string: "99-"

which is very surprising.
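The failure text matches the message that Java's `Integer.parseInt` puts into a `NumberFormatException` when it cannot parse a decimal string. This suggests that part of the prefix is being parsed as a number. The sketch below is hypothetical and is not the simple_mapper source. It assumes a mapper that treats any prefix starting with a digit as a numeric index and zero-pads it to four digits. Under that assumption, it reproduces all three reported outcomes.

```java
import java.util.Locale;

// Hypothetical sketch, not the real simple_mapper: it assumes a mapper that
// treats a digit-led prefix as a numeric index and pads it to four digits,
// which is enough to reproduce the three outcomes reported in bug 80.
public class PrefixMapperSketch {

    static String mapPrefix(String prefix) {
        if (!prefix.isEmpty() && Character.isDigit(prefix.charAt(0))) {
            // Assumed flaw: a digit-led prefix is parsed as an integer index.
            int index = Integer.parseInt(prefix);                 // "99-" throws here
            return String.format(Locale.ROOT, "%04d", index);     // "99" becomes "0099"
        }
        return prefix;                                             // "z" passes through unchanged
    }

    // prefix + "." + array index + suffix, e.g. z.3.out
    static String fileName(String prefix, int arrayIndex, String suffix) {
        return mapPrefix(prefix) + "." + arrayIndex + suffix;
    }

    public static void main(String[] args) {
        System.out.println(fileName("z", 3, ".out"));   // z.3.out
        System.out.println(fileName("99", 3, ".out"));  // 0099.3.out
        try {
            fileName("99-", 3, ".out");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());         // For input string: "99-"
        }
    }
}
```

If the real mapper behaves like this, the fix would be to stop interpreting the prefix at all and treat it as an opaque string.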
It looks as if the mapper is trying to find structure (unsuccessfully) inside the prefix when perhaps it shouldn't. This is with swift r900. Program follows:

type messagefile {}

(messagefile t) greeting() {
    app {
        echo "hello" stdout=@filename(t);
    }
}

messagefile outfile[] ;

outfile[3] = greeting();
outfile[7] = greeting();

-- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From nefedova at mcs.anl.gov Mon Jul 9 09:45:38 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 9 Jul 2007 09:45:38 -0500 Subject: [Swift-devel] Re: Missing security context In-Reply-To: <46916C20.5050602@fnal.gov> References: <46916C20.5050602@fnal.gov> Message-ID: <0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov> Hi, Luciano: My guess would be that there is a mismatch between your sites.xml and tc.data files: maybe provider 'pbs' is mentioned in only one of those files? Could you please send me these 2 files? I am cc'ing swift-devel - maybe there is a more definite answer to your question. Thanks, Nika On Jul 8, 2007, at 5:58 PM, Luciano Piccoli wrote: > > Hi Veronika, > > I'm building swift in order to test a new mapper, but I'm having > some troubles configuring it. From the following error message can > you recognize what is missing? > > bash-3.00$ swift -tc.file ./tc.data3 example.swift -NUM=3 > Execution failed: > No security context can be found or created for service > (provider pbs): No 'pbs' provider or alias found. Available > providers: [gt2ft, gsiftp, condor, ssh, gt4ft, local, gt4, gsiftp- > old, gt2, ftp, webdav].
Aliases: webdav <-> http; local <-> file; > gsiftp-old <-> gridftp-old; gsiftp <-> gridftp; gt4 <-> gt3.9.5, > gt4.0.2, gt4.0.1, gt4.0.0; > > Thanks, > Luciano > From hategan at mcs.anl.gov Mon Jul 9 09:58:27 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 09 Jul 2007 09:58:27 -0500 Subject: [Swift-devel] Re: Missing security context In-Reply-To: <0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov> References: <46916C20.5050602@fnal.gov> <0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov> Message-ID: <1183993107.7428.4.camel@blabla.mcs.anl.gov> You need to download the pbs provider separately. I think you can find the latest at http://wiki.cogkit.org/index.php/V:4.1.5/Java_CoG_Kit_Release_Page#Downloads On Mon, 2007-07-09 at 09:45 -0500, Veronika Nefedova wrote: > Hi, Luciano: > > My guess would be that there is a mismatch between your sites.xml and > tc.data files: provider 'pbs' is mentioned maybe in only one of > those files? Could you please send me these 2 files? I am Cc to > swift-devel - maybe there is a more definite answer to you question. > > Thanks, > > Nika > > On Jul 8, 2007, at 5:58 PM, Luciano Piccoli wrote: > > > > > Hi Veronika, > > > > I'm building swift in order to test a new mapper, but I'm having > > some troubles configuring it. From the following error message can > > you recognize what is missing? > > > > bash-3.00$ swift -tc.file ./tc.data3 example.swift -NUM=3 > > Execution failed: > > No security context can be found or created for service > > (provider pbs): No 'pbs' provider or alias found. Available > > providers: [gt2ft, gsiftp, condor, ssh, gt4ft, local, gt4, gsiftp- > > old, gt2, ftp, webdav]. 
Aliases: webdav <-> http; local <-> file; > > gsiftp-old <-> gridftp-old; gsiftp <-> gridftp; gt4 <-> gt3.9.5, > > gt4.0.2, gt4.0.1, gt4.0.0; > > > > Thanks, > > Luciano > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From tiberius at ci.uchicago.edu Mon Jul 9 10:28:48 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Mon, 9 Jul 2007 10:28:48 -0500 Subject: [Swift-devel] arrays-of-arrays In-Reply-To: References: Message-ID: Do you have an example of when this would be useful? In the case of parameter sweeps, I would be tempted to replace a[m][n] with b[m] and c[n] and loop over b and c. Tibi On 7/8/07, Ben Clifford wrote: > The present language syntax does not admit arrays-of-arrays, with > expressions such as a[5][3]. However, I don't see anything particularly > constraining in the implementation to require this. Does anyone have > preference? > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From benc at hawaga.org.uk Mon Jul 9 12:06:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 9 Jul 2007 17:06:02 +0000 (GMT) Subject: [Swift-devel] arrays-of-arrays In-Reply-To: References: Message-ID: On Mon, 9 Jul 2007, Tiberiu Stef-Praun wrote: > Do you have an example when this would be useful ? not particularly - I just noticed that, with some of the language changes that I'm making, it is probably no longer hard to have this syntax, and wanted to know if there was a deeper reason for it not to be around.
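Tibi's workaround for parameter sweeps replaces one two-dimensional a[m][n] with two one-dimensional arrays b[m] and c[n] walked by nested loops. This visits the same m × n combinations. Below is a minimal Java sketch of that idea; in SwiftScript it would be two nested foreach loops. The integer parameter values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround: two 1-D parameter arrays b[m] and c[n] swept with
// nested loops instead of a 2-D array a[m][n]. Values are illustrative only.
public class SweepSketch {

    // Returns one "b:c" label per run, covering all m * n combinations.
    static List<String> sweep(int[] b, int[] c) {
        List<String> runs = new ArrayList<>();
        for (int x : b) {               // outer loop over b[0..m-1]
            for (int y : c) {           // inner loop over c[0..n-1]
                runs.add(x + ":" + y);  // one run per (b[i], c[j]) pair
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> runs = sweep(new int[] {1, 2}, new int[] {10, 20, 30});
        // prints: 6 runs: [1:10, 1:20, 1:30, 2:10, 2:20, 2:30]
        System.out.println(runs.size() + " runs: " + runs);
    }
}
```

A true array-of-arrays only becomes necessary when the inner dimension varies per outer element, or when results must be indexed by both coordinates later.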
-- From hategan at mcs.anl.gov Mon Jul 9 12:11:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 09 Jul 2007 12:11:09 -0500 Subject: [Swift-devel] arrays-of-arrays In-Reply-To: References: Message-ID: <1184001069.12696.0.camel@blabla.mcs.anl.gov> On Mon, 2007-07-09 at 17:06 +0000, Ben Clifford wrote: > > On Mon, 9 Jul 2007, Tiberiu Stef-Praun wrote: > > > Do you have an example when this would be useful ? > > not particularly - I just noticed that the way that some of the language > changes that I'm making, it probably is no longer hard to have this syntax > and wanted to know if there was a deeper reason for it to not be around. Yong had some issues with it. Maybe he can clarify. > From benc at hawaga.org.uk Mon Jul 9 17:03:50 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 9 Jul 2007 22:03:50 +0000 (GMT) Subject: [Swift-devel] dot files by default In-Reply-To: <1183956228.4067.2.camel@blabla.mcs.anl.gov> References: <1183956228.4067.2.camel@blabla.mcs.anl.gov> Message-ID: On Sun, 8 Jul 2007, Mihael Hategan wrote: > The default is 'true'. The issue is whether to switch the default to > 'false'. If no one pops up claiming to regularly use the outputted .dot files by default then I'll change this to false. -- From hategan at mcs.anl.gov Mon Jul 9 17:05:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 09 Jul 2007 17:05:09 -0500 Subject: [Swift-devel] dot files by default In-Reply-To: References: <1183956228.4067.2.camel@blabla.mcs.anl.gov> Message-ID: <1184018709.24076.0.camel@blabla.mcs.anl.gov> On Mon, 2007-07-09 at 22:03 +0000, Ben Clifford wrote: > > On Sun, 8 Jul 2007, Mihael Hategan wrote: > > > The default is 'true'. The issue is whether to switch the default to > > 'false'. > > If no one pops up claiming to regularly use the outputted .dot files by > default then I'll change this to false. 
+1 > From benc at hawaga.org.uk Tue Jul 10 11:53:39 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 10 Jul 2007 17:53:39 +0100 (BST) Subject: [Swift-devel] status of conversion of bug 30: making the xml intermediate form more XML-like Message-ID: At present, the XML intermediate form (between the user-written SwiftScript form and the karajan code) is partly XML and partly other languages. This makes parsing of the XML language hard and thus the language somewhat buggy. In practice, this has resulted in wasted time and frustration for various people in this group trying to write applications. So I'm working on converting the XML intermediate language to be more XML-like, without the various embedded non-XML / quasi-XML languages that are there. I have a basic implementation that is not ready for real use but seems to behave mostly ok.

A couple of caveats:

i) Different number types are not supported - there is a bunch of implicit type conversion between ints and floats that happens inside the present runtime. Tightening up the type checking messed up a bunch of numerical stuff, so I have temporarily made my development code accept only integers - no floats (I don't know of anyone who uses non-integers in programs, though). I need to think some more about implicit type conversion and how operator overloading should work - at the moment, a lot of the semantics in production are inherited from karajan, and they may or may not be what is right.

ii) The present production implementation has a dual type model - sometimes data flows around as java objects of types such as Integer or String; sometimes it flows around as DSHandle objects which contain those values. The need to convert between those at many points causes trouble. My development code keeps values in DSHandle objects as much as possible.
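Here is a minimal sketch of the unwrap, compute, rewrap pattern for arithmetic on wrapped values. `Handle` is a hypothetical stand-in for a DSHandle-like wrapper, not Swift's actual DSHandle interface. Each operand lives in its own handle and the sum goes into a new one, so 1 + 2 allocates three handles. Non-integer operands are rejected, mirroring the integer-only restriction of the development code.

```java
// Hypothetical sketch: Handle stands in for a DSHandle-like wrapper and is
// not Swift's real DSHandle interface.
public class WrappedArithmetic {

    static final class Handle {
        private static int count = 0;   // counts allocations, for illustration only
        private final Object value;

        Handle(Object value) {
            this.value = value;
            count++;
        }

        Object get() {
            return value;
        }

        static int createdCount() {
            return count;
        }
    }

    // Unwrap both operands, add them as plain ints, and wrap the result.
    static Handle add(Handle a, Handle b) {
        if (!(a.get() instanceof Integer) || !(b.get() instanceof Integer)) {
            // Mirrors the integer-only restriction; floats are rejected.
            throw new IllegalArgumentException("only integers are supported");
        }
        int sum = (Integer) a.get() + (Integer) b.get();
        return new Handle(sum);
    }

    public static void main(String[] args) {
        Handle result = add(new Handle(1), new Handle(2));
        // Two operand handles plus one result handle: prints "3 from 3 handles"
        System.out.println(result.get() + " from " + Handle.createdCount() + " handles");
    }
}
```

An optimisation at the compilation layer could fold constant subexpressions so that only the final value is wrapped.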
This is some additional runtime overhead (because the expression 1 + 2 now creates three intermediate DSHandle objects, rather than evaluating the expression at the Karajan level and wrapping at the end). However, in practice expressions are not used very much, so this overhead is hopefully not excessively onerous. If it is, there is scope for optimisation to happen at the xml->kml layer (as I'm doing with path handling). -- From hategan at mcs.anl.gov Tue Jul 10 12:02:35 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 10 Jul 2007 12:02:35 -0500 Subject: [Swift-devel] status of conversion of bug 30: making the xml intermediate form more XML-like In-Reply-To: References: Message-ID: <1184086955.13408.4.camel@blabla.mcs.anl.gov> On Tue, 2007-07-10 at 17:53 +0100, Ben Clifford wrote: > My development code keeps values in DSHandle objects as much as possible. That's what I would (and might have) argued for. > > This is some additional runtime overhead (because the expression 1 + 2 now > creates three intermediate DSHandle objects, rather than evaluating the > expression as the Karajan level and wrapping at the end). I think the best solution is to not use the normal karajan functions for swift arithmetic. > > However, in practice expressions are not used very much and so this > overhead is hopefully not excessively onerous. If it is, there is scope > for optimistion to happen at the xml->kml layer (as I'm doing with path > handling). > From benc at hawaga.org.uk Tue Jul 10 12:04:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 10 Jul 2007 17:04:47 +0000 (GMT) Subject: [Swift-devel] status of conversion of bug 30: making the xml intermediate form more XML-like In-Reply-To: <1184086955.13408.4.camel@blabla.mcs.anl.gov> References: <1184086955.13408.4.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 10 Jul 2007, Mihael Hategan wrote: > I think the best solution is to not use the normal karajan functions for > swift arithmetic.
It doesn't in my present impl - I have code that unwraps and rewraps (and then uses the underlying karajan functions). Though, something more fancy is necessary there I think, once I figure out what return types should look like. -- From hategan at mcs.anl.gov Tue Jul 10 12:09:50 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 10 Jul 2007 12:09:50 -0500 Subject: [Swift-devel] status of conversion of bug 30: making the xml intermediate form more XML-like In-Reply-To: References: <1184086955.13408.4.camel@blabla.mcs.anl.gov> Message-ID: <1184087390.13873.1.camel@blabla.mcs.anl.gov> On Tue, 2007-07-10 at 17:04 +0000, Ben Clifford wrote: > > On Tue, 10 Jul 2007, Mihael Hategan wrote: > > > I think the best solution is to not use the normal karajan functions for > > swift arithmetic. > > It doesn't in my present impl - I have code that unwraps and rewraps (and > then uses the underlying karajan functions). Though, something more fancy > is necessary there I think, once I figure out what return types should > look like. Right. Those could be implemented in Java directly for performance reasons. > From bugzilla-daemon at mcs.anl.gov Wed Jul 11 11:20:43 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 11 Jul 2007 11:20:43 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070711162043.9FCF416502@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #19 from nefedova at mcs.anl.gov 2007-07-11 11:20 ------- Some issues arose when testing the re-written swift code with loops (in an attempt to reduce the size and thus to eliminate a possible reason for the problems with large workflows). When tested with just one loop, it all worked, but when I introduced an inner loop, it just hung there.
I have 2 loops in the workflow, one inside the other:

foreach f in files {
    do_something;
    print(a);
    foreach s in sfiles {
        print(b);
        something;
        if (a=="blah") {
            do_staff;
        } else {
            do_another_stuff;
        }
    } # close foreach s
} # close foreach f

(the full code can be found on terminable in ~nefedova/MolDyn.dtm) I see the code hanging without *any* errors right when the second foreach is supposed to start, i.e. I see a being printed but not b. Any suggestions on what could be wrong here? -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Wed Jul 11 16:40:20 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 11 Jul 2007 16:40:20 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070711214020.D4BDA16502@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #20 from hategan at mcs.anl.gov 2007-07-11 16:40 ------- (In reply to comment #19) > Some issues arised when testing the re-written swift code with loops (in > attempt to reduce the size and thus to eliminate a possible reason for the > problems with large workflows). When tested with just one loop - it all worked, > but when intrioduced an inside loop, it just hangs there. > [...] Working on it... -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
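This is a hypothetical sketch of an input-detection pass for a workflow compiler. It tags as workflow input any variable that is read but never assigned. The collection iterated by a foreach, at any nesting depth, counts as a read. The statement classes are illustrative and are not Swift's own compiler or AST. The program in `main` mirrors the nested-loop shape from comment #19, where files and sfiles are only ever iterated over.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical input-detection pass; these statement classes are illustrative
// and are not the Swift compiler's own representation.
public class InputTagging {

    interface Stmt {}

    // lhs = f(reads...)
    static final class Assign implements Stmt {
        final String lhs;
        final List<String> reads;

        Assign(String lhs, List<String> reads) {
            this.lhs = lhs;
            this.reads = reads;
        }
    }

    // foreach var in collection { body }
    static final class Foreach implements Stmt {
        final String var;
        final String collection;
        final List<Stmt> body;

        Foreach(String var, String collection, List<Stmt> body) {
            this.var = var;
            this.collection = collection;
            this.body = body;
        }
    }

    // Record every variable written and every variable read, recursing into loops.
    static void scan(List<Stmt> stmts, Set<String> written, Set<String> read) {
        for (Stmt s : stmts) {
            if (s instanceof Assign) {
                Assign a = (Assign) s;
                written.add(a.lhs);
                read.addAll(a.reads);
            } else if (s instanceof Foreach) {
                Foreach f = (Foreach) s;
                read.add(f.collection);      // iterating over a collection reads it
                written.add(f.var);          // the loop variable is bound by the loop
                scan(f.body, written, read); // nested loops are scanned the same way
            }
        }
    }

    // Inputs are variables that are read somewhere but never assigned.
    static Set<String> inputs(List<Stmt> program) {
        Set<String> written = new HashSet<>();
        Set<String> read = new TreeSet<>();
        scan(program, written, read);
        read.removeAll(written);
        return read;
    }

    public static void main(String[] args) {
        List<Stmt> program = List.of(
            new Foreach("f", "files", List.of(
                new Foreach("s", "sfiles", List.of(
                    new Assign("out", List.of("f", "s")))))));
        System.out.println(inputs(program)); // [files, sfiles]
    }
}
```

Dropping the `read.add(f.collection)` line makes `inputs` return an empty set for this program. In that case files and sfiles are never tagged as inputs.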
From bugzilla-daemon at mcs.anl.gov Wed Jul 11 18:12:32 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 11 Jul 2007 18:12:32 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070711231232.6005E16502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #21 from hategan at mcs.anl.gov 2007-07-11 18:12 -------
It freezes because files[] is not used, in a sense. The compiler should tag all data that is not an lvalue but appears as part of an expression as input data. Apparently the compiler misses the case where the variable is used by a foreach loop. You can convince swift that files and sfiles are inputs by doing something like print(files); print(sfiles);. Nonetheless, this should be fixed in the compiler.

Mihael

(In reply to comment #20)
> (In reply to comment #19)
> > Some issues arose when testing the rewritten swift code with loops (an
> > attempt to reduce the size of the workflow and thus eliminate a possible
> > cause of the problems with large workflows). When tested with just one loop,
> > it all worked, but when I introduced an inner loop, it just hangs there.
> > [...]
>
> Working on it...

From benc at hawaga.org.uk Thu Jul 12 08:25:53 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 12 Jul 2007 13:25:53 +0000 (GMT)
Subject: [Swift-devel] cog svn update
Message-ID:

cog svn update is failing for me (and I think tibi) for the past few days:

$ svn update
svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not resolve hostname `svn.sourceforge.net': No address associated with nodename (https://svn.sourceforge.net)

$ svn info
Path: .
URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
Repository Root: https://svn.sourceforge.net/svnroot/cogkit
Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
Revision: 1658
Node Kind: directory
Schedule: normal
Last Changed Author: hategan
Last Changed Rev: 1658
Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)

--

From hategan at mcs.anl.gov Thu Jul 12 23:45:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 12 Jul 2007 23:45:30 -0500
Subject: [Swift-devel] cog svn update
In-Reply-To:
References:
Message-ID: <1184301930.24582.0.camel@blabla.mcs.anl.gov>

http://sourceforge.net/docs/A04#1184001090

( 2007-07-09 10:43:54 - Project Subversion (SVN) Service ) As announced, support for the deprecated subversion access method (svn.sourceforge.net) was removed. Please use the PROJECT.svn.sourceforge.net access method that is described in our docs.

On Thu, 2007-07-12 at 13:25 +0000, Ben Clifford wrote:
> cog svn update is failing for me (and I think tibi) for the past few days:
>
> $ svn update
> svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
> svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not
> resolve hostname `svn.sourceforge.net': No address associated with
> nodename (https://svn.sourceforge.net)
>
> $ svn info
> Path: .
> URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
> Repository Root: https://svn.sourceforge.net/svnroot/cogkit
> Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
> Revision: 1658
> Node Kind: directory
> Schedule: normal
> Last Changed Author: hategan
> Last Changed Rev: 1658
> Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)

From benc at hawaga.org.uk Fri Jul 13 02:05:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 13 Jul 2007 07:05:32 +0000 (GMT)
Subject: [Swift-devel] cog svn update
In-Reply-To: <1184301930.24582.0.camel@blabla.mcs.anl.gov>
References: <1184301930.24582.0.camel@blabla.mcs.anl.gov>
Message-ID:

OK. I see you fixed that in the nightly builds. I've changed the download page too.

On Thu, 12 Jul 2007, Mihael Hategan wrote:
> http://sourceforge.net/docs/A04#1184001090
>
> ( 2007-07-09 10:43:54 - Project Subversion (SVN) Service ) As
> announced, support for the deprecated subversion access method
> (svn.sourceforge.net) was removed. Please use the
> PROJECT.svn.sourceforge.net access method that is described in our docs.
>
> On Thu, 2007-07-12 at 13:25 +0000, Ben Clifford wrote:
> > cog svn update is failing for me (and I think tibi) for the past few days:
> >
> > $ svn update
> > svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
> > svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not
> > resolve hostname `svn.sourceforge.net': No address associated with
> > nodename (https://svn.sourceforge.net)
> >
> > $ svn info
> > Path: .
> > URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
> > Repository Root: https://svn.sourceforge.net/svnroot/cogkit
> > Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
> > Revision: 1658
> > Node Kind: directory
> > Schedule: normal
> > Last Changed Author: hategan
> > Last Changed Rev: 1658
> > Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)

From bugzilla-daemon at mcs.anl.gov Fri Jul 13 13:34:34 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 13:34:34 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] New: nested loops hung
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

Summary: nested loops hung
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: hategan at mcs.anl.gov
ReportedBy: nefedova at mcs.anl.gov
CC: swift-devel at ci.uchicago.edu
OtherBugsDependingOnThis: 72

Workflows with nested loops freeze. Specifically, this construct:

foreach f in files{
  do_something;
  print(a);
  foreach s in sfiles{
    print(b);
    something;
    if (a=="blah"){
      do_staff;
    }else{
      do_another_stuff;
    }
  } # close foreach s
} # close foreach f

(the full code can be found on terminable in ~nefedova/MolDyn.dtm)

Comments from Mihael:

It freezes because files[] is not used, in a sense. The compiler should tag all data that is not an lvalue but appears as part of an expression as input data. Apparently the compiler misses the case where the variable is used by a foreach loop. You can convince swift that files and sfiles are inputs by doing something like print(files); print(sfiles);. Nonetheless, this should be fixed in the compiler.
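The input-tagging rule in Mihael's explanation can be made concrete with a toy compiler pass. What follows is a minimal sketch only. The AST classes (Var, Call, Assign, Foreach) and the function mark_inputs are hypothetical illustrations, not the Swift compiler's actual data structures. The sketch treats a declared variable as input when it is read but never assigned. The handle_foreach flag mimics the reported bug, in which the array that a foreach iterates over is never visited. In that mode, files and sfiles come out untagged, unless the print(files); print(sfiles); workaround supplies an ordinary read.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Union


# Hypothetical, simplified AST for a SwiftScript-like program.
# These classes are illustrative only; they are not the Swift compiler's own.
@dataclass
class Var:
    name: str


@dataclass
class Call:
    """A procedure call such as print(a); its arguments are rvalues."""
    func: str
    args: List["Expr"]


@dataclass
class Assign:
    """target = value; the target is an lvalue (written, not read)."""
    target: str
    value: "Expr"


@dataclass
class Foreach:
    """foreach item in array { body }"""
    item: str
    array: "Expr"
    body: List["Stmt"] = field(default_factory=list)


Expr = Union[Var, Call]
Stmt = Union[Call, Assign, Foreach]


def expr_reads(expr: Expr) -> Iterator[str]:
    """Yield every variable name read (used as an rvalue) in an expression."""
    if isinstance(expr, Var):
        yield expr.name
    elif isinstance(expr, Call):
        for arg in expr.args:
            yield from expr_reads(arg)


def mark_inputs(stmts: List[Stmt], declared: Set[str],
                handle_foreach: bool = True) -> Set[str]:
    """Return the declared variables that are read but never assigned.

    handle_foreach=False mimics the reported bug: the array a foreach
    iterates over is never visited, so it is not tagged as input.
    """
    reads: Set[str] = set()
    writes: Set[str] = set()

    def visit(stmt: Stmt) -> None:
        if isinstance(stmt, Call):
            reads.update(expr_reads(stmt))
        elif isinstance(stmt, Assign):
            writes.add(stmt.target)
            reads.update(expr_reads(stmt.value))
        elif isinstance(stmt, Foreach):
            if handle_foreach:
                reads.update(expr_reads(stmt.array))
            for inner in stmt.body:
                visit(inner)

    for stmt in stmts:
        visit(stmt)
    return (reads - writes) & declared


# The nested loops from bug 83, with the mapper declarations omitted.
program = [
    Foreach("f", Var("files"), [
        Call("print", [Var("f")]),
        Foreach("s", Var("sfiles"), [Call("print", [Var("s")])]),
    ]),
]
declared = {"files", "sfiles"}

# Workaround from the thread: read both arrays in ordinary expressions first.
workaround = [Call("print", [Var("files")]),
              Call("print", [Var("sfiles")])] + program

print(sorted(mark_inputs(program, declared)))                           # ['files', 'sfiles']
print(sorted(mark_inputs(program, declared, handle_foreach=False)))     # []
print(sorted(mark_inputs(workaround, declared, handle_foreach=False)))  # ['files', 'sfiles']
```

This models only the tagging decision, not the runtime hang. Under these assumptions, the fix amounts to visiting a foreach's array expression like any other rvalue.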
From bugzilla-daemon at mcs.anl.gov Fri Jul 13 13:34:35 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 13:34:35 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070713183435.5069416506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

nefedova at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |83

From bugzilla-daemon at mcs.anl.gov Fri Jul 13 14:24:16 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:24:16 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713192416.87C54164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hategan at mcs.anl.gov      |benc at hawaga.org.uk
From bugzilla-daemon at mcs.anl.gov Fri Jul 13 14:46:59 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:46:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713194659.01B77164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #1 from benc at hawaga.org.uk 2007-07-13 14:46 -------
The supplied program isn't something that can be fed into swift - it's missing definitions for all of the variables.

I tried the program below and it does not hang in r912. Could you please supply a small test program that is a complete, valid swift program and hangs?

type file;

file files[] ;
file sfiles[] ;

foreach f in files{
  print(f);
  foreach s in sfiles{
    print(s);
  } # close foreach s
} # close foreach f

From bugzilla-daemon at mcs.anl.gov Fri Jul 13 14:54:06 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:54:06 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713195406.75188164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #2 from nefedova at mcs.anl.gov 2007-07-13 14:54 -------
Please copy *.mol2 and *.prt from ~nefedova/alamines to your directory and try this program:

type file {}
file files[];
file sfiles[];
string a = "a";
string b = "b";
string c = "c";
print(c);
foreach file f in files {
  string aa = "aa";
  print(aa);
  foreach s in sfiles{
    print(b);
    if (a=="a"){
      print (a);
    }else{
      print(b);
    }
  }
}

It hangs after printing "c".
From bugzilla-daemon at mcs.anl.gov Fri Jul 13 14:57:45 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:57:45 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713195745.0EC61164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #3 from hategan at mcs.anl.gov 2007-07-13 14:57 -------
(In reply to comment #1)
> The supplied program isn't something that can be fed into swift [...]

--------------
type file {}

file f1[] ;

//magic switch below
//print(f1);

foreach i1 in f1 {
}
--------------

You need some .mol2 dummies.

From bugzilla-daemon at mcs.anl.gov Fri Jul 13 15:16:02 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 15:16:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713201602.3B855164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #4 from benc at hawaga.org.uk 2007-07-13 15:16 -------
Hangs for me. But if I replace the array definition with:

file f1[] ;

it does not.
From bugzilla-daemon at mcs.anl.gov Fri Jul 13 15:26:26 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 15:26:26 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713202626.4C085164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #5 from hategan at mcs.anl.gov 2007-07-13 15:26 -------
(In reply to comment #4)
> Hangs for me. But if I replace the array definition with:
>
> file f1[] ;
>
> it does not.

Regardless. In the hanging scenario f1 is not marked as input, although it should be.

From bugzilla-daemon at mcs.anl.gov Fri Jul 13 16:32:16 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 16:32:16 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070713213216.0B349164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #6 from nefedova at mcs.anl.gov 2007-07-13 16:32 -------
This is a snippet from my code:

file fls[];
print(fls);
foreach file in files {
  command;
  foreach prt_file in fls {
    <>
  }
}

and it still hangs inside the second loop, while the part of the first loop that comes before the second loop ("command;") works.

From foster at mcs.anl.gov Sat Jul 14 15:44:12 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Sat, 14 Jul 2007 15:44:12 -0500
Subject: [Swift-devel] MolDyn
Message-ID: <4699359C.5080200@mcs.anl.gov>

Hi,

I haven't seen any communications regarding MolDyn recently.
Where do things stand with the 244 molecule run?

Ian

From iraicu at cs.uchicago.edu Sun Jul 15 01:05:02 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sun, 15 Jul 2007 01:05:02 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <4699359C.5080200@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov>
Message-ID: <4699B90E.4070708@cs.uchicago.edu>

Hi,
I think Nika has been waiting on me this week, as we are still using the AstroPortal allocation at the ANL/UC site. I have been super busy with the camera-ready Falkon paper, re-running experiments, etc., but I just finished that! Assuming Nika is ready (which I think she is), we'll give the 244 mol run another try on Monday!

Ioan

Ian Foster wrote:
> Hi,
>
> I haven't seen any communications regarding MolDyn recently. Where do
> things stand with the 244 molecule run?
>
> Ian
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================
============================================

From foster at mcs.anl.gov Sun Jul 15 15:58:13 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Sun, 15 Jul 2007 15:58:13 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <4699B90E.4070708@cs.uchicago.edu>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
Message-ID: <469A8A65.5010209@mcs.anl.gov>

This is crazy ... Nika is working on this, not you; she should not be waiting for you, or depending on an AstroPortal allocation.
Ioan Raicu wrote:
> Hi,
> I think Nika has been waiting on me this week, as we are still using
> the AstroPortal allocation at the ANL/UC site. I have been super busy
> with the camera-ready Falkon paper, re-running experiments, etc., but
> I just finished that! Assuming Nika is ready (which I think she is),
> we'll give the 244 mol run another try on Monday!
>
> Ioan
>
> Ian Foster wrote:
>> Hi,
>>
>> I haven't seen any communications regarding MolDyn recently. Where do
>> things stand with the 244 molecule run?
>>
>> Ian
>>
>

--
Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
Globus Alliance: www.globus.org.

From iraicu at cs.uchicago.edu Sun Jul 15 16:06:59 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sun, 15 Jul 2007 16:06:59 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8A65.5010209@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu> <469A8A65.5010209@mcs.anl.gov>
Message-ID: <469A8C73.5000901@cs.uchicago.edu>

To speed up the debugging process, we decided to use the ANL site. Nika did not have an allocation there, so we decided to go ahead and use my allocation (AstroPortal's) just for debugging. Essentially, I was running Falkon under my credentials, and Nika was running Swift. This is how things have been for the past few weeks, since we moved over to the ANL site with the MolDyn code. Perhaps it's time to give Nika the latest Falkon code and run Falkon with her credentials. Then she wouldn't have to wait for me, unless there were problems with Falkon that need to be resolved.
Nika, do you finally have credentials for ANL, or would we have to move over to Purdue again? Perhaps we can do one more debug run at ANL tomorrow (Monday) under my credentials, since we have everything set up and ready to go?

Ioan

Ian Foster wrote:
> This is crazy ... Nika is working on this, not you; she should not be
> waiting for you, or depending on an AstroPortal allocation.
>
> Ioan Raicu wrote:
>> Hi,
>> I think Nika has been waiting on me this week, as we are still using
>> the AstroPortal allocation at the ANL/UC site. I have been super
>> busy with the camera-ready Falkon paper, re-running experiments,
>> etc., but I just finished that! Assuming Nika is ready (which I
>> think she is), we'll give the 244 mol run another try on Monday!
>>
>> Ioan
>>
>> Ian Foster wrote:
>>> Hi,
>>>
>>> I haven't seen any communications regarding MolDyn recently. Where
>>> do things stand with the 244 molecule run?
>>>
>>> Ian
>>>
>>
>

From bugzilla-daemon at mcs.anl.gov Sun Jul 15 23:09:53 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 15 Jul 2007 23:09:53 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070716040953.8B931164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #7 from tiberius at mcs.anl.gov 2007-07-15 23:09 -------
The code below also hangs. On the execution node, only a subset of the echo jobs get executed. This is not good at all.

I was trying the following pattern: a set of similar inputs (processData) that I need to process through various procedures (echoA, echoB, echoNone), with a batch job that processes all the inputs through these procedures. Note that in this case there are no dependencies between the procedures (echoA, echoB, echoNone). This has got to be a pretty standard pattern.
type file{};

(file echoAfile) echoA (string sIn){
  app{
    echo sIn stdout=@filename(echoAfile);
  }
}

(file echoBfile) echoB (string sIn){
  app{
    echo sIn stdout=@filename(echoBfile);
  }
}

(file echoCfile) echoNone(){
  app{
    echo "NONE" stdout=@filename(echoCfile);
  }
}

(file aResults[], file bResults[], file noResults) testLoop (string symbols[]){
  noResults=echoNone();
  foreach s,i in symbols {
    aResults[i] = echoA(s);
    bResults[i] = echoB(s);
  }
}

string processData[]=["data-1", "data-2"];
string echoANames = "data-1.A data-2.B";
string echoBNames = "data-2.A data-2.B";
file echoEmpty<"echo.empty">;
file echoA[];
file echoB[];

(echoA, echoB, echoEmpty) = testLoop (processData);

From foster at mcs.anl.gov Mon Jul 16 00:45:28 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 16 Jul 2007 00:45:28 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8A65.5010209@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu> <469A8A65.5010209@mcs.anl.gov>
Message-ID: <469B05F8.9010801@mcs.anl.gov>

Mike points out that Nika has been very busy re-rolling loops in MolDyn, thanks to the new @strcut operator.

I am still concerned that we don't yet seem to have allocations sorted out.

Ian.

Ian Foster wrote:
> This is crazy ... Nika is working on this, not you; she should not be
> waiting for you, or depending on an AstroPortal allocation.
>
> Ioan Raicu wrote:
>> Hi,
>> I think Nika has been waiting on me this week, as we are still using
>> the AstroPortal allocation at the ANL/UC site. I have been super
>> busy with the camera-ready Falkon paper, re-running experiments,
>> etc., but I just finished that! Assuming Nika is ready (which I
>> think she is), we'll give the 244 mol run another try on Monday!
>>
>> Ioan
>>
>> Ian Foster wrote:
>>> Hi,
>>>
>>> I haven't seen any communications regarding MolDyn recently. Where
>>> do things stand with the 244 molecule run?
>>>
>>> Ian
>>>
>>
>

From iraicu at cs.uchicago.edu Mon Jul 16 00:50:00 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 16 Jul 2007 00:50:00 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469B05F8.9010801@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu> <469A8A65.5010209@mcs.anl.gov> <469B05F8.9010801@mcs.anl.gov>
Message-ID: <469B0708.1010908@cs.uchicago.edu>

Right, I also had the impression that Nika was busy rewriting the workflow, so she wasn't just sitting idle, waiting on me. We'll make another attempt at the 244 mol run tomorrow using the AstroPortal allocation. I made some minor changes in Falkon, and Mihael fixed the issues that caused the stack overflow, so let's see how it all holds up!

Ioan

Ian Foster wrote:
> Mike points out that Nika has been very busy re-rolling loops in
> MolDyn, thanks to the new @strcut operator.
>
> I am still concerned that we don't yet seem to have
> allocations sorted out.
>
> Ian.
>
> Ian Foster wrote:
>> This is crazy ... Nika is working on this, not you; she should not be
>> waiting for you, or depending on an AstroPortal allocation.
>>
>> Ioan Raicu wrote:
>>> Hi,
>>> I think Nika has been waiting on me this week, as we are still using
>>> the AstroPortal allocation at the ANL/UC site.
>>> I have been super
>>> busy with the camera-ready Falkon paper, re-running experiments,
>>> etc., but I just finished that! Assuming Nika is ready (which I
>>> think she is), we'll give the 244 mol run another try on Monday!
>>>
>>> Ioan
>>>
>>> Ian Foster wrote:
>>>> Hi,
>>>>
>>>> I haven't seen any communications regarding MolDyn recently. Where
>>>> do things stand with the 244 molecule run?
>>>>
>>>> Ian
>>>>
>>>
>>
>

From benc at hawaga.org.uk Mon Jul 16 01:16:24 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 06:16:24 +0000 (GMT)
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8C73.5000901@cs.uchicago.edu>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu> <469A8A65.5010209@mcs.anl.gov> <469A8C73.5000901@cs.uchicago.edu>
Message-ID:

On Sun, 15 Jul 2007, Ioan Raicu wrote:
> code. Perhaps it's time to give Nika the latest Falkon code and run Falkon
> with her credentials. Then she wouldn't have to wait for me, unless there

Perhaps it's time to put Falkon somewhere people can download the latest code from, wherever and whenever they want.

--

From benc at hawaga.org.uk Mon Jul 16 03:14:05 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 08:14:05 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
Message-ID:

Swift has floating point and integer types. However, now that I look at implementing them, I wonder whether we should have a single numeric type. It's not clear that we need float/double in the language as distinct types.

--

From bugzilla-daemon at mcs.anl.gov Mon Jul 16 06:52:08 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 06:52:08 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
Message-ID: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

------- Comment #8 from benc at hawaga.org.uk 2007-07-16 06:52 -------
I don't think that comment #7 is this bug. Please open a new one.

From benc at hawaga.org.uk Mon Jul 16 06:58:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 11:58:25 +0000 (GMT)
Subject: [Swift-devel] bug 82: request for centralised installed applications catalog
Message-ID:

Tibi put the following bug in:

> I'm thinking a tc.data database on the web, where everyone who has a
> swift workflow can publish the applications that they have installed on
> the Grid, and let others benefit from using them
> Ex: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftGridResources
> This will be an extra incentive for people to use swift: they can use
> already existing (and verified) applications from the grid.
> If we had a web interface for this, we could add it to the Swift
> webpage, and let visitors see that Swift has an active and diverse
> set of users.

i) Who will own the list? That person would need to be responsible for ongoing verification of that list (and for documenting what they mean by verification), including regularly removing entries that no longer verify.
ii) Anything in SVN already has a URL to link to; anything in SVN is already 'on the web'. A better place for this might be a tc.data.big file in SVN, given that everyone really is using HEAD, not releases, at the moment.

--

From tiberius at ci.uchicago.edu Mon Jul 16 08:01:53 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Mon, 16 Jul 2007 08:01:53 -0500
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
References: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
Message-ID:

Well, it's still about loops that hang.
I did not want to pollute the bugzilla with another bug that is very similar to the nested loops bug. Maybe comment #7 is a different realization of the same bug.
Hopefully a bit of progress in addressing the loops bug will clear up whether this should be a different bug or not.

On 7/16/07, bugzilla-daemon at mcs.anl.gov wrote:
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83
>
> ------- Comment #8 from benc at hawaga.org.uk 2007-07-16 06:52 -------
> I don't think that comment #7 is this bug. Please open a new one.
>

--
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From benc at hawaga.org.uk Mon Jul 16 08:37:22 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 13:37:22 +0000 (GMT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To:
References: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
Message-ID:

Open a new bug.
Describe the subset of echos that actually run. See if you can recreate it with a smaller program. If it turns out to be the same bug, it's easy to mark as a duplicate.

On Mon, 16 Jul 2007, Tiberiu Stef-Praun wrote:
> Well, it's still about loops that hang.
> I did not want to pollute the bugzilla with another bug that is very
> similar to the nested loops bug. Maybe comment #7 is a different
> realization of the same bug.
> Hopefully a bit of progress in addressing the loops bug will clear up
> whether this should be a different bug or not.
>
>
> On 7/16/07, bugzilla-daemon at mcs.anl.gov wrote:
> > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83
> >
> > ------- Comment #8 from benc at hawaga.org.uk 2007-07-16 06:52 -------
> > I don't think that comment #7 is this bug. Please open a new one.
> >

From bugzilla-daemon at mcs.anl.gov Mon Jul 16 15:09:08 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 15:09:08 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
Message-ID: <20070716200908.0112316502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72

------- Comment #22 from nefedova at mcs.anl.gov 2007-07-16 15:09 -------
A new 244-molecule experiment has started. You can watch it live here:
http://viper.uchicago.edu:55000/index.htm

Please note that the link is valid only while the job is running.
From bugzilla-daemon at mcs.anl.gov Mon Jul 16 15:14:38 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 16 Jul 2007 15:14:38 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #23 from iraicu at cs.uchicago.edu 2007-07-16 15:14 ------- (In reply to comment #22) > a new 244-molecule experiment has started. You can watch it live here: > http://viper.uchicago.edu:55000/index.htm > > Please notice that the link is valid only while the job is running. > Actually, the graphs will be generated every 60 sec until the script is shut down... and the web server and graph generation scripts are set to shut down when Falkon is shut down, and not when Swift finishes the run. Once the run is over, I'll shut everything down and post the graphs on a static web page that is persistent for later viewing (I'll send out the new URL). Ioan From foster at mcs.anl.gov Mon Jul 16 16:27:07 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 16 Jul 2007 16:27:07 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov> References: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov> Message-ID: <469BE2AB.1090609@mcs.anl.gov> hey this is neat!
-- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From bugzilla-daemon at mcs.anl.gov Mon Jul 16 17:06:19 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 16 Jul 2007 17:06:19 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070716220619.99D75164DD@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 ------- Comment #9 from tiberius at mcs.anl.gov 2007-07-16 17:06 ------- Comment #7 was caused by a type on the workflow. Never mind, and sorry for the confusion.
From bugzilla-daemon at mcs.anl.gov Mon Jul 16 17:08:09 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 16 Jul 2007 17:08:09 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070716220809.06341164DD@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 ------- Comment #10 from tiberius at mcs.anl.gov 2007-07-16 17:08 ------- (In reply to comment #9) > Comment #7 was caused by a type on the workflow. > Never mind, and sorry for the confusion. > I meant typo. From bugzilla-daemon at mcs.anl.gov Mon Jul 16 17:18:52 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 16 Jul 2007 17:18:52 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070716221852.92C19164DD@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 ------- Comment #11 from hategan at mcs.anl.gov 2007-07-16 17:18 ------- I'd file this as a separate bug report. This is nasty and costly behavior. Mappers can probably keep a list of output files mapped and complain when two output things map to the same file. (In reply to comment #9) > Comment #7 was caused by a type on the workflow. > Never mind, and sorry for the confusion. >
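Mihael's suggestion in comment #11 could look roughly like the following minimal Python sketch: a per-run registry that remembers which output owns each mapped path and raises on the first collision between two different outputs. `OutputRegistry`, `register`, `OutputCollisionError` and the `results[i]` owner names are hypothetical and are not Swift's mapper interface. Paths are compared after `os.path.normpath`, which only catches textual aliases such as `out/./a.txt`, not symlinks.

```python
import os


class OutputCollisionError(Exception):
    """Raised when two different outputs are mapped to the same file."""


class OutputRegistry:
    """Hypothetical per-run registry of mapped output paths.

    A mapper would call register() for every output it maps; the first
    collision between two distinct outputs is reported immediately instead
    of letting two jobs silently write the same file.
    """

    def __init__(self):
        self._owner_by_path = {}

    def register(self, path, owner):
        # Normalise so that "out/./a.txt" and "out/a.txt" count as one file.
        key = os.path.normpath(path)
        # setdefault stores owner on first sight and returns the stored owner.
        previous = self._owner_by_path.setdefault(key, owner)
        if previous != owner:
            raise OutputCollisionError(
                f"{owner} and {previous} both map to {key}")
        return key


if __name__ == "__main__":
    registry = OutputRegistry()
    registry.register("out/echo_1.txt", "results[1]")
    registry.register("out/echo_2.txt", "results[2]")
    try:
        # Hypothetical mistake: a third output ends up on an existing file name.
        registry.register("out/./echo_1.txt", "results[3]")
    except OutputCollisionError as err:
        print("collision:", err)
```

Registering the same owner for the same path twice is allowed, so re-mapping an output (for example on a retry) does not trigger a false alarm.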
From benc at hawaga.org.uk Tue Jul 17 07:13:53 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 17 Jul 2007 12:13:53 +0000 (GMT) Subject: [Swift-devel] swift tutorial at ISSGC07 Message-ID: I just did a 1h30m swift tutorial at the two-week-long International Summer School on Grid Computing 2007 in Sweden. The tutorial was pretty much the same as what we did at TG07. It went pretty well - no problems with running out of entropy like last time. People reached the end approximately on time. There are still some inelegant bits with mappers in this tutorial - there's at least one bug open for that and eventually it will get fixed. -- From wilde at mcs.anl.gov Tue Jul 17 07:42:07 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Tue, 17 Jul 2007 07:42:07 -0500 Subject: [Swift-devel] swift tutorial at ISSGC07 In-Reply-To: References: Message-ID: <469CB91F.1090903@mcs.anl.gov> Sounds great, Ben! Any comments from the students? (All - this is around 60-70 students) - Mike Ben Clifford wrote, On 7/17/2007 7:13 AM: > I just did a 1h30m swift tutorial at the two-week-long International > Summer School on Grid Computing 2007 in Sweden. > > The tutorial was pretty much the same as what we did at TG07. > > It went pretty well - no problems with running out of entropy like last > time. People reached the end approximately on time. > > There are still some inelegant bits with mappers in this tutorial - > there's at least one bug open for that and eventually it will get fixed. > -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From benc at hawaga.org.uk Tue Jul 17 08:53:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 17 Jul 2007 13:53:14 +0000 (GMT) Subject: [Swift-devel] 0.2 release (again) Message-ID: I'm building a release candidate for a low-effort 0.2 release from swift r915 and cog r1658. Will post here with it later on. 
-- From benc at hawaga.org.uk Tue Jul 17 10:47:36 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 17 Jul 2007 15:47:36 +0000 (GMT) Subject: [Swift-devel] 0.2 release (again) In-Reply-To: References: Message-ID: On Tue, 17 Jul 2007, Ben Clifford wrote: > I'm building a release candidate for a low-effort 0.2 release from swift > r915 and cog r1658. Will post here with it later on. http://www.ci.uchicago.edu/~benc/vdsk-0.2.tar.gz $ md5sum vdsk-0.2.tar.gz 25130bbe97f2f10653b48968953c6d84 vdsk-0.2.tar.gz It runs hello world for me. I haven't done any other testing. -- From benc at hawaga.org.uk Tue Jul 17 10:50:09 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 17 Jul 2007 15:50:09 +0000 (GMT) Subject: [Swift-devel] Re: dot files by default In-Reply-To: References: Message-ID: On Wed, 4 Jul 2007, Ben Clifford wrote: > does anyone have preference about whether .dot graphviz files are > generated by default or not? > > I find them a bit annoying in as much as they double the number of run > files in my working directories to no immediate benefit. r907 turns this off by default. -- From bugzilla-daemon at mcs.anl.gov Tue Jul 17 16:08:59 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 17 Jul 2007 16:08:59 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: Message-ID: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 ------- So the latest MolDyn's 244 mol run also failed... but I think it made it all the way to the final few jobs...
The place where I put all the information about the run is: http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/

Here are the graphs:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg

The Swift log can be found at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log

The Falkon logs are at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/

The 244 mol run was supposed to have 20497 tasks, broken down as follows:

 1    1      1
 1  244    244
 1  244    244
68  244  16592
 1  244    244
11  244   2684
 1  244    244
 1  244    244
==============
         20497

We had 20495 tasks that exited with an exit code of 0, and 6 tasks that exited with an exit code of -3. The worker logs don't show anything on the stdout or stderr of the failed jobs. I looked online for what an exit code of -3 could mean, but didn't find anything.

Here are the 6 failed tasks:

Executing task urn:0-9408-1184616132483... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei fe_stdout_m112 stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done --fe_file fe_solv_m112
Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms

Executing task urn:0-9408-1184616133199... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei fe_stdout_m112 stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done --fe_file fe_solv_m112
Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms

Executing task urn:0-15036-1184616133342... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei fe_stdout_m179 stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done --fe_file fe_solv_m179
Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms

Executing task urn:0-15036-1184616133628... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei fe_stdout_m179 stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done --fe_file fe_solv_m179
Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms

Executing task urn:0-15036-1184616133528... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei fe_stdout_m179 stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done --fe_file fe_solv_m179
Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms

Executing task urn:0-9408-1184616130688... Building executable command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei fe_stdout_m112 stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done --fe_file fe_solv_m112
Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms

Both the Falkon logs and the Swift logs agree on the number of submitted tasks, number of successful tasks, and number of failed tasks. There were no outstanding tasks at the time when the workflow failed. BTW, I checked the disk space usage about an hour after the whole experiment finished, and there was plenty of disk space left.

Yong mentioned that he looked through the output of MolDyn, and there were only 242 'fe_solv_*' files, so 2 molecule files were missing. One question for Nika: are the 6 failed tasks the same job, resubmitted?

Nika, can you add anything more to this? Is there anything else to be learned from the Swift log about why those last few jobs failed? After we have tried to figure out what happened, can we resume the workflow, and hopefully finish the last few jobs in another run?

Ioan
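A minimal sketch of how failed attempts could be tallied per molecule straight from a worker log, which is one way to answer whether failures are retries of the same job. It assumes each record has the shape of the excerpt above (`Executing task <urn>...`, the `wrapper.sh` command line, then `Task <urn> completed with exit code <n> in <t> ms`) and that one worker's records are not interleaved. The regular expressions and function names are hypothetical, not part of Falkon or Swift. It also re-adds the stage breakdown from the table as a sanity check on the 20497 total.

```python
import re
from collections import Counter

# Assumed record layout, modelled on the worker log excerpt above:
#   Executing task <urn>... Building executable command...Executing: <cmd>
#   Task <urn> completed with exit code <n> in <t> ms
# The lazy <cmd> group stops at the completion line of the *same* urn, so this
# assumes records from one worker are not interleaved.
RECORD_RE = re.compile(
    r"Executing task (?P<urn>urn:[\w-]+)\.\.\.(?P<cmd>.*?)"
    r"Task (?P=urn) completed with exit code (?P<code>-?\d+) in (?P<ms>\d+) ms",
    re.DOTALL,
)
# Hypothetical: the molecule id is taken from the fe.pl "--fe_file" argument.
MOLECULE_RE = re.compile(r"--fe_file fe_solv_(m\d+)")


def failed_attempts_by_molecule(log_text):
    """Count records with a non-zero exit code, keyed by molecule id (None if unknown)."""
    failures = Counter()
    for rec in RECORD_RE.finditer(log_text):
        if int(rec.group("code")) != 0:
            mol = MOLECULE_RE.search(rec.group("cmd"))
            failures[mol.group(1) if mol else None] += 1
    return failures


def expected_task_count(stages):
    """Sum (jobs per molecule) x (molecules) over the workflow stages."""
    return sum(per_mol * mols for per_mol, mols in stages)


if __name__ == "__main__":
    # First two columns of the breakdown table above.
    stages = [(1, 1), (1, 244), (1, 244), (68, 244),
              (1, 244), (11, 244), (1, 244), (1, 244)]
    print(expected_task_count(stages))  # 20497

    # A tiny synthetic log in the same shape (command lines abbreviated).
    attempts = [("urn:0-9408-1", "m112", -3), ("urn:0-9408-2", "m112", -3),
                ("urn:0-15036-1", "m179", -3), ("urn:0-15036-2", "m179", 0)]
    log = "".join(
        f"Executing task {urn}... Building executable command...Executing: "
        f"/bin/sh shared/wrapper.sh --fe_file fe_solv_{mol}\n"
        f"Task {urn} completed with exit code {code} in 200 ms\n"
        for urn, mol, code in attempts
    )
    for mol, count in failed_attempts_by_molecule(log).items():
        print(mol, count)
```

Run over the six records quoted above, this would report three failed attempts each for m112 and m179.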
From foster at mcs.anl.gov Tue Jul 17 21:39:20 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 17 Jul 2007 21:39:20 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> Message-ID: <469D7D58.8000908@mcs.anl.gov> Ioan: a) I think this information should be in the bugzilla summary, according to our processes? b) Why did it take so long to get all of the workers working? c) Can we debug using less than O(800) node hours? Ian. -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From foster at mcs.anl.gov Tue Jul 17 21:43:52 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 17 Jul 2007 21:43:52 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D7D58.8000908@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> Message-ID: <469D7E68.9050202@mcs.anl.gov> Another (perhaps dumb?) question--it would seem desirable that we be able to quickly determine what tasks failed and then (attempt to) rerun them in such circumstances. Here it seems that a lot of effort is required just to determine what tasks failed, and I am not sure that the information extracted is enough to rerun them. It also seems that we can't easily determine which output files are missing. Ian. -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From yongzh at cs.uchicago.edu Tue Jul 17 21:50:12 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 17 Jul 2007 21:50:12 -0500 (CDT) Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D7E68.9050202@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> Message-ID: We already have a retry mechanism there. I suspect the failed jobs were retried but failed again. The server-side logs should have something about which files were missing. Yong. On Tue, 17 Jul 2007, Ian Foster wrote: > Another (perhaps dumb?) question--it would seem desirable that we be > able to quickly determine what tasks failed and then (attempt to) rerun > them in such circumstances. > > Here it seems that a lot of effort is required just to determine what > tasks failed, and I am not sure that the information extracted is enough > to rerun them. > > It also seems that we can't easily determine which output files are missing. > > Ian. > > Ian Foster wrote: > > Ioan: > > > > a) I think this information should be in the bugzilla summary, > > according to our processes? > > > > b) Why did it take so long to get all of the workers working? > > > > c) Can we debug using less than O(800) node hours? > > > > Ian.
> > > > bugzilla-daemon at mcs.anl.gov wrote: > >> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 > >> > >> > >> > >> > >> > >> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 > >> ------- > >> So the latest MolDyn's 244 mol run also failed... but I think it made > >> it all > >> the way to the final few jobs... > >> > >> The place where I put all the information about the run is at: > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ > >> > >> > >> Here are the graphs: > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg > >> > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg > >> > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg > >> > >> > >> The Swift log can be found at: > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log > >> > >> > >> The Falkon logs are at: > >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/ > >> > >> > >> The 244 mol run was supposed to have 20497 tasks, broken down as > >> follows: > >> 1 1 1 > >> 1 244 244 > >> 1 244 244 > >> 68 244 16592 > >> 1 244 244 > >> 11 244 2684 > >> 1 244 244 > >> 1 244 244 > >> ====================== > >> 20497 > >> > >> We had 20495 tasks that exited with an exit code of 0, and 6 tasks > >> that exited > >> with an exit code of -3. The worker logs don't show anything on the > >> stdout or > >> stderr of the failed jobs. I looked online what an exit code of -3 > >> could mean, > >> but didn't find anything. > >> Here are the failed 6 tasks: > >> Executing task urn:0-9408-1184616132483... 
Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei > >> fe_stdout_m112 > >> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > >> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m112 --solv_lrc_file > >> solv_chg_a10_m112_done > >> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with > >> exit code -3 in 238 ms > >> > >> Executing task urn:0-9408-1184616133199... Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei > >> fe_stdout_m112 > >> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > >> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m112 --solv_lrc_file > >> solv_chg_a10_m112_done > >> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with > >> exit code -3 in 201 ms > >> > >> Executing task urn:0-15036-1184616133342... 
Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei > >> fe_stdout_m179 > >> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > >> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m179 --solv_lrc_file > >> solv_chg_a10_m179_done > >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with > >> exit code -3 in 267 ms > >> > >> Executing task urn:0-15036-1184616133628... Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei > >> fe_stdout_m179 > >> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > >> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m179 --solv_lrc_file > >> solv_chg_a10_m179_done > >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with > >> exit code -3 in 2368 ms > >> > >> Executing task urn:0-15036-1184616133528... 
Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei > >> fe_stdout_m179 > >> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > >> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m179 --solv_lrc_file > >> solv_chg_a10_m179_done > >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with > >> exit code -3 in 311 ms > >> > >> Executing task urn:0-9408-1184616130688... Building executable > >> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei > >> fe_stdout_m112 > >> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > >> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite > >> --resultonly --wham_outputs wf_m112 --solv_lrc_file > >> solv_chg_a10_m112_done > >> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with > >> exit code -3 in 464 ms > >> > >> > >> Both the Falkon logs and the Swift logs agree on the number of > >> submitted tasks, > >> number of successful tasks, and number of failed tasks. There were no > >> outstanding tasks at the time when the workflow failed. BTW, I > >> checked the > >> disk space usage after about an hour that the whole experiment > >> finished, and > >> there was plenty of disk space left. 
> >> > >> Yong mentioned that he looked through the output of MolDyn, and there > >> were only > >> 242 'fe_solv_*' files, so 2 molecule files were missing... one > >> question for > >> Nika, are the 6 failed tasks the same job, resubmitted? > >> Nika, can you add anything more to this? Is there anything else to > >> be learned > >> from the Swift log, as to why those last few jobs failed? After we > >> have tried > >> to figure out what happened, can we resume the workflow, and > >> hopefully finish > >> the last few jobs in another run? > >> > >> Ioan > >> > >> > >> > > > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Tue Jul 17 22:11:23 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 17 Jul 2007 22:11:23 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D7E68.9050202@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> Message-ID: <1184728284.2004.12.camel@blabla.mcs.anl.gov> On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote: > Another (perhaps dumb?) question--it would seem desirable that we be > able to quickly determine what tasks failed and then (attempt to) rerun > them in such circumstances. > > Here it seems that a lot of effort is required just to determine what > tasks failed, and I am not sure that the information extracted is enough > to rerun them. 
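A minimal sketch (Python, not part of the original thread) of how the six exit-code -3 records listed in Comment #24 can be reconciled against the expected task count from the same comment. The record strings are the tails of those six records, unwrapped onto one line each; the regular expression, the `failures_by_molecule` helper, and the variable names are illustrative assumptions, not an existing Swift or Falkon log parser.

```python
import re
from collections import Counter

# First two columns of each row of the task breakdown in Comment #24;
# the third column in the comment is their product.
BREAKDOWN = [(1, 1), (1, 244), (1, 244), (68, 244),
             (1, 244), (11, 244), (1, 244), (1, 244)]

# Tasks reported as exiting with code 0 in Comment #24.
SUCCEEDED = 20495

# Tails of the six failed-task records from Comment #24, unwrapped.
FAILED_RECORDS = [
    "--fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms",
    "--fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms",
    "--fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms",
    "--fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms",
    "--fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms",
    "--fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms",
]

# Hypothetical pattern: molecule ID from --fe_file, task URN, exit code.
RECORD_RE = re.compile(
    r"--fe_file fe_solv_(m\d+) Task (urn:\S+) completed with exit code (-?\d+)"
)


def failures_by_molecule(records):
    """Count non-zero-exit attempts per molecule ID (e.g. 'm112')."""
    counts = Counter()
    for line in records:
        match = RECORD_RE.search(line)
        if match and int(match.group(3)) != 0:
            counts[match.group(1)] += 1
    return counts


expected = sum(per_row * units for per_row, units in BREAKDOWN)
failed = failures_by_molecule(FAILED_RECORDS)

print("expected tasks:", expected)
print("failed attempts per molecule:", dict(failed))
# If each failed molecule is one job that failed on every attempt,
# successes plus distinct failed jobs should equal the expected total.
print("reconciled:", SUCCEEDED + len(failed) == expected)
```

On this data the six failures collapse onto two molecules, m112 and m179, with three attempts each, and 20495 successes plus 2 failed jobs gives the expected 20497. That is consistent with the reading that 2 jobs failed and the 6 failures are restarts; m112 and m179 are then the likely candidates for the 2 missing `fe_solv_*` outputs, though the thread does not confirm which files were missing.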
Normally, a summary of what failed, with the reasons, is printed on stderr, together with the stdout and stderr of the jobs. Perhaps it should also go to the log file. In this case, 2 jobs failed. The 6 failures are due to restarts, which is in agreement with the 2 missing molecules. When jobs fail, Swift should not clean up the job directories, so that one can do post-mortem debugging. I suggest invoking the application manually to see if it's a matter of a bad node or bad data. > > It also seems that we can't easily determine which output files are missing. In the general case we wouldn't be able to, because the exact outputs may only be known at run-time. Granted, that kind of dynamics would depend on our ability to have nondeterministic files being returned, which we haven't gotten around to implementing. But there is a question of whether we should try to implement a short-term solution that would be invalidated by our own plans. > > Ian. > > Ian Foster wrote: > > Ioan: > > > > a) I think this information should be in the bugzilla summary, > > according to our processes? > > > > b) Why did it take so long to get all of the workers working? > > > > c) Can we debug using less than O(800) node hours? > > > > Ian. From iraicu at cs.uchicago.edu Tue Jul 17 22:30:34 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 17 Jul 2007 22:30:34 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D7D58.8000908@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> Message-ID: <469D895A.5090706@cs.uchicago.edu> Hi, See below: Ian Foster wrote: > Ioan: > > a) I think this information should be in the bugzilla summary, > according to our processes? > I posted all this to bugzilla, didn't I? > b) Why did it take so long to get all of the workers working? I finally had enough confidence in the dynamic resource provisioning that we won't lose any jobs across resource allocation boundaries (ran lots of tests and they were all positive), so I enabled it for this run. I set the max to be the entire ANL site (274 processors)...
and we got 146 at the beginning, and with time, the # of processors kept increasing up to the peak of 208 or so... the rest up to 274 were queued up in the PBS wait queue. The difference between the beginning with 146 and the end with 208 was that others who were in the system at the beginning finished their work and released some nodes, and idle processors went from the wait queue into the run queue. I would actually be curious to try out the latest DRP stuff on a busy site, such as Purdue or NCSA, and to see if we can maintain a nice pool size over a period of time, despite the sites being busy... BTW, in the previous runs for MolDyn, we normally set the min and max to, say, 100 or 200 processors, and we would wait until we had all of them before we started... sometimes, this meant waiting 12~24 hours for enough nodes to become free so the large job could start. With DRP, you can start off with whatever the site has available, and you get more with time as your jobs make it through the wait queue and other jobs that are running complete... > > c) Can we debug using less than O(800) node hours? The real MolDyn run for 244 molecules takes on the order of O(20K) node hours, so O(0.8K) is still an improvement. Remember that we can run the smaller workflows fine, but it's the bigger ones that are giving us a hard time. Nika, if you have any other suggestion on how we can further reduce the run time of each job just to simulate the # of jobs and the input/output # of files, let us know. Ioan > > Ian. -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From iraicu at cs.uchicago.edu Tue Jul 17 22:33:02 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 17 Jul 2007 22:33:02 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D7E68.9050202@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> Message-ID: <469D89EE.5090202@cs.uchicago.edu> Ian Foster wrote: > Another (perhaps dumb?) question--it would seem desirable that we be > able to quickly determine what tasks failed and then (attempt to) > rerun them in such circumstances. I think Swift already does this up to a fixed # of times (I think it is 3 or 5). > > Here it seems that a lot of effort is required just to determine what > tasks failed, and I am not sure that the information extracted is > enough to rerun them. The failed tasks are pretty easy to find in the logs based on the exit code. If we were to do a resume from Swift, I think it would automatically resubmit just the failed tasks...
but unless we figure out why they failed and fix the problem, they will likely fail again. > > It also seems that we can't easily determine which output files are > missing. I don't know about this one. Maybe Nika can comment on this. Ioan > > Ian. > > Ian Foster wrote: >> Ioan: >> >> a) I think this information should be in the bugzilla summary, >> according to our processes? >> >> b) Why did it take so long to get all of the workers working? >> >> c) Can we debug using less than O(800) node hours? >> >> Ian. >> >> bugzilla-daemon at mcs.anl.gov wrote: >>> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 >>> >>> >>> >>> >>> >>> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 >>> ------- >>> So the latest MolDyn's 244 mol run also failed... but I think it >>> made it all >>> the way to the final few jobs... >>> >>> The place where I put all the information about the run is at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ >>> >>> >>> Here are the graphs: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg >>> >>> >>> The Swift log can be found at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log >>> >>> >>> The Falkon logs are at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/ >>> >>> >>> The 244 mol run was supposed to have 20497 tasks, broken down as >>> follows: >>> 1 1 1 >>> 1 244 244 >>> 1 244 244 >>> 68 244 16592 >>> 1 244 244 >>> 11 244 2684 >>> 1 244 244 >>> 1 244 244 >>> ====================== >>> 20497 >>> >>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks >>> that exited >>>
with an exit code of -3. The worker logs don't show anything on the >>> stdout or >>> stderr of the failed jobs. I looked online what an exit code of -3 >>> could mean, >>> but didn't find anything. Here are the failed 6 tasks: >>> Executing task urn:0-9408-1184616132483... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with >>> exit code -3 in 238 ms >>> >>> Executing task urn:0-9408-1184616133199... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with >>> exit code -3 in 201 ms >>> >>> Executing task urn:0-15036-1184616133342... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with >>> exit code -3 in 267 ms >>> >>> Executing task urn:0-15036-1184616133628... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with >>> exit code -3 in 2368 ms >>> >>> Executing task urn:0-15036-1184616133528... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with >>> exit code -3 in 311 ms >>> >>> Executing task urn:0-9408-1184616130688... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with >>> exit code -3 in 464 ms >>> >>> >>> Both the Falkon logs and the Swift logs agree on the number of >>> submitted tasks, >>> number of successful tasks, and number of failed tasks. There were no >>> outstanding tasks at the time when the workflow failed. BTW, I >>> checked the >>> disk space usage after about an hour that the whole experiment >>> finished, and >>> there was plenty of disk space left. 
>>> >>> Yong mentioned that he looked through the output of MolDyn, and >>> there were only >>> 242 'fe_solv_*' files, so 2 molecule files were missing... one >>> question for >>> Nika, are the 6 failed tasks the same job, resubmitted? Nika, can >>> you add anything more to this? Is there anything else to be learned >>> from the Swift log, as to why those last few jobs failed? After we >>> have tried >>> to figure out what happened, can we resume the workflow, and >>> hopefully finish >>> the last few jobs in another run? >>> >>> Ioan >>> >>> >>> >> > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From foster at mcs.anl.gov Tue Jul 17 22:35:24 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 17 Jul 2007 22:35:24 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D895A.5090706@cs.uchicago.edu> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D895A.5090706@cs.uchicago.edu> Message-ID: <469D8A7C.9030604@mcs.anl.gov> Great! What resource acquisition policy are you using? >> b) Why did it take so long to get all of the workers working? > I finally had enough confidence in the dynamic resource provisioning > that we won't lose any jobs across resource allocation boundaries > (ran lots of tests and they were all positive), so I enabled it for > this run. I set the max to be the entire ANL site (274 processors)... > and we got 146 at the beginning, and with time, the # of processors > kept increasing up to the peak of 208 or so...
the rest up to 274 were > queued up in the PBS wait queue. The difference between the beginning > with 146 and the end with 208 was that others who were in the system > at the beginning finished their work and released some nodes, and idle > processors went from the wait queue into the run queue. I would > actually be curious to try out the latest DRP stuff on a busy site, > such as Purdue or NCSA, and to see if we can maintain a nice pool size > over a period of time, despite the sites being busy... > > BTW, in the previous runs for MolDyn, we normally set the min and max > to say 100 processors, or 200 processors, and we would wait until we > had all of them before we started... sometimes, this meant waiting > 12~24 hours for enough nodes to become free so the large job could > start. With DRP, you can start off with whatever the site has > available, and you get more with time as your jobs make it through the > wait queue and other jobs that are running complete... From iraicu at cs.uchicago.edu Tue Jul 17 22:36:36 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 17 Jul 2007 22:36:36 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <1184728284.2004.12.camel@blabla.mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <1184728284.2004.12.camel@blabla.mcs.anl.gov> Message-ID: <469D8AC4.4010400@cs.uchicago.edu> Mihael Hategan wrote: > On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote: > >> Another (perhaps dumb?) question--it would seem desirable that we be >> able to quickly determine what tasks failed and then (attempt to) rerun >> them in such circumstances. >> >> Here it seems that a lot of effort is required just to determine what >> tasks failed, and I am not sure that the information extracted is enough >> to rerun them. 
>> > > Normally, a summary of what failed with the reasons is printed on > stderr, together with the stdout and stderr of the jobs. Perhaps it > should also go to the log file. > > In this case, 2 jobs failed. The 6 failures are due to restarts. Which > is in agreement with the 2 missing molecules. > > When jobs fail, swift should not clean up the job directories so that > one can do post-mortem debugging. I suggest invoking the application > manually to see if it's a matter of a bad node or bad data. > The errors happened on 3 different nodes, so I suspect it is not a bad-node problem (which we previously experienced with the stale NFS handle). Nika, I sent out the actual commands that failed... can you try to run them manually to see what happens, and possibly determine why they failed? Can you also find out what an exit code of -3 means within the application that failed (you might have to look at the app source code, or contact the original source code writer). Ioan > >> It also seems that we can't easily determine which output files are missing. >> > > In the general case we wouldn't be able to, because the exact outputs > may only be known at run-time. Granted, that kind of dynamics would > depend on our ability to have nondeterministic files being returned, > which we haven't gotten around to implementing. But there is a question > of whether we should try to implement a short term solution that would > be invalidated by our own plans. > > >> Ian. >> >> Ian Foster wrote: >> >>> Ioan: >>> >>> a) I think this information should be in the bugzilla summary, >>> according to our processes? >>> >>> b) Why did it take so long to get all of the workers working? >>> >>> c) Can we debug using less than O(800) node hours? >>> >>> Ian.
>> -- >> >> Ian Foster, Director, Computation Institute >> Argonne National Laboratory & University of Chicago >> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 >> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 >> Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. >> Globus Alliance: www.globus.org. >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================
From foster at mcs.anl.gov Tue Jul 17 22:37:29 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 17 Jul 2007 22:37:29 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D89EE.5090202@cs.uchicago.edu> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <469D89EE.5090202@cs.uchicago.edu> Message-ID: <469D8AF9.7070401@mcs.anl.gov> Sorry, I was unclear. What I meant was: in the event that Swift decides that things have "failed" (definitively), it would be good to have something like a DAGMan "rescue DAG" that would show exactly what needed to be done to resubmit a task manually. Your comment that "If we were to do a resume from Swift, I think it would automatically resubmit just the failed tasks" suggests that (in effect) we already have this. Ian. Ioan Raicu wrote: > > > Ian Foster wrote: >> Another (perhaps dumb?) question--it would seem desirable that we be >> able to quickly determine what tasks failed and then (attempt to) >> rerun them in such circumstances. > I think Swift already does this up to a fixed # of times (I think it > is 3 or 5). >> >> Here it seems that a lot of effort is required just to determine what >> tasks failed, and I am not sure that the information extracted is >> enough to rerun them. > The failed tasks are pretty easy to find in the logs based on the exit > code. If we were to do a resume from Swift, I think it would > automatically resubmit just the failed tasks... but unless we figure > out why they failed and fix the problem, they will likely fail again. >> >> It also seems that we can't easily determine which output files are >> missing. > I don't know about this one; maybe Nika can comment on this. > > Ioan >> >> Ian. >> >> Ian Foster wrote: >>> Ioan: >>> >>> a) I think this information should be in the bugzilla summary, >>> according to our processes?
>>> >>> b) Why did it take so long to get all of the workers working? >>> >>> c) Can we debug using less than O(800) node hours? >>> >>> Ian. >>> >> > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From iraicu at cs.uchicago.edu Tue Jul 17 22:37:56 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 17 Jul 2007 22:37:56 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D8A7C.9030604@mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D895A.5090706@cs.uchicago.edu> <469D8A7C.9030604@mcs.anl.gov> Message-ID: <469D8B14.9090509@cs.uchicago.edu> Linear, 1, 2, 3, 4, ... For the ANL/UC site, it's generating a small enough number of jobs... Ioan Ian Foster wrote: > Great! What resource acquisition policy are you using? >>> b) Why did it take so long to get all of the workers working? >> I finally had enough confidence in the dynamic resource provisioning >> that we won't lose any jobs across resource allocation boundaries >> (ran lots of tests and they were all positive), so I enabled it for >> this run. I set the max to be the entire ANL site (274 >> processors)... and we got 146 at the beginning, and with time, the # >> of processors kept increasing up to the peak of 208 or so...
the rest >> up to 274 were queued up in the PBS wait queue. The difference >> between the beginning with 146 and the end with 208 was that others >> who were in the system at the beginning finished their work and >> released some nodes, and idle processors went from the wait queue >> into the run queue. I would actually be curious to try out the >> latest DRP stuff on a busy site, such as Purdue or NCSA, and to see >> if we can maintain a nice pool size over a period of time, despite >> the sites being busy... >> >> BTW, in the previous runs for MolDyn, we normally set the min and max >> to say 100 processors, or 200 processors, and we would wait until we >> had all of them before we started... sometimes, this meant waiting >> 12~24 hours for enough nodes to become free so the large job could >> start. With DRP, you can start off with whatever the site has >> available, and you get more with time as your jobs make it through >> the wait queue and other jobs that are running complete... > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From tiberius at ci.uchicago.edu Tue Jul 17 23:18:49 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 17 Jul 2007 23:18:49 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <469D8AC4.4010400@cs.uchicago.edu> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <1184728284.2004.12.camel@blabla.mcs.anl.gov> <469D8AC4.4010400@cs.uchicago.edu> Message-ID: I also had jobs failing at the Argonne site today. It seems that the ia_32 nodes were failing at random on some of my jobs, so I had to switch my apps to the ia_64 nodes to get a full, successful execution. Tibi On 7/17/07, Ioan Raicu wrote: > > > > Mihael Hategan wrote: > On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote: > > > Another (perhaps dumb?) question--it would seem desirable that we be > able to quickly determine what tasks failed and then (attempt to) rerun > them in such circumstances. > > Here it seems that a lot of effort is required just to determine what > tasks failed, and I am not sure that the information extracted is enough > to rerun them. > > Normally, a summary of what failed with the reasons is printed on > stderr, together with the stdout and stderr of the jobs. Perhaps it > should also go to the log file. > > In this case, 2 jobs failed. The 6 failures are due to restarts. Which > is in agreement with the 2 missing molecules. > > When jobs fail, swift should not clean up the job directories so that > one can do post-mortem debugging. I suggest invoking the application > manually to see if it's a matter of a bad node or bad data.
> > The errors happened on 3 different nodes, so I suspect it is not a bad-node > problem (which we previously experienced with the stale NFS handle). > > Nika, I sent out the actual commands that failed... can you try to run them > manually to see what happens, and possibly determine why they failed? Can > you also find out what an exit code of -3 means within the application that > failed (you might have to look at the app source code, or contact the > original source code writer). > > Ioan > > > > > It also seems that we can't easily determine which output files are > missing. > > In the general case we wouldn't be able to, because the exact outputs > may only be known at run-time. Granted, that kind of dynamics would > depend on our ability to have nondeterministic files being returned, > which we haven't gotten around to implementing. But there is a question > of whether we should try to implement a short term solution that would > be invalidated by our own plans. > > > > Ian. > > Ian Foster wrote: > > > Ioan: > > a) I think this information should be in the bugzilla summary, > according to our processes? > > b) Why did it take so long to get all of the workers working? > > c) Can we debug using less than O(800) node hours? > > Ian. > > bugzilla-daemon at mcs.anl.gov wrote: > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 > > > > > > ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 > ------- > So the latest MolDyn's 244 mol run also failed... but I think it made > it all > the way to the final few jobs...
>
> The place where I put all the information about the run is at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/
>
> Here are the graphs:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg
>
> The Swift log can be found at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log
>
> The Falkon logs are at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/
>
> The 244 mol run was supposed to have 20497 tasks, broken down as follows:
>
>    1    1      1
>    1  244    244
>    1  244    244
>   68  244  16592
>    1  244    244
>   11  244   2684
>    1  244    244
>    1  244    244
> ======================
>                20497
>
> We had 20495 tasks that exited with an exit code of 0, and 6 tasks that exited
> with an exit code of -3. The worker logs don't show anything on the stdout or
> stderr of the failed jobs. I looked online for what an exit code of -3 could
> mean, but didn't find anything.
> Here are the 6 failed tasks:
> Executing task urn:0-9408-1184616132483...
Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei > fe_stdout_m112 > stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m112 --solv_lrc_file > solv_chg_a10_m112_done > --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with > exit code -3 in 238 ms > > Executing task urn:0-9408-1184616133199... Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei > fe_stdout_m112 > stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m112 --solv_lrc_file > solv_chg_a10_m112_done > --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with > exit code -3 in 201 ms > > Executing task urn:0-15036-1184616133342... 
Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei > fe_stdout_m179 > stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m179 --solv_lrc_file > solv_chg_a10_m179_done > --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with > exit code -3 in 267 ms > > Executing task urn:0-15036-1184616133628... Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei > fe_stdout_m179 > stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m179 --solv_lrc_file > solv_chg_a10_m179_done > --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with > exit code -3 in 2368 ms > > Executing task urn:0-15036-1184616133528... 
Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei > fe_stdout_m179 > stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out > solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out > solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out > solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out > solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 > fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m179 --solv_lrc_file > solv_chg_a10_m179_done > --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with > exit code -3 in 311 ms > > Executing task urn:0-9408-1184616130688... Building executable > command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei > fe_stdout_m112 > stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out > solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out > solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out > solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out > solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 > fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl > --nosite > --resultonly --wham_outputs wf_m112 --solv_lrc_file > solv_chg_a10_m112_done > --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with > exit code -3 in 464 ms > > > Both the Falkon logs and the Swift logs agree on the number of > submitted tasks, > number of successful tasks, and number of failed tasks. There were no > outstanding tasks at the time when the workflow failed. BTW, I > checked the > disk space usage about an hour after the whole experiment > finished, and > there was plenty of disk space left. > > Yong mentioned that he looked through the output of MolDyn, and there > were only > 242 'fe_solv_*' files, so 2 molecule files were missing...
one > question for > Nika, are the 6 failed tasks the same job, resubmitted? > Nika, can you add anything more to this? Is there anything else to > be learned > from the Swift log, as to why those last few jobs failed? After we > have tried > to figure out what happened, can we resume the workflow, and > hopefully finish > the last few jobs in another run? > > Ioan > > > > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. 
Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From hategan at mcs.anl.gov Tue Jul 17 23:32:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 17 Jul 2007 23:32:44 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To:
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <1184728284.2004.12.camel@blabla.mcs.anl.gov> <469D8AC4.4010400@cs.uchicago.edu>
Message-ID: <1184733164.14719.5.camel@blabla.mcs.anl.gov>

I don't think these are random failures. In the whole workflow, exactly 6 tasks failed: 3 belonging to one job and 3 to the other. Statistically, and if Ioan's assertion that they were not sent to the exact same worker is correct, I'd be pretty confident saying that it was due to specific executables failing on specific data (and by that I would include the possibility of missing data).

Mihael

On Tue, 2007-07-17 at 23:18 -0500, Tiberiu Stef-Praun wrote:
> I also had jobs failing at the Argonne site today.
> It seems that the ia_32 nodes were randomly failing on some of my
> jobs, so I had to switch my apps to the ia_64 nodes to get a full,
> successful execution.
>
> Tibi

From benc at hawaga.org.uk Wed Jul 18 02:45:12 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 18 Jul 2007 07:45:12 +0000 (GMT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D8AF9.7070401@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <469D89EE.5090202@cs.uchicago.edu> <469D8AF9.7070401@mcs.anl.gov>
Message-ID:

On Tue, 17 Jul 2007, Ian Foster wrote:

> Sorry, I was unclear. What I meant was: in the event that Swift decides that
> things have "failed" (definitively), it would be good to have something like a
> DAGMan "rescue dag" that would show exactly what needed to be done to resubmit
> a task manually.
> Your comment that "If we were to do a resume from Swift, I think it would
> automatically resubmit just the failed tasks" suggests that (in effect) we
> already have this.

Swift has resume (though it lists what has been done, not what needs to be done). I think there's something funny with it in the context of this application because of some hacks to work around Swift deficiencies.
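Ben's distinction (resume records what has been done, while Ian wants what still needs to be done) amounts to a set difference over the workflow's task graph, ordered so that prerequisites come first, in the spirit of the DAGMan "rescue dag" Ian mentions. The following is a minimal Python sketch under stated assumptions: the dependency map and the `done` set are hypothetical inputs (for example, something parsed from a restart log whose format is not shown in this thread), the `rescue_list` helper and the `prep_`/`wham_`/`fe_` task names are illustrative only, and nothing here describes how Swift's resume is actually implemented. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def rescue_list(deps, done):
    """Return the tasks not yet completed, ordered so prerequisites come first.

    deps: dict mapping a task name to the set of task names it depends on
          (hypothetical representation of the workflow graph).
    done: set of task names recorded as finished (hypothetical; e.g. parsed
          from a restart log in some assumed format).
    """
    # static_order() yields every node, including predecessors that never
    # appear as keys, with each node after all of its prerequisites.
    order = TopologicalSorter(deps).static_order()
    return [task for task in order if task not in done]


if __name__ == "__main__":
    molecules = ("m111", "m112", "m179")
    deps = {}
    for mol in molecules:
        # Hypothetical three-stage chain per molecule; names are illustrative.
        deps[f"wham_{mol}"] = {f"prep_{mol}"}
        deps[f"fe_{mol}"] = {f"wham_{mol}"}
    # Pretend everything finished except the final stage of two molecules.
    done = set(deps) | {f"prep_{mol}" for mol in molecules}
    done -= {"fe_m112", "fe_m179"}
    print(rescue_list(deps, done))
```

Because `static_order()` only guarantees that each task appears after its prerequisites, tasks with no ordering between them (here `fe_m112` and `fe_m179`) may come out in either order. A list built this way only reflects what was recorded as finished; it cannot tell whether a finished upstream stage produced usable output.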
-- From bugzilla-daemon at mcs.anl.gov Wed Jul 18 08:18:21 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 18 Jul 2007 08:18:21 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070718131821.0BA40164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #12 from benc at hawaga.org.uk 2007-07-18 08:18 ------- r920 has a fix for code that looks like that in comment #3 - that code became the regression test tests/language-behaviour/0084-for.swift. Please try out r920 or later and report back. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From nefedova at mcs.anl.gov Wed Jul 18 08:27:56 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Wed, 18 Jul 2007 08:27:56 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <1184733164.14719.5.camel@blabla.mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <1184728284.2004.12.camel@blabla.mcs.anl.gov> <469D8AC4.4010400@cs.uchicago.edu> <1184733164.14719.5.camel@blabla.mcs.anl.gov> Message-ID: <79C29A10-D8AC-43D3-B548-8553B712FDE5@mcs.anl.gov> Sorry I was offline (sick w/cold/fever). I am taking today off as well. I've checked the stderr files from the last run - it looks like 2 jobs failed due to some application-specific reasons. I am Cc Yuqing to see if he has any insights... Here is what i had: WHAM is not converged for solv_chg_m112 WHAM is not converged for solv_chg_m179 So it looks like 2 molecules (out of 244) failed. 
The last stage of the workflow failed for these molecules because the previous stage(s) produced some wrong/incomplete (?) results. Yuqing, there are 6 directories on tg-login1:/disks/scratchgpfs1/ iraicu/ModLyn/MolDyn-244-ja4ya01d6cti1 (3 for each of the failed molecules). Any ideas what went wrong with these 2 molecules? Nika On Jul 17, 2007, at 11:32 PM, Mihael Hategan wrote: > I don't think these are random failures. In the whole workflow there > were exactly 6 tasks failed. 3 belonging to one job and 3 to the > other. > Statistically, and if Ioan's assertion that they were not sent to the > exact same worker is correct, I'd be pretty confident saying that > it was > due to specific executables failing on specific data (and by that I > would include the possibility of missing data). > > Mihael > > On Tue, 2007-07-17 at 23:18 -0500, Tiberiu Stef-Praun wrote: >> I also had jobs failing at the Argonne site today. >> It seems that the ia_32 were randomly fail on executing some of my >> jobs, so I had to switch my apps to the ia_64 to get a full, >> successful execution. >> >> Tibi >> >> On 7/17/07, Ioan Raicu wrote: >>> >>> >>> >>> Mihael Hategan wrote: >>> On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote: >>> >>> >>> Another (perhaps dumb?) question--it would seem desirable that >>> we be >>> able to quickly determine what tasks failed and then (attempt to) >>> rerun >>> them in such circumstances. >>> >>> Here it seems that a lot of effort is required just to determine >>> what >>> tasks failed, and I am not sure that the information extracted is >>> enough >>> to rerun them. >>> >>> Normally, a summary of what failed with the reasons is printed on >>> stderr, together with the stdout and stderr of the jobs. Perhaps it >>> should also go to the log file. >>> >>> In this case, 2 jobs failed. The 6 failures are due to restarts. >>> Which >>> is in agreement with the 2 missing molecules. 
>>> >>> When jobs fail, swift should not clean up the job directories so >>> that >>> one can do post-mortem debugging. I suggest invoking the application >>> manually to see if it's a matter of a bad node or bad data. >>> >>> The errors happened on 3 different nodes, so I suspect that its >>> not bad >>> nodes (as we had previously experience with the stale NFS handle). >>> >>> Nika, I sent out the actual commands that failed... can you try >>> to run them >>> manually to see what happens, and possibly determine why they >>> failed? Can >>> you also find out what an exit code of -3 means within the >>> application that >>> failed (you might have to look at the app source code, or contact >>> the >>> original source code writer). >>> >>> Ioan >>> >>> >>> >>> >>> It also seems that we can't easily determine which output files are >>> missing. >>> >>> In the general case we wouldn't be able to, because the exact >>> outputs >>> may only be known at run-time. Granted, that kind of dynamics would >>> depend on our ability to have nondeterministic files being returned, >>> which we haven't gotten around to implementing. But there is a >>> question >>> of whether we should try to implement a short term solution that >>> would >>> be invalidated by our own plans. >>> >>> >>> >>> Ian. >>> >>> Ian Foster wrote: >>> >>> >>> Ioan: >>> >>> a) I think this information should be in the bugzilla summary, >>> according to our processes? >>> >>> b) Why did it take so long to get all of the workers working? >>> >>> c) Can we debug using less than O(800) node hours? >>> >>> Ian. >>> >>> bugzilla-daemon at mcs.anl.gov wrote: >>> >>> >>> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 >>> >>> >>> >>> >>> >>> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 >>> ------- >>> So the latest MolDyn's 244 mol run also failed... but I think it >>> made >>> it all >>> the way to the final few jobs... 
>>> >>> The place where I put all the information about the run is at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/ >>> >>> >>> Here are the graphs: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/summary_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/task_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/executor_graph_med.jpg >>> >>> >>> The Swift log can be found at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log >>> >>> >>> The Falkon logs are at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-failed-7-16-07/logs/falkon/ >>> >>> >>> The 244 mol run was supposed to have 20497 tasks, broken down as >>> follows: >>> 1 1 1 >>> 1 244 244 >>> 1 244 244 >>> 68 244 16592 >>> 1 244 244 >>> 11 244 2684 >>> 1 244 244 >>> 1 244 244 >>> ====================== >>> 20497 >>> >>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks >>> that exited >>> with an exit code of -3. The worker logs don't show anything on the >>> stdout or >>> stderr of the failed jobs. I looked online what an exit code of -3 >>> could mean, >>> but didn't find anything. >>> Here are the failed 6 tasks: >>> Executing task urn:0-9408-1184616132483... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out >>> solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with >>> exit code -3 in 238 ms >>> >>> Executing task urn:0-9408-1184616133199... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out >>> solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with >>> exit code -3 in 201 ms >>> >>> Executing task urn:0-15036-1184616133342... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out >>> solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with >>> exit code -3 in 267 ms >>> >>> Executing task urn:0-15036-1184616133628... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out >>> solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with >>> exit code -3 in 2368 ms >>> >>> Executing task urn:0-15036-1184616133528... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out >>> solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with >>> exit code -3 in 311 ms >>> >>> Executing task urn:0-9408-1184616130688... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out >>> solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with >>> exit code -3 in 464 ms >>> >>> >>> Both the Falkon logs and the Swift logs agree on the number of >>> submitted tasks, >>> number of successful tasks, and number of failed tasks. There >>> were no >>> outstanding tasks at the time when the workflow failed. BTW, I >>> checked the >>> disk space usage after about an hour that the whole experiment >>> finished, and >>> there was plenty of disk space left. 
>>> >>> Yong mentioned that he looked through the output of MolDyn, and >>> there >>> were only >>> 242 'fe_solv_*' files, so 2 molecule files were missing... one >>> question for >>> Nika, are the 6 failed tasks the same job, resubmitted? >>> Nika, can you add anything more to this? Is there anything else to >>> be learned >>> from the Swift log, as to why those last few jobs failed? After we >>> have tried >>> to figure out what happened, can we resume the workflow, and >>> hopefully finish >>> the last few jobs in another run? >>> >>> Ioan >>> >>> >>> >>> >>> -- >>> >>> Ian Foster, Director, Computation Institute >>> Argonne National Laboratory & University of Chicago >>> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 >>> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 >>> Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. >>> Globus Alliance: www.globus.org. >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> >>> -- >>> ============================================ >>> Ioan Raicu >>> Ph.D. Student >>> ============================================ >>> Distributed Systems Laboratory >>> Computer Science Department >>> University of Chicago >>> 1100 E. 
58th Street, Ryerson Hall >>> Chicago, IL 60637 >>> ============================================ >>> Email: iraicu at cs.uchicago.edu >>> Web: http://www.cs.uchicago.edu/~iraicu >>> http://dsl.cs.uchicago.edu/ >>> ============================================ >>> ============================================ >>> >>> >> >> > > From iraicu at cs.uchicago.edu Wed Jul 18 08:58:20 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 18 Jul 2007 08:58:20 -0500 Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules In-Reply-To: <1184733164.14719.5.camel@blabla.mcs.anl.gov> References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov> <469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov> <1184728284.2004.12.camel@blabla.mcs.anl.gov> <469D8AC4.4010400@cs.uchicago.edu> <1184733164.14719.5.camel@blabla.mcs.anl.gov> Message-ID: <469E1C7C.5050703@cs.uchicago.edu> The 4 machines on which the 6 jobs failed were: tg-c055, tg-v028, tg-v092 and tg-v023. Note that one of them is 64-bit and three are 32-bit. Also, I had two workers on each machine, and only one worker on each machine failed a job; if it were a node hardware problem, I would have expected both workers on that machine to have failed jobs. I concur with Mihael that there might have been incomplete or missing data; we just have to find out whether that is possible even though the previous stages all exited with an exit code of 0. Yuqing (the domain/application expert) is probably the key to finding out what happened to these 6 failed jobs in this run. Nika, did you try to run the jobs manually to see whether they fail with the same -3 exit code?
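The accounting in this thread can be cross-checked with a little arithmetic. The following is a hypothetical Python sketch, not part of Swift or Falkon: it recomputes the planned job total from the per-stage breakdown in Bugzilla comment #24, reconciles it with the 20495 successful and 6 failed attempts assuming 3 attempts per job, and, under the loudly simplifying assumption that each attempt fails independently at the observed rate, estimates how many jobs would be expected to fail all 3 attempts by chance. That last number is one way to quantify Mihael's point that these are not random failures.

```python
# Hypothetical sanity check of the figures quoted in this thread for the
# 244-molecule MolDyn run. None of these names come from Swift or Falkon.

MOLECULES = 244
ATTEMPTS_PER_JOB = 3  # assumption: 3 tries per job, as discussed in this bug

# (tasks per molecule, molecules) for each stage, taken from comment #24;
# the first stage is a single workflow-wide task.
STAGES = [
    (1, 1),
    (1, MOLECULES),
    (1, MOLECULES),
    (68, MOLECULES),
    (1, MOLECULES),
    (11, MOLECULES),
    (1, MOLECULES),
    (1, MOLECULES),
]


def planned_jobs(stages):
    """Total number of jobs the workflow is expected to run."""
    return sum(per_mol * mols for per_mol, mols in stages)


def expected_chance_failures(jobs, failed_attempts, total_attempts,
                             attempts=ATTEMPTS_PER_JOB):
    """Expected number of jobs failing on every attempt, assuming (crudely)
    that each attempt fails independently at the observed per-attempt rate."""
    p = failed_attempts / total_attempts
    return jobs * p ** attempts


planned = planned_jobs(STAGES)
succeeded = 20495           # tasks that exited with code 0
failed_attempts = 6         # tasks that exited with code -3
lost = planned - succeeded  # jobs that never succeeded

chance = expected_chance_failures(planned, failed_attempts,
                                  succeeded + failed_attempts)

print(planned)                        # 20497
print(lost, lost * ATTEMPTS_PER_JOB)  # 2 6, matching the 2 missing fe_solv files
print(f"{chance:.1e}")                # 5.1e-07
```

With well under one job expected to fail every attempt by chance, two such jobs point toward specific inputs (or missing data) rather than random node faults. The independence model ignores, for example, that retries may land on the same worker, so treat it as a rough argument only.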
Ioan Mihael Hategan wrote: > I don't think these are random failures. In the whole workflow there > were exactly 6 failed tasks: 3 belonging to one job and 3 to the other. > Statistically, and if Ioan's assertion that they were not sent to the > exact same worker is correct, I'd be pretty confident saying that it was > due to specific executables failing on specific data (and by that I > would include the possibility of missing data). > > Mihael > > On Tue, 2007-07-17 at 23:18 -0500, Tiberiu Stef-Praun wrote: > >> I also had jobs failing at the Argonne site today. >> It seems that the ia_32 nodes were randomly failing on some of my >> jobs, so I had to switch my apps to the ia_64 to get a full, >> successful execution. >> >> Tibi >> >> On 7/17/07, Ioan Raicu wrote: >> >>> >>> Mihael Hategan wrote: >>> On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote: >>> >>> >>> Another (perhaps dumb?) question--it would seem desirable that we be >>> able to quickly determine what tasks failed and then (attempt to) rerun >>> them in such circumstances. >>> >>> Here it seems that a lot of effort is required just to determine what >>> tasks failed, and I am not sure that the information extracted is enough >>> to rerun them. >>> >>> Normally, a summary of what failed with the reasons is printed on >>> stderr, together with the stdout and stderr of the jobs. Perhaps it >>> should also go to the log file. >>> >>> In this case, 2 jobs failed. The 6 failures are due to restarts. This >>> is in agreement with the 2 missing molecules. >>> >>> When jobs fail, Swift should not clean up the job directories so that >>> one can do post-mortem debugging. I suggest invoking the application >>> manually to see if it's a matter of a bad node or bad data. >>> >>> The errors happened on 3 different nodes, so I suspect that it's not bad >>> nodes (as we previously experienced with the stale NFS handle). >>> >>> Nika, I sent out the actual commands that failed...
can you try to run them >>> manually to see what happens, and possibly determine why they failed? Can >>> you also find out what an exit code of -3 means within the application that >>> failed (you might have to look at the app source code, or contact the >>> original source code writer). >>> >>> Ioan >>> >>> >>> >>> >>> It also seems that we can't easily determine which output files are >>> missing. >>> >>> In the general case we wouldn't be able to, because the exact outputs >>> may only be known at run-time. Granted, that kind of dynamics would >>> depend on our ability to have nondeterministic files being returned, >>> which we haven't gotten around to implementing. But there is a question >>> of whether we should try to implement a short term solution that would >>> be invalidated by our own plans. >>> >>> >>> >>> Ian. >>> >>> Ian Foster wrote: >>> >>> >>> Ioan: >>> >>> a) I think this information should be in the bugzilla summary, >>> according to our processes? >>> >>> b) Why did it take so long to get all of the workers working? >>> >>> c) Can we debug using less than O(800) node hours? >>> >>> Ian. >>> >>> bugzilla-daemon at mcs.anl.gov wrote: >>> >>> >>> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72 >>> >>> >>> >>> >>> >>> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 >>> ------- >>> So the latest MolDyn's 244 mol run also failed... but I think it made >>> it all >>> the way to the final few jobs... 
>>> >>> All the information about the run is at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ >>> >>> >>> Here are the graphs: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg >>> >>> >>> The Swift log can be found at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log >>> >>> >>> The Falkon logs are at: >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/ >>> >>> >>> The 244 mol run was supposed to have 20497 tasks, broken down as follows (tasks per molecule x molecules = tasks; the first stage is a single task):
>>>
>>>      1 x   1 =     1
>>>      1 x 244 =   244
>>>      1 x 244 =   244
>>>     68 x 244 = 16592
>>>      1 x 244 =   244
>>>     11 x 244 =  2684
>>>      1 x 244 =   244
>>>      1 x 244 =   244
>>>    ======================
>>>                  20497
>>>
>>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks that exited with an exit code of -3. The worker logs don't show anything on the stdout or stderr of the failed jobs. I looked online for what an exit code of -3 could mean, but didn't find anything. >>> Here are the 6 failed tasks: >>> Executing task urn:0-9408-1184616132483...
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with >>> exit code -3 in 238 ms >>> >>> Executing task urn:0-9408-1184616133199... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with >>> exit code -3 in 201 ms >>> >>> Executing task urn:0-15036-1184616133342... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with >>> exit code -3 in 267 ms >>> >>> Executing task urn:0-15036-1184616133628... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with >>> exit code -3 in 2368 ms >>> >>> Executing task urn:0-15036-1184616133528... 
Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei >>> fe_stdout_m179 >>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out >>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out >>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out >>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out >>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179 >>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m179 --solv_lrc_file >>> solv_chg_a10_m179_done >>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with >>> exit code -3 in 311 ms >>> >>> Executing task urn:0-9408-1184616130688... Building executable >>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei >>> fe_stdout_m112 >>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out >>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out >>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out >>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out >>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112 >>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl >>> --nosite >>> --resultonly --wham_outputs wf_m112 --solv_lrc_file >>> solv_chg_a10_m112_done >>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with >>> exit code -3 in 464 ms >>> >>> >>> Both the Falkon logs and the Swift logs agree on the number of >>> submitted tasks, >>> number of successful tasks, and number of failed tasks. There were no >>> outstanding tasks at the time when the workflow failed. BTW, I >>> checked the >>> disk space usage after about an hour that the whole experiment >>> finished, and >>> there was plenty of disk space left. 
>>> >>> Yong mentioned that he looked through the output of MolDyn, and there >>> were only >>> 242 'fe_solv_*' files, so 2 molecule files were missing... one >>> question for >>> Nika, are the 6 failed tasks the same job, resubmitted? >>> Nika, can you add anything more to this? Is there anything else to >>> be learned >>> from the Swift log, as to why those last few jobs failed? After we >>> have tried >>> to figure out what happened, can we resume the workflow, and >>> hopefully finish >>> the last few jobs in another run? >>> >>> Ioan >>> >>> >>> >>> >>> -- >>> >>> Ian Foster, Director, Computation Institute >>> Argonne National Laboratory & University of Chicago >>> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 >>> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 >>> Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. >>> Globus Alliance: www.globus.org. >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> >>> -- >>> ============================================ >>> Ioan Raicu >>> Ph.D. Student >>> ============================================ >>> Distributed Systems Laboratory >>> Computer Science Department >>> University of Chicago >>> 1100 E. 
58th Street, Ryerson Hall >>> Chicago, IL 60637 >>> ============================================ >>> Email: iraicu at cs.uchicago.edu >>> Web: http://www.cs.uchicago.edu/~iraicu >>> http://dsl.cs.uchicago.edu/ >>> ============================================ >>> ============================================ >>> >>> >>> >> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From benc at hawaga.org.uk Wed Jul 18 11:18:57 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 18 Jul 2007 16:18:57 +0000 (GMT) Subject: [Swift-devel] kickstart on regular sites Message-ID: There was some discussion many months ago about installing kickstart on the sites our users use regularly, and cataloging that information in the same standard site catalog that lists all the sites we have. That never happened, though, for whatever reason. It might be useful to do that; I think any OSG site will already have it installed as part of the OSG standard software stack (?).
-- From foster at mcs.anl.gov Wed Jul 18 16:46:40 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Wed, 18 Jul 2007 16:46:40 -0500 Subject: [Swift-devel] kickstart on regular sites In-Reply-To: References: Message-ID: <469E8A40.50602@mcs.anl.gov> The set of sites we run on is so small, and the amount of time we spend saying "we don't know exactly what happened because kickstart wasn't installed" so large, that maybe we should do this :0( Note that TG has a software catalog (MDS-based) for just this sort of information. Ben Clifford wrote: > There was some discussion many months ago about installing kickstart on > the sites that our users use regularly; and cataloging that information in > the same standard site catalog that lists all the sites we have. > > That never happened though, for whatever reason. > > It might be useful to do that though; I think any OSG site will have it > installated already as part of the OSG standard software stack (?). > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From hategan at mcs.anl.gov Wed Jul 18 17:06:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 18 Jul 2007 17:06:39 -0500 Subject: [Swift-devel] kickstart on regular sites In-Reply-To: <469E8A40.50602@mcs.anl.gov> References: <469E8A40.50602@mcs.anl.gov> Message-ID: <1184796399.9931.10.camel@blabla.mcs.anl.gov> On Wed, 2007-07-18 at 16:46 -0500, Ian Foster wrote: > The set of sites we run on is so small, and the amount of time we spend > saying "we don't know exactly what happened because kickstart wasn't > installed" so large, that maybe we should do this :0( Good point. It's probably less than the time I spend on email replies saying that kickstart is not going to be the answer to all our problems.
Mihael > > Note that TG has a software catalog (MDS based) for just this sort of > information > > Ben Clifford wrote: > > There was some discussion many months ago about installing kickstart on > > the sites that our users use regularly; and cataloging that information in > > the same standard site catalog that lists all the sites we have. > > > > That never happened though, for whatever reason. > > > > It might be useful to do that though; I think any OSG site will have it > > installated already as part of the OSG standard software stack (?). > > > > > From itf at mcs.anl.gov Wed Jul 18 17:24:49 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Wed, 18 Jul 2007 22:24:49 +0000 Subject: [Swift-devel] kickstart on regular sites In-Reply-To: <1184796399.9931.10.camel@blabla.mcs.anl.gov> References: <469E8A40.50602@mcs.anl.gov><1184796399.9931.10.camel@blabla.mcs.anl.gov> Message-ID: <160288387-1184797558-cardhu_decombobulator_blackberry.rim.net-198160359-@bxe009.bisx.prod.on.blackberry> It isn't? :-) Sent via BlackBerry from T-Mobile -----Original Message----- From: Mihael Hategan Date: Wed, 18 Jul 2007 17:06:39 To:Ian Foster Cc:Ben Clifford , swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] kickstart on regular sites On Wed, 2007-07-18 at 16:46 -0500, Ian Foster wrote: > The set of sites we run on is so small, and the amount of time we spend > saying "we don't know exactly what happened because kickstart wasn't > installed" so large, that maybe we should do this :0( Good point. It's probably less than the time I spend on email replies saying that kicstart is not going to be the answer to all our problems.
Mihael > > Note that TG has a software catalog (MDS based) for just this sort of > information > > Ben Clifford wrote: > > There was some discussion many months ago about installing kickstart on > > the sites that our users use regularly; and cataloging that information in > > the same standard site catalog that lists all the sites we have. > > > > That never happened though, for whatever reason. > > > > It might be useful to do that though; I think any OSG site will have it > > installated already as part of the OSG standard software stack (?). > > > > > From benc at hawaga.org.uk Wed Jul 18 19:57:36 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 19 Jul 2007 00:57:36 +0000 (GMT) Subject: [Swift-devel] kickstart on regular sites In-Reply-To: <1184796399.9931.10.camel@blabla.mcs.anl.gov> References: <469E8A40.50602@mcs.anl.gov> <1184796399.9931.10.camel@blabla.mcs.anl.gov> Message-ID: > Good point. It's probably less than the time I spend on email replies > saying that kicstart is not going to be the answer to all our problems. Based on my experience with VDS, kickstart is the answer to a large portion of the problems that my users were experiencing. That was, however, with a codebase that had been substantially used and debugged over some years, so I think that experience doesn't reflect the Swift situation, where it's often Swift and associated components that don't work, rather than remote sites. -- From hategan at mcs.anl.gov Wed Jul 18 20:11:43 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 18 Jul 2007 20:11:43 -0500 Subject: [Swift-devel] kickstart on regular sites In-Reply-To: References: <469E8A40.50602@mcs.anl.gov> <1184796399.9931.10.camel@blabla.mcs.anl.gov> Message-ID: <1184807503.23345.3.camel@blabla.mcs.anl.gov> Right. I don't think we were debating that. Rather, the point is that, given the low number of sites, we might as well install kickstart (or query the MDS server), use it, and avoid the debate altogether.
On Thu, 2007-07-19 at 00:57 +0000, Ben Clifford wrote: > > > Good point. It's probably less than the time I spend on email replies > > saying that kicstart is not going to be the answer to all our problems. > > based on my experience with VDS, kickstart is the answer to a large > portion of the problems that my users were experiencing; this was, > however, with a codebase that had been substantially used and debugged > over some years and so I think that experience doesn't reflect the swift > situation where its often Swift and associated components that don't work, > rather than remote sites that don't work. > From nefedova at mcs.anl.gov Thu Jul 19 07:24:58 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 19 Jul 2007 07:24:58 -0500 Subject: [Swift-devel] off through the end of the week Message-ID: <18EBDEDE-9A7E-49AE-96C1-A08F7903298C@mcs.anl.gov> Sorry, I have to take the rest of the week off as sick days -- I saw my Dr. yesterday and he diagnosed me with West Nile virus ); I should be OK by next week, I hope. Nika From benc at hawaga.org.uk Thu Jul 19 13:13:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 19 Jul 2007 18:13:22 +0000 (GMT) Subject: [Swift-devel] 0.2 release (again) In-Reply-To: References: Message-ID: This passes my fairly lightweight testing, and no one else has commented (though I suspect that means no one has used it, rather than that people have tested it successfully). However, that's enough for me to put it up as a lightweight release for now, which I have done. On Tue, 17 Jul 2007, Ben Clifford wrote: > > > On Tue, 17 Jul 2007, Ben Clifford wrote: > > > I'm building a release candidate for a low-effort 0.2 release from swift > > r915 and cog r1658. Will post here with it later on. > > http://www.ci.uchicago.edu/~benc/vdsk-0.2.tar.gz > > $ md5sum vdsk-0.2.tar.gz > 25130bbe97f2f10653b48968953c6d84 vdsk-0.2.tar.gz > > It runs hello world for me. I haven't done any other testing.
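A downloaded copy of the release tarball can be checked against the md5sum Ben posted. This is a minimal sketch using only Python's standard `hashlib`; the function names are illustrative and not part of any Swift tooling, and it does the same job as running `md5sum vdsk-0.2.tar.gz` and comparing the digest by eye.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks so
    large tarballs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_md5(path, posted):
    """True if the file's MD5 equals the posted hex digest
    (case and surrounding whitespace are ignored)."""
    return md5_of_file(path) == posted.strip().lower()
```

For example, `matches_posted_md5("vdsk-0.2.tar.gz", "25130bbe97f2f10653b48968953c6d84")` should return `True` for an intact download. An MD5 match only guards against accidental corruption in transit, not deliberate tampering.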
> > From benc at hawaga.org.uk Thu Jul 19 13:14:34 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 19 Jul 2007 18:14:34 +0000 (GMT) Subject: [Swift-devel] 0.2 release (again) In-Reply-To: References: Message-ID: Sufficiently lightweight, however, that I did not go through the commit messages since 0.1 to prepare detailed release notes. (I don't see an immediately obvious way to do it with svn (?!)) -- From benc at hawaga.org.uk Fri Jul 20 09:23:27 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 14:23:27 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: Message-ID: On Mon, 16 Jul 2007, Ben Clifford wrote: > However, now that I look at implementing those, it makes me wonder if we > should have a single numeric type. Its not clear that we need float/double > in the language as distinct types. Does anyone have any particular preferences for numeric types? In particular, has anyone used anything other than 'int' for anything? -- From yongzh at cs.uchicago.edu Fri Jul 20 09:32:46 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 20 Jul 2007 09:32:46 -0500 (CDT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: Message-ID: I've used float. I think the problem is the opposite: int and float may not be enough, and we may need more numeric types. The issue we are having now is just that we need a vdl library to deal with numeric operations, instead of relying on Karajan (Karajan only has a double type, which is not good for cases when we only need int). I'd suggest we understand real user requirements before jumping into solutions. Yong. On Fri, 20 Jul 2007, Ben Clifford wrote: > > > On Mon, 16 Jul 2007, Ben Clifford wrote: > > > However, now that I look at implementing those, it makes me wonder if we > > should have a single numeric type. Its not clear that we need float/double > > in the language as distinct types.
> > does any one have any particular preferences for numeric types? > > In particular has anyone used anything other than 'int' for anything? > > -- > From benc at hawaga.org.uk Fri Jul 20 09:37:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 14:37:14 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: Message-ID: On Fri, 20 Jul 2007, Yong Zhao wrote: > I've used float. I think the problem is on the contrary that int and float > may not be enough, we may need more numeric types. > > The issues we are having now is just we need a vdl library to deal with > numeric operations, instead of relying on karajan (karajan only has double > type, which is not good for cases when we only need int). There's a type issue too. What is the type of this expression?

    5 + 3

And should this be permitted?

    float f = 5 + 3;

There's a bunch of type conversion going on at the moment that isn't terribly well defined, and that causes me trouble when I want to put in more type information/checking. I bring this up because it's getting in the way of my proper-xml-intermediate-format work. -- From wilde at mcs.anl.gov Fri Jul 20 10:12:17 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Fri, 20 Jul 2007 10:12:17 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: Message-ID: <46A0D0D1.6070407@mcs.anl.gov> Can we leave things as they are for the moment and come back to this when we have more concrete examples? I certainly see the need to:
 a) describe atomic functions that have numeric args
 b) do minor calculations on those args, in Swift, between calls
How we do (b) will be strongly affected by where we go on the "fold" issue, so let's gather some app examples to drive this decision.
It seems like we can always do (b) in another language, so we can always "get by" by having all args be strings for the moment. Not pretty, but it lowers the urgency of an immediate decision. I also think that at some point we'll need to decide whether we support all (or more) of the primitive data types of XML Schema, which has more numeric and date types. Is there any app-based request in bugzilla right now that demands a more immediate resolution of this issue? Yong, can you post the example you had of using a float as an arg? Did you do any Swift calculations on those float values in this example? Thanks, Mike Yong Zhao wrote, On 7/20/2007 9:32 AM: > I've used float. I think the problem is on the contrary that int and float > may not be enough, we may need more numeric types. > > The issues we are having now is just we need a vdl library to deal with > numeric operations, instead of relying on karajan (karajan only has double > type, which is not good for cases when we only need int). > > I'd suggust we understand real user requirements before jumping into > solutions. > > Yong. > > On Fri, 20 Jul 2007, Ben Clifford wrote: > >> >> On Mon, 16 Jul 2007, Ben Clifford wrote: >> >>> However, now that I look at implementing those, it makes me wonder if we >>> should have a single numeric type. Its not clear that we need float/double >>> in the language as distinct types. >> does any one have any particular preferences for numeric types? >> >> In particular has anyone used anything other than 'int' for anything?
>> >> -- >> > > -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From benc at hawaga.org.uk Fri Jul 20 11:09:00 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 16:09:00 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: Message-ID: The way numbers are implemented at the moment, they don't even keep their declared types. The following puts the string 3.5 into a file, despite the fact that there's an integer type involved which should be doing something else (causing an error or rounding, most likely). That behaviour is consistent with having a single 'number' type rather than multiple strongly typed number types.

    type messagefile {}

    (messagefile t) greeting(float m) {
        app {
            echo m stdout=@filename(t);
        }
    }

    float f = 7/2;
    int i = f;
    messagefile outfile <"j-echo.out">;
    outfile = greeting(i);

From benc at hawaga.org.uk Fri Jul 20 11:47:46 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 16:47:46 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <46A0D0D1.6070407@mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> Message-ID: On Fri, 20 Jul 2007, Mike Wilde wrote: > Can we leave things as they are for the moment and come back to this when we > have more concrete examples?
Not really: it's sufficiently poorly defined and badly behaved at the moment that it's causing me trouble with the bug 30 work. I'm doing things that make stronger type demands than have previously been needed, so the typing needs to be more consistent. In the absence of any particularly compelling argument either way, I'll make it consistent in the way that is easiest for me. -- From benc at hawaga.org.uk Fri Jul 20 12:11:50 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 17:11:50 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <46A0D0D1.6070407@mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> Message-ID: On Fri, 20 Jul 2007, Mike Wilde wrote: > Is there any app-based request in bugzilla right now that demands a more > immediate resolution of this issue? It's more that it comes from my trying to do the bug 30 rewrite of the intermediate format. Every bug that depends on that has a workaround being used by apps as they encounter them; however, it's a serious usability problem in terms of people writing code they think will work and finding it doesn't (and worse, finding it fails in mysterious ways). > Seems like we can always do (b) in another language, so we can always > "get by" by having all args be strings for the moment. Not pretty, but > it lowers the urgency of an immediate decision. A 'numeric' type that places no further constraint on its content looks very much like a string, but is still typed enough to know that you can use +, -, / or * on it. There's no shame in that. You say: > it lowers the urgency of an immediate decision. But deciding (whether we do or not) on this approach *is* the kind of decision that I'm looking for! > I think also that at some point we'll need to reconcile whether we > support all (or more) of the primitive data types of XML Schema, which > has more numeric and date types. 'int' isn't even an XML Schema primitive type: it's defined as a restriction of a more general type...
Our present type model looks almost entirely unlike XML Schema. -- From wilde at mcs.anl.gov Fri Jul 20 13:30:13 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Fri, 20 Jul 2007 13:30:13 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> Message-ID: <46A0FF35.9000108@mcs.anl.gov> Ben Clifford wrote, On 7/20/2007 11:47 AM: > > On Fri, 20 Jul 2007, Mike Wilde wrote: > >> Can we leave things as they are for the moment and come back to this when we >> have more concrete examples? > > not really - its sufficiently poorly defined and badly behaved at the > moment that its causing me trouble with the bug 30 work - I'm doing things > which make stronger type demands than have been previously needed, and so > the typing needs to be more consistent. OK, that's what I wanted to know. So we do need to discuss it now. > > In the absence of any particularly compelling argument in any way, I'll > make it consistent in the way that is easiest to me. I'm eager to hear what you propose, but reserve the right to call for more discussion if I feel it's necessary. I suggested we defer the discussion because I felt that issues of mapping are more important, and I thought those were independent of the nature of numeric types. But if you feel bug 30 is compelling enough to force a decision on this now, then we should discuss it in more depth. - Mike > -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From wilde at mcs.anl.gov Fri Jul 20 14:52:38 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Fri, 20 Jul 2007 14:52:38 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> Message-ID: <46A11286.7080807@mcs.anl.gov> OK, I just reread the thread from the top, and have some thoughts on what our alternatives are.
Some of the road forward depends on:
1) whether we care about breaking current code
2) how well the current code can handle (or be taught) type coercions

If I had to pick a simple system, I'd pick one of:
a) just strings
b) just strings and ints
c) just strings and floats, where floats act like ints when they have integral values (many systems are like this)
d) strings, ints and floats with fully manual coercions
e) strings, ints and floats with reasonable auto coercions, à la C

My pref would be (e) if that's easy to implement. Forget what I said about XML Schema types earlier. Do the choices above cover the range of reasonable choices? What are the major open issues, given, say, (e)? - Mike
Ben Clifford wrote, On 7/20/2007 12:11 PM: > > On Fri, 20 Jul 2007, Mike Wilde wrote: > >> Is there any app-based request in bugzilla right now that demands a more >> immediate resolution of this issue? > > its more that it comes from me trying to do the bug 30 rewrite of the > intermediate format - every bug that depends on that has a workaround > being used by apps as they encounter them, however its a serious usability > problem in terms of people writing code they think will work and finding > it doesn't (and worse, finding it fails in mysterious ways). > >> Seems like we can always do (b) in another language, so we can always >> "get by" by having all args be strings for the moment. Not pretty, but >> it lowers the urgency of an immediate decision. > > A 'numeric' type that makes no more constraint on its content looks very > much like a string; but is still typed enough to know that you can use + > or - or / or * on it. There's no shame in that. > > You say: > >> it lowers the urgency of an immediate decision. > > but deciding (if we do or not) on this approach *is* the kind of decision > that I'm looking for!
> >> I think also that at some point we'll need to reconcile whether we >> support all (or more) of the primitive data types of XML Schema, which >> has more numeric and date types. > > 'int' isn't even an XML Schema primitive type - its defined as a > restriction of a more general type... Our present type model looks almost > entirely unlike XML Schema. > From hategan at mcs.anl.gov Fri Jul 20 15:49:08 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 Jul 2007 15:49:08 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <46A11286.7080807@mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> Message-ID: <1184964549.26024.0.camel@blabla.mcs.anl.gov> On Fri, 2007-07-20 at 14:52 -0500, Mike Wilde wrote: > OK, I just reread the thread from the top, and have some thoughts on what our > alternatives are. > > Some of the road forward depends on: > > 1) whether we care about breaking current code > 2) how well the current code can handle (or be taught) type coercions > > If I had to pick a simple system, I'd pick either: > > a) just strings > b) just string and ints > c) just strings and floats where floats act like ints when they have integral > values (many systems are like this) > d) strings, ints and floats with fully manual coercions > e) strings, ints and floats with reasonable auto coercions ala C > > My pref would be (e) if thats easy to implement. > > Forget what I said abut XML-Schema types earlier. > > Do the choices above cover the range of reasonable choices? > > What are the major open issues, give, say (e)? I don't see many. I've been chatting with ben and decided it's probably worth trying it on a separate branch.
> > - Mike > > > > > Ben Clifford wrote, On 7/20/2007 12:11 PM: > > > > On Fri, 20 Jul 2007, Mike Wilde wrote: > > > >> Is there any app-based request in bugzilla right now that demands a more > >> immediate resolution of this issue? > > > > its more that it comes from me trying to do the bug 30 rewrite of the > > intermediate format - every bug that depends on that has a workaround > > being used by apps as they encounter them, however its a serious usability > > problem in terms of people writing code they think will work and finding > > it doesn't (and worse, finding it fails in mysterious ways). > > > >> Seems like we can always do (b) in another language, so we can always > >> "get by" by having all args be strings for the moment. Not pretty, but > >> it lowers the urgency of an immediate decision. > > > > A 'numeric' type that makes no more constraint on its content looks very > > much like a string; but is still typed enough to know that you can use + > > or - or / or * on it. There's no shame in that. > > > > You say: > > > >> it lowers the urgency of an immediate decision. > > > > but deciding (if we do or not) on this approach *is* the kind of decision > > that I'm looking for! > > > >> I think also that at some point we'll need to reconcile whether we > >> support all (or more) of the primitive data types of XML Schema, which > >> has more numeric and date types. > > > > 'int' isn't even an XML Schema primitive type - its defined as a > > restriction of a more general type... Our present type model looks almost > > entirely unlike XML Schema. > > > From benc at hawaga.org.uk Fri Jul 20 17:12:49 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jul 2007 22:12:49 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. 
In-Reply-To: <1184964549.26024.0.camel@blabla.mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> Message-ID: I made a branch with the relevant patches from my quilt patch stack. https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions In r940, I remove non-integer numbers from the language by removing the test cases for them from language-behaviour (with no actual code changes). If you want to run the language-behaviour tests with the non-integer tests in there again, roll back r940 in your local repo. The two biggest changes are r941, which makes much more stuff be wrapped in DSHandles, and r942, which adjusts the intermediate language to have XML-based expressions. As a consequence of r942, the resulting karajan code has a lot more cruft in it (but should still behave as previously). I'm intending to work on that more, so don't be alarmed. Type this for the commit logs so far:
svn log https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions -r933:HEAD
-- From bugzilla-daemon at mcs.anl.gov Mon Jul 23 08:45:25 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 23 Jul 2007 08:45:25 -0500 (CDT) Subject: [Swift-devel] [Bug 80] simple_mapper strange prefix behaviour In-Reply-To: Message-ID: <20070723134525.700EC164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from benc at hawaga.org.uk 2007-07-23 08:45 ------- This looks very strongly related to bug 10.
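Option (e) from the numeric type(s) thread above (strings, ints and floats with C-like automatic coercions) can be illustrated with a tiny evaluator. This is a minimal Java sketch, not code from the types-and-expressions branch: it assumes a simplified form of C's usual arithmetic conversions (int op int stays int, anything involving a float is promoted to float) and assumes strings are never implicitly coerced to numbers. `Value`, `IntValue`, `FloatValue`, `StringValue` and `Arith` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical value model for option (e); not Swift's actual type system.
abstract class Value {
    // FLOAT outranks INT during promotion; STRING is never treated as numeric.
    enum Kind { STRING, INT, FLOAT }
    abstract Kind kind();
}

final class IntValue extends Value {
    final long v;
    IntValue(long v) { this.v = v; }
    Kind kind() { return Kind.INT; }
    @Override public String toString() { return Long.toString(v); }
}

final class FloatValue extends Value {
    final double v;
    FloatValue(double v) { this.v = v; }
    Kind kind() { return Kind.FLOAT; }
    @Override public String toString() { return Double.toString(v); }
}

final class StringValue extends Value {
    final String v;
    StringValue(String v) { this.v = Objects.requireNonNull(v); }
    Kind kind() { return Kind.STRING; }
    @Override public String toString() { return v; }
}

final class Arith {
    // Apply + - * / with simplified C-style promotion. String operands are
    // rejected at the operator instead of being parsed, so a type mistake
    // is reported where it happens rather than surfacing later.
    static Value apply(char op, Value a, Value b) {
        if (a.kind() == Value.Kind.STRING || b.kind() == Value.Kind.STRING) {
            throw new IllegalArgumentException(
                "no automatic coercion from string for operator " + op);
        }
        if (a.kind() == Value.Kind.INT && b.kind() == Value.Kind.INT) {
            long x = ((IntValue) a).v, y = ((IntValue) b).v;
            switch (op) {
                case '+': return new IntValue(x + y);
                case '-': return new IntValue(x - y);
                case '*': return new IntValue(x * y);
                case '/': return new IntValue(x / y); // truncates toward zero, as in C99
                default: throw new IllegalArgumentException("unknown operator " + op);
            }
        }
        // At least one operand is a float: promote both to double.
        double x = toDouble(a), y = toDouble(b);
        switch (op) {
            case '+': return new FloatValue(x + y);
            case '-': return new FloatValue(x - y);
            case '*': return new FloatValue(x * y);
            case '/': return new FloatValue(x / y);
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    private static double toDouble(Value v) {
        return v.kind() == Value.Kind.INT ? (double) ((IntValue) v).v : ((FloatValue) v).v;
    }
}
```

For example, `Arith.apply('/', new IntValue(7), new IntValue(2))` gives the int 3, while dividing by `new FloatValue(2.0)` gives the float 3.5. A design closer to option (c) would instead keep one floating-point representation and only print integral values without a fractional part.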
From bugzilla-daemon at mcs.anl.gov Mon Jul 23 08:50:29 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 23 Jul 2007 08:50:29 -0500 (CDT) Subject: [Swift-devel] [Bug 30] swiftscript XML language should express expressions in XML rather than as string literals In-Reply-To: Message-ID: <20070723135029.A95A5164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=30 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |swift-devel at ci.uchicago.edu Status|NEW |ASSIGNED ------- Comment #2 from benc at hawaga.org.uk 2007-07-23 08:50 ------- I have this implemented except for issues raised with numerical types, which Mihael is investigating. From benc at hawaga.org.uk Mon Jul 23 10:08:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 23 Jul 2007 15:08:51 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable Message-ID: VDS1 has a utility, transfer, which is for use on the worker nodes to stage data in and out. It seems fairly seriously worth considering using that, rather than re-implementing stuff from ground up.
-- From benc at hawaga.org.uk Mon Jul 23 10:16:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 23 Jul 2007 15:16:23 +0000 (GMT) Subject: [Swift-devel] Re: VDS1 transfer executable In-Reply-To: References: Message-ID: the former suggestion comes to mind because I was just chatting to Buzz about dcache and XIO (primarily because I tease him about writing XIO drivers for everything), but then it turns into the serious suggestion that: i) the worker-side transfer executable becomes (or is, already, I suspect) XIO-aware; ii) an xio-dcache driver should be easy to write (by us or by the XIO people). I'm increasingly convinced as I think about it that there needs to be an (optional) worker-side transfer executable for decent staging in/out of data on workers; and that maybe we should not mess round with other approaches that skirt round this. -- From hategan at mcs.anl.gov Mon Jul 23 10:17:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jul 2007 10:17:39 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: Message-ID: <1185203859.17343.5.camel@blabla.mcs.anl.gov> I think the reimplementation argument is not universally valid. One must consider costs vs. benefits. On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > VDS1 has a utility, transfer, which is for use on the worker nodes to > stage data in and out. > > It seems fairly seriously worth considering using that, rather than > re-implementing stuff from ground up. > From benc at hawaga.org.uk Mon Jul 23 10:19:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 23 Jul 2007 15:19:29 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185203859.17343.5.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> Message-ID: Given that the VDS1 transfer executable exists and appears to work, there would need to be some strong argument to not use that as a base (which there may be, but I don't know of one).
On Mon, 23 Jul 2007, Mihael Hategan wrote: > I think the reimplementation argument is not universally valid. One must > consider costs vs. benefits. > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > > VDS1 has a utility, transfer, which is for use on the worker nodes to > > stage data in and out. > > > > It seems fairly seriously worth considering using that, rather than > > re-implementing stuff from ground up. > > > > From hategan at mcs.anl.gov Mon Jul 23 10:28:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jul 2007 10:28:07 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> Message-ID: <1185204487.17343.14.camel@blabla.mcs.anl.gov> Support, throttling, concurrency control. We seem to be fundamentally changing the way things work, and we do that because we can. On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > Given that the VDS1 transfer executable exists and appears to work, there > would need to be some strong argument to not use that as a base (which > there may be, but I don't know of one). > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > I think the reimplementation argument is not universally valid. One must > > consider costs vs. benefits. > > > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > > > VDS1 has a utility, transfer, which is for use on the worker nodes to > > > stage data in and out. > > > > > > It seems fairly seriously worth considering using that, rather than > > > re-implementing stuff from ground up. 
> > > > > > > > From benc at hawaga.org.uk Mon Jul 23 10:29:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 23 Jul 2007 15:29:38 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> Message-ID: none of those seem to be arguments for or against rewriting vs reusing. On Mon, 23 Jul 2007, Mihael Hategan wrote: > Support, throttling, concurrency control. We seem to be fundamentally > changing the way things work, and we do that because we can. > > On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > > Given that the VDS1 transfer executable exists and appears to work, there > > would need to be some strong argument to not use that as a base (which > > there may be, but I don't know of one). > > > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > > > I think the reimplementation argument is not universally valid. One must > > > consider costs vs. benefits. > > > > > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > > > > VDS1 has a utility, transfer, which is for use on the worker nodes to > > > > stage data in and out. > > > > > > > > It seems fairly seriously worth considering using that, rather than > > > > re-implementing stuff from ground up. > > > > > > > > > > > > > > From hategan at mcs.anl.gov Mon Jul 23 10:37:24 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jul 2007 10:37:24 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> Message-ID: <1185205044.17343.22.camel@blabla.mcs.anl.gov> They are not in general. They are arguments against reusing a particular thing, which may justify rewriting. On Mon, 2007-07-23 at 15:29 +0000, Ben Clifford wrote: > none of those seem to be arguments for or against rewriting vs reusing. 
> > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > Support, throttling, concurrency control. We seem to be fundamentally > > changing the way things work, and we do that because we can. > > > > On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > > > Given that the VDS1 transfer executable exists and appears to work, there > > > would need to be some strong argument to not use that as a base (which > > > there may be, but I don't know of one). > > > > > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > > > > > I think the reimplementation argument is not universally valid. One must > > > > consider costs vs. benefits. > > > > > > > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > > > > > VDS1 has a utility, transfer, which is for use on the worker nodes to > > > > > stage data in and out. > > > > > > > > > > It seems fairly seriously worth considering using that, rather than > > > > > re-implementing stuff from ground up. > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Mon Jul 23 10:39:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jul 2007 10:39:33 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> Message-ID: <1185205173.17343.24.camel@blabla.mcs.anl.gov> Also, we should steer away from C code. We're far more efficient with java (both as programmers and as troubleshooters). On Mon, 2007-07-23 at 10:28 -0500, Mihael Hategan wrote: > Support, throttling, concurrency control. We seem to be fundamentally > changing the way things work, and we do that because we can. > > On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > > Given that the VDS1 transfer executable exists and appears to work, there > > would need to be some strong argument to not use that as a base (which > > there may be, but I don't know of one). 
> > > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > > > I think the reimplementation argument is not universally valid. One must > > > consider costs vs. benefits. > > > > > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > > > > VDS1 has a utility, transfer, which is for use on the worker nodes to > > > > stage data in and out. > > > > > > > > It seems fairly seriously worth considering using that, rather than > > > > re-implementing stuff from ground up. > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Mon Jul 23 10:40:25 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Mon, 23 Jul 2007 10:40:25 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> Message-ID: <46A4CBE9.6060600@mcs.anl.gov> I'm in favor of building on transfer. There is also the newer utility "t2". Jens, I'm sure, will be delighted to expound on these. As I recall, one was better at some things and the other at others. For example, I think t2 will retry failing I/Os from an alternate PFN if several replicas are available. But perhaps transfer does parallel transfers better, or some such advantage. I need to dive into old email to find the info, but in the meantime Jens or the manpages can probably explain much. - Mike Ben Clifford wrote, On 7/23/2007 10:29 AM: > none of those seem to be arguments for or against rewriting vs reusing.
>> >> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: >>> Given that the VDS1 transfer executable exists and appears to work, there >>> would need to be some strong argument to not use that as a base (which >>> there may be, but I don't know of one). >>> >>> On Mon, 23 Jul 2007, Mihael Hategan wrote: >>> >>>> I think the reimplementation argument is not universally valid. One must >>>> consider costs vs. benefits. >>>> >>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: >>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to >>>>> stage data in and out. >>>>> >>>>> It seems fairly seriously worth considering using that, rather than >>>>> re-implementing stuff from ground up. >>>>> >>>> >> From hategan at mcs.anl.gov Mon Jul 23 11:02:59 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jul 2007 11:02:59 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <46A4CBE9.6060600@mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> Message-ID: <1185206579.18828.7.camel@blabla.mcs.anl.gov> And that's what I mean by "because we can". On Mon, 2007-07-23 at 10:40 -0500, Mike Wilde wrote: > Im in favor of building on transfer. There is also the newer utility "t2". > > Jens Im sure will be delighted to expound on these. As I recall one was better > at some things and the other at others. > > For example, I think t2 will retry failing I/Os from am alternate PFN if several > replicas are available. But perhaps transfer does parallel transfers better, > or some such advantage.
I need to dive into old email to find the info, but in > the meantime Jens or the manpages can probably explain much. > > - Mike > > Ben Clifford wrote, On 7/23/2007 10:29 AM: > > none of those seem to be arguments for or against rewriting vs reusing. > > > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > >> Support, throttling, concurrency control. We seem to be fundamentally > >> changing the way things work, and we do that because we can. > >> > >> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > >>> Given that the VDS1 transfer executable exists and appears to work, there > >>> would need to be some strong argument to not use that as a base (which > >>> there may be, but I don't know of one). > >>> > >>> On Mon, 23 Jul 2007, Mihael Hategan wrote: > >>> > >>>> I think the reimplementation argument is not universally valid. One must > >>>> consider costs vs. benefits. > >>>> > >>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > >>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to > >>>>> stage data in and out. > >>>>> > >>>>> It seems fairly seriously worth considering using that, rather than > >>>>> re-implementing stuff from ground up. > >>>>> > >>>> > >> From foster at mcs.anl.gov Mon Jul 23 11:48:25 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 23 Jul 2007 11:48:25 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> Message-ID: <46A4DBD9.6050205@mcs.anl.gov> A couple of comments that may be relevant: a) I'd really like to see evaluation of what we have, at scale, before starting reimplementation of anything.
(Have I mentioned that we need to be showing routine use at scale if we are to justify continuation of this project? An important step would seem to be to try running with what we have.) b) The CEDPS guys are hard at work on storage management solutions (MOPS is the keyword). I think we should be thinking about whether/how this has a role to play in the future. Ian. Mihael Hategan wrote: > Support, throttling, concurrency control. We seem to be fundamentally > changing the way things work, and we do that because we can. > > On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > >> Given that the VDS1 transfer executable exists and appears to work, there >> would need to be some strong argument to not use that as a base (which >> there may be, but I don't know of one). >> >> On Mon, 23 Jul 2007, Mihael Hategan wrote: >> >> >>> I think the reimplementation argument is not universally valid. One must >>> consider costs vs. benefits. >>> >>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: >>> >>>> VDS1 has a utility, transfer, which is for use on the worker nodes to >>>> stage data in and out. >>>> >>>> It seems fairly seriously worth considering using that, rather than >>>> re-implementing stuff from ground up. >>>> >>>> >>> -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From benc at hawaga.org.uk Tue Jul 24 05:51:10 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 10:51:10 +0000 (GMT) Subject: [Swift-devel] nightly tests changes Message-ID: I made some changes to the nightly tests (one for more information, the other to fix the file counter test that was broken) but I don't know how to deploy them. I think at least nightly.sh doesn't get updated automatically.
r952: fix ls portion of file_counter nightly test - can't pass wildcards to ls as those are expanded by the shell, not by ls itself; and if ls finds no files it returns a failure code. Now use the root directory, on the assumption that this always has some files in it and is always readable.
r953: formatting of nightly test output - specify the full year, to match up with verbose specification of the time component; log the hostname on which the tests ran.
-- From bugzilla-daemon at mcs.anl.gov Tue Jul 24 07:26:14 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 24 Jul 2007 07:26:14 -0500 (CDT) Subject: [Swift-devel] [Bug 80] simple_mapper strange prefix behaviour In-Reply-To: Message-ID: <20070724122614.923CA164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80 ------- Comment #2 from benc at hawaga.org.uk 2007-07-24 07:26 -------
1. In some cases (such as the one illustrated by this bug) Path.Entry.getName() returns the prefix for the first element, which is a value in filename-space, not in dataset-path-space. The DefaultFileNameElementMapper is stupid enough to pass through a prefix untouched, even though it isn't a valid path component, so this doesn't cause a problem.
2. AbstractFileMapper tries to infer whether a path entry is an array index or a field name by testing whether the first character is a numeric digit or not (rather than using the Path.Entry index member value).
When the prefix begins with a digit, the above two properties interact to cause the filename prefix to be treated as an array index, which fails. From benc at hawaga.org.uk Tue Jul 24 07:45:03 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 12:45:03 +0000 (GMT) Subject: [Swift-devel] more swift-devel bugzilla mails Message-ID: I just modified the bugzilla config so that swift-devel is watching all of the swift developers' emails. This will get more bug change email sent to the list. Alas, bugzilla doesn't seem to have a facility for an address to watch all activity in the bugzilla. -- From bugzilla-daemon at mcs.anl.gov Tue Jul 24 09:31:32 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 24 Jul 2007 09:31:32 -0500 (CDT) Subject: [Swift-devel] [Bug 6] Not globally unique temporary file names In-Reply-To: Message-ID: <20070724143132.CEBE1164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=6 ------- Comment #2 from hategan at mcs.anl.gov 2007-07-24 09:31 ------- Yes. It needs to stay here.
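The interaction described in Bug 80 comment #2 can be reproduced in isolation. In this hypothetical Java sketch, `PathEntry` stands in for Swift's `Path.Entry`, simplified to a name plus a boolean index flag, and `2007run` is an invented example of a filename prefix that begins with a digit. The sketch contrasts the first-character heuristic with relying on the entry's own index information; it is not the actual AbstractFileMapper code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a dataset path element: a name plus an explicit
// flag saying whether it is an array index. Not Swift's real Path.Entry.
final class PathEntry {
    final String name;
    final boolean index;
    PathEntry(String name, boolean index) { this.name = name; this.index = index; }
}

final class EntryClassifier {
    // Heuristic described in the comment: guess from the first character.
    static boolean guessIsIndex(PathEntry e) {
        return !e.name.isEmpty() && Character.isDigit(e.name.charAt(0));
    }

    // Alternative the comment points to: trust the entry's own index flag.
    static boolean isIndex(PathEntry e) {
        return e.index;
    }

    // Names of entries where the heuristic and the explicit flag disagree.
    static List<String> mismatches(List<PathEntry> path) {
        List<String> out = new ArrayList<>();
        for (PathEntry e : path) {
            if (guessIsIndex(e) != isIndex(e)) {
                out.add(e.name);
            }
        }
        return out;
    }
}
```

For a path whose first element is the prefix `2007run`, followed by the field `subordinate` and the array index `3`, only the prefix is misclassified: the heuristic calls it an index even though it is a plain name, which is the failure mode the comment describes.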
From hategan at mcs.anl.gov Tue Jul 24 09:34:07 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 09:34:07 -0500 Subject: [Swift-devel] Re: nightly tests changes In-Reply-To: References: Message-ID: <1185287647.16438.3.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 10:51 +0000, Ben Clifford wrote: > I made some changes to the nightly tests (one for more information, the > other to fix the file counter test that was broken) but I don't know how > to deploy them. I think at least nightly.sh doesn't get updated > automatically. Right. I'll poke it. > > r952: fix ls portion of file_counter nightly test - can't pass wildcards > to ls as those are expanded by the shell, not by ls itself; and if ls > finds no files it returns a failure code. Now use the root directory, on > the assumption that this always has some files in it and is always > readable. > > r953: formatting of nightly test output - specify the full year, to match > up with verbose specification of the time component; log the hostname on > which the tests ran > From hategan at mcs.anl.gov Tue Jul 24 09:37:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 09:37:19 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <46A4CBE9.6060600@mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> Message-ID: <1185287839.16438.7.camel@blabla.mcs.anl.gov> On Mon, 2007-07-23 at 10:40 -0500, Mike Wilde wrote: > Im in favor of building on transfer. There is also the newer utility "t2". > > Jens Im sure will be delighted to expound on these. Well, he said he'd use the Java stuff, since it has more flexibility than the command line interface of globus-url-copy, which is used by "transfer". On the other hand, the Java stuff is heavier on resources (unless there's some form of JVM running on some form of head node permanently). 
> As I recall one was better > at some things and the other at others. > > For example, I think t2 will retry failing I/Os from am alternate PFN if several > replicas are available. But perhaps transfer does parallel transfers better, > or some such advantage. I need to dive into old email to find the info, but in > the meantime Jens or the manpages can probably explain much. > > - Mike > > Ben Clifford wrote, On 7/23/2007 10:29 AM: > > none of those seem to be arguments for or against rewriting vs reusing. > > > > On Mon, 23 Jul 2007, Mihael Hategan wrote: > > > >> Support, throttling, concurrency control. We seem to be fundamentally > >> changing the way things work, and we do that because we can. > >> > >> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote: > >>> Given that the VDS1 transfer executable exists and appears to work, there > >>> would need to be some strong argument to not use that as a base (which > >>> there may be, but I don't know of one). > >>> > >>> On Mon, 23 Jul 2007, Mihael Hategan wrote: > >>> > >>>> I think the reimplementation argument is not universally valid. One must > >>>> consider costs vs. benefits. > >>>> > >>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote: > >>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to > >>>>> stage data in and out. > >>>>> > >>>>> It seems fairly seriously worth considering using that, rather than > >>>>> re-implementing stuff from ground up. > >>>>> > >>>> > >> From benc at hawaga.org.uk Tue Jul 24 09:42:36 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 14:42:36 +0000 (GMT) Subject: [Swift-devel] simple_mapper separators Message-ID: I've been poking through simple_mapper to look at the various bugs open on that code.
There's some special case handling for path component separators (in the abstractfilemapper superclass) such that the last component separator ends up being a "." instead of whatever comes from the supplied FileNameElementMapper (which is "_" in the default case). See the test in svn tests/language-behaviour/T077-simplemapper-bug80.swift, which is also here: http://www.ci.uchicago.edu/trac/swift/browser/trunk/tests/language-behaviour/T076-simplemapper-bug80.swift?format=raw This maps a three-level array structure to filenames in a fairly straightforward fashion. The output files are: T077-simplemapper-bug80.aleph.out T077-simplemapper-bug80.beth.out T077-simplemapper-bug80_subordinate.epsilon.out T077-simplemapper-bug80_subordinate.sigma.out T077-simplemapper-bug80_subordinate_moresubordinate.hamza.out It's a bit surprising/unintuitive that the last separator that comes from the expression path is a "." rather than a "_" like the other ones, at least in the presence of a suffix; though I can see circumstances where it is useful (when the structure fields have the same name as filename extensions and there is no suffix). The path of least complexity says that this final separator change shouldn't happen - it's easier to document and easier to explain. -- From benc at hawaga.org.uk Tue Jul 24 09:45:37 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 14:45:37 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185287839.16438.7.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> Message-ID: > (unless there's some form of JVM running on some form of head node > permanently). Transfer stuff needs to (sometimes) run on the worker node, not the head node, I think.
I think running things through the head node is going to produce similar performance bottlenecks to running on the submit node in the case of running on a single site with a distributed file system supplying the data. -- From hategan at mcs.anl.gov Tue Jul 24 09:49:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 09:49:30 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> Message-ID: <1185288570.17215.3.camel@blabla.mcs.anl.gov> Can you be more specific on what bottlenecks we're trying to avoid? On Tue, 2007-07-24 at 14:45 +0000, Ben Clifford wrote: > > (unless there's some form of JVM running on some form of head node > > permanently). > > Transfer stuff needs to (sometimes) run on the worker node, not the head > node, I think. > > I think running things through the head node is going to produce similar > performance bottlenecks to running on the submit node in the case of > running on a single site with a distributed file system supplying the > data. > From benc at hawaga.org.uk Tue Jul 24 09:52:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 14:52:47 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185288570.17215.3.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > Can you be more specific on what bottlenecks we're trying to avoid? pumping all the data for the workflow through one ethernet card and CPU.
-- From hategan at mcs.anl.gov Tue Jul 24 10:05:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 10:05:30 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> Message-ID: <1185289530.17828.5.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 14:52 +0000, Ben Clifford wrote: > > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > Can you be more specific on what bottlenecks we're trying to avoid? > > pumping all the data for the workflow through one ethernet card and CPU. It's I/O bound stuff, so the CPU is likely not to be the problem. And generally the eth card would be fatter than the pipe outside. The local storage on the other hand may be a problem. It's tricky however. Should a bunch of executables need the same input file, it would likely be better to transfer it only once on the head node than multiple times on each worker node. > From benc at hawaga.org.uk Tue Jul 24 10:11:03 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 15:11:03 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185289530.17828.5.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <1185289530.17828.5.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > It's I/O bound stuff, so the CPU is likely not to be the problem. And > generally the eth card would be fatter than the pipe outside. In the case where eg. dCache is 'inside' rather than 'outside', that's different. > The local storage on the other hand may be a problem. It's tricky > however. 
Should a bunch of executables need the same input file, it > would likely be better to transfer it only once on the head node than > multiple times on each worker node. It's got to be transferred to the worker nodes anyway (at least as much of it as is read/written) - in the present case using whatever shared posix fs the site-wide scratch space lives on. How the two different approaches stack up is probably going to depend on the site layout and its relation to wherever submit-side data lives (which, as I said, may be on-site); and on the app. So I don't think there's one right way to do it. -- From foster at mcs.anl.gov Tue Jul 24 10:21:34 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 24 Jul 2007 10:21:34 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> Message-ID: <46A618FE.20205@mcs.anl.gov> Do we have data that show this to be a problem? Ben Clifford wrote: > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > >> Can you be more specific on what bottlenecks we're trying to avoid? >> > > pumping all the data for the workflow through one ethernet card and CPU. > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From hategan at mcs.anl.gov Tue Jul 24 10:23:15 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 10:23:15 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <1185289530.17828.5.camel@blabla.mcs.anl.gov> Message-ID: <1185290595.18405.9.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 15:11 +0000, Ben Clifford wrote: > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > It's I/O bound stuff, so the CPU is likely not to be the problem. And > > generally the eth card would be fatter than the pipe outside. > > In the case where eg. dCache is 'inside' rather than 'outside', that's > different. Then it wouldn't be going through eth, I'm guessing. They invented lo. And if it's not lo, then you'd still have a single eth (the source). Doing single eth to single eth will probably be not much different from single eth to multiple eths. There's the other possibility where the source is multi-headed. But we should probably not optimize for 1% of the scenarios. > > > The local storage on the other hand may be a problem. It's tricky > > however. Should a bunch of executables need the same input file, it > > would likely be better to transfer it only once on the head node than > > multiple times on each worker node. > > Its got to be transferred to the worker nodes anyway (at least as much of > it as is read/written) - in the present case using whatever shared posix > fs the site-wide scratch space lives on. Yes and no. Some of the data may be transferred, as needed. Also, there may be high performance shared FSes, which may beat our puny attempts at better performance. 
> > How the two different approaches stack up is probably going to depend on > the site layout and its relation to wherever submit-side data lives > (which, as I said, may be on-site); and on the app. So I don't think > there's one right way to do it. Yep. But one of the choices is an engineering no-no for us. If we can make the other sufficiently good, we can provide a reasonable solution at a low cost. (Note to Ian: we're not implementing anything yet) > From hategan at mcs.anl.gov Tue Jul 24 13:47:16 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 13:47:16 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> Message-ID: <1185302836.6949.5.camel@blabla.mcs.anl.gov> I'm thinking we should have two division operators: div - integer division (int, int -> int) / - floating point division ( [int|float], [int|float] -> float ) This is necessary because we don't have type casting, so a programmer could not specify nicely how to force an int/int division to result in a floating point number. In C (and related), one would type cast one of the operands to double (e.g. double x = (double) i / j;). In our case it could be done with a separate assignment, but I think that's cumbersome. Mihael On Fri, 2007-07-20 at 22:12 +0000, Ben Clifford wrote: > I made a branch with the relevant patches from my quilt patch stack. > > https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions > > In r940, I remove non-integer numbers from by language by virtue of > removing the test cases from language-behaviour for them (but no actual > code changes). If you want to run the language-behaviour tests with the > non-integer tests in there again, roll back r940 in your local repo.
> > The two biggest changes are r941 which makes much more stuff be wrapped in > DSHandles, and r942 which is adjustment to the intermediate language to > have XML based expressions. > > As a consequence of r942, the resulting karajan code has a lot more cruft > in it (but should still behave as previously). I'm intending to work on > that more so don't be alarmed. > > Type this for the commit logs so far: > > svn log > https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions > -r933:HEAD > > -- > > From hategan at mcs.anl.gov Tue Jul 24 15:12:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 15:12:28 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> Message-ID: <1185307948.14893.2.camel@blabla.mcs.anl.gov> I've committed some stuff to that branch which should make the numeric operators more efficient. The language behavior tests seem to pass. One potentially problem-causing change (if broken code makes broken assumptions) is that Swift number values are not stored as strings any more, but as subclasses of java.lang.Number. Mihael On Fri, 2007-07-20 at 22:12 +0000, Ben Clifford wrote: > I made a branch with the relevant patches from my quilt patch stack. > > https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions > > In r940, I remove non-integer numbers from by language by virtue of > removing the test cases from language-behaviour for them (but no actual > code changes). If you want to run the language-behaviour tests with the > non-integer tests in there again, roll back r940 in your local repo. > > The two biggest changes are r941 which makes much more stuff be wrapped in > DSHandles, and r942 which is adjustment to the intermediate language to > have XML based expressions. 
> > As a consequence of r942, the resulting karajan code has a lot more cruft > in it (but should still behave as previously). I'm intending to work on > that more so don't be alarmed. > > Type this for the commit logs so far: > > svn log > https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions > -r933:HEAD > > -- > > From bugzilla-daemon at mcs.anl.gov Tue Jul 24 16:21:00 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 24 Jul 2007 16:21:00 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070724212100.C8FAB164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 ------- Comment #13 from nefedova at mcs.anl.gov 2007-07-24 16:21 ------- I tried the same code as in Comment #2 with r951 and it hangs the same way as before. has it worked for you? From benc at hawaga.org.uk Tue Jul 24 16:54:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 21:54:45 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <46A618FE.20205@mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <46A618FE.20205@mcs.anl.gov> Message-ID: Not numerical data. I just recall it being something that we ISI sysadmin people used to laugh about as people had VDS moving data all over the place unnecessarily within the ISI network whilst they complained that there wasn't enough space in one particular space or that ftp servers weren't coping. On Tue, 24 Jul 2007, Ian Foster wrote: > Do we have data that show this to be a problem?
> > Ben Clifford wrote: > > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > > > > > Can you be more specific on what bottlenecks we're trying to avoid? > > > > > > > pumping all the data for the workflow through one ethernet card and CPU. > > > > > > From hategan at mcs.anl.gov Tue Jul 24 17:05:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 17:05:18 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <46A618FE.20205@mcs.anl.gov> Message-ID: <1185314718.8214.0.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 21:54 +0000, Ben Clifford wrote: > Not numerical data. I just recall it being something that we ISI sysadmin > people used to laugh about as people had VDS moving data all over the > place unnecessarily within the ISI network Still laughing? :) > whilst they complained that > there wasn't enough space in one particular space or that ftp servers > weren't coping. > > On Tue, 24 Jul 2007, Ian Foster wrote: > > > Do we have data that show this to be a problem? > > > > Ben Clifford wrote: > > > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > > > > > > > > Can you be more specific on what bottlenecks we're trying to avoid? > > > > > > > > > > pumping all the data for the workflow through one ethernet card and CPU. 
> > > > > > > > > > > From benc at hawaga.org.uk Tue Jul 24 17:06:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 22:06:47 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <1185314718.8214.0.camel@blabla.mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <46A618FE.20205@mcs.anl.gov> <1185314718.8214.0.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > On Tue, 2007-07-24 at 21:54 +0000, Ben Clifford wrote: > > Not numerical data. I just recall it being something that we ISI sysadmin > > people used to laugh about as people had VDS moving data all over the > > place unnecessarily within the ISI network > > Still laughing? :) no, I left. -- From benc at hawaga.org.uk Tue Jul 24 17:10:43 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 22:10:43 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <1185307948.14893.2.camel@blabla.mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185307948.14893.2.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > One potentially problem-causing change (if broken code makes broken > assumptions) is that Swift number values are not stored as strings any > more, but as subclasses of java.lang.Number. I think(?) that the only code that made assumptions about the number formats are the numerical operators and code that assumes the toString() output will be of a particular format when passing as a commandline parameter. 
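Mihael's proposed split between `div` (int, int -> int) and `/` (always float), together with the switch to storing Swift numbers as `java.lang.Number` subclasses, can be sketched roughly as follows. This is a hypothetical illustration, not the code on the types-and-expressions branch: the class and method names are invented, and it assumes ints are held as `Integer` and floats as `Double`. It also shows the `toString()` concern raised here, since mathematically equal results print differently depending on which `Number` subclass holds them.

```java
// Hypothetical sketch only: class and method names are invented for illustration,
// and it assumes Swift ints are held as Integer and floats as Double.
public class SwiftDivisionSketch {

    // "div": integer division, (int, int) -> int.
    // Java's int '/' truncates toward zero, so -7 div 2 would give -3 here.
    static Number intDiv(Number a, Number b) {
        if (!(a instanceof Integer) || !(b instanceof Integer)) {
            throw new IllegalArgumentException("div expects two int operands");
        }
        return a.intValue() / b.intValue(); // boxed to Integer
    }

    // "/": floating point division, ([int|float], [int|float]) -> float.
    // Widening an int to double is exact, since every 32-bit int fits in
    // a double's 53-bit significand.
    static Number floatDiv(Number a, Number b) {
        return a.doubleValue() / b.doubleValue(); // boxed to Double
    }

    public static void main(String[] args) {
        System.out.println(intDiv(7, 2));   // prints 3
        System.out.println(floatDiv(7, 2)); // prints 3.5

        // Equal values, different text: code that hands toString() output
        // to a command line sees "3" in one case and "3.0" in the other.
        System.out.println(intDiv(6, 2));   // prints 3
        System.out.println(floatDiv(6, 2)); // prints 3.0
    }
}
```

Because the printed form depends on the runtime type, any command-line formatting guarantee would have to come from an explicit formatting step rather than from `toString()`.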
-- From foster at mcs.anl.gov Tue Jul 24 17:28:59 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 24 Jul 2007 17:28:59 -0500 Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <46A618FE.20205@mcs.anl.gov> Message-ID: <46A67D2B.4090804@mcs.anl.gov> Ben: I feel strongly that we should be focusing our scarce development resources on problems that we have documented via user experience. That means we need that performance monitoring infrastructure in Swift ... I do think that data movement and caching are likely to become important issues. But it would be good to know when/how exactly they do. Mike mentioned that he thought Nika's MolDyn code had some workaround in it to reduce data movement, introduced because of a lack of caching support. Does anyone know about that? Ian. Ben Clifford wrote: > Not numerical data. I just recall it being something that we ISI sysadmin > people used to laugh about as people had VDS moving data all over the > place unnecessarily within the ISI network whilst they complained that > there wasn't enough space in one particular space or that ftp servers > weren't coping. > > On Tue, 24 Jul 2007, Ian Foster wrote: > > >> Do we have data that show this to be a problem? >> >> Ben Clifford wrote: >> >>> On Tue, 24 Jul 2007, Mihael Hategan wrote: >>> >>> >>> >>>> Can you be more specific on what bottlenecks we're trying to avoid? >>>> >>>> >>> pumping all the data for the workflow through one ethernet card and CPU. >>> >>> >>> >> > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. 
Globus Alliance: www.globus.org. From hategan at mcs.anl.gov Tue Jul 24 17:40:29 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 17:40:29 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185307948.14893.2.camel@blabla.mcs.anl.gov> Message-ID: <1185316829.9373.3.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 22:10 +0000, Ben Clifford wrote: > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > One potentially problem-causing change (if broken code makes broken > > assumptions) is that Swift number values are not stored as strings any > > more, but as subclasses of java.lang.Number. > > I think(?) that the only code that made assumptions about the number > formats are the numerical operators and code that assumes the toString() > output will be of a particular format when passing as a commandline > parameter. That format would only be kept in the case in which the assigned value would be used. Using any arithmetic operators would not make any guarantee of a particular format.
That in the old code. Maybe some > formatting functions should be provided? easy enough to implement ad-hoc when someone needs them - for now, we can wait till it causes someone trouble. -- From benc at hawaga.org.uk Tue Jul 24 17:42:18 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 22:42:18 +0000 (GMT) Subject: [Swift-devel] VDS1 transfer executable In-Reply-To: <46A67D2B.4090804@mcs.anl.gov> References: <1185203859.17343.5.camel@blabla.mcs.anl.gov> <1185204487.17343.14.camel@blabla.mcs.anl.gov> <46A4CBE9.6060600@mcs.anl.gov> <1185287839.16438.7.camel@blabla.mcs.anl.gov> <1185288570.17215.3.camel@blabla.mcs.anl.gov> <46A618FE.20205@mcs.anl.gov> <46A67D2B.4090804@mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Ian Foster wrote: > Mike mentioned that he thought Nika's MolDyn code had some workaround in > it to reduce data movement, introduced because of a lack of caching > support. Does anyone know about that? that is bug 76: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76 Bug 78 http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=78 (or rather, the privately discussed rather different root cause of bug 78, which is to access dcache data) is higher priority. A basic approach is to have the submit side access dcache, as I've discussed elsewhere; there's no direct evidence that that approach will be unsuitable (though thoughts that it might be are what motivated this thread). We can look at doing that next. -- From benc at hawaga.org.uk Tue Jul 24 18:11:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 24 Jul 2007 23:11:45 +0000 (GMT) Subject: [Swift-devel] r064: use /dev/urandom by default Message-ID: This should go to trunk not language reform branch? 
-- From hategan at mcs.anl.gov Tue Jul 24 18:20:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 Jul 2007 18:20:49 -0500 Subject: [Swift-devel] Re: r064: use /dev/urandom by default In-Reply-To: References: Message-ID: <1185319249.11093.1.camel@blabla.mcs.anl.gov> On Tue, 2007-07-24 at 23:11 +0000, Ben Clifford wrote: > This should go to trunk not language reform branch? Right. I'm guessing it will get there when we merge them. I wanted some testing to be done on it. In another order of ideas, I think we should have a general development branch, not specific to a certain thing (such as expressions). Mihael From benc at hawaga.org.uk Wed Jul 25 02:38:27 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 07:38:27 +0000 (GMT) Subject: [Swift-devel] Re: r064: use /dev/urandom by default In-Reply-To: <1185319249.11093.1.camel@blabla.mcs.anl.gov> References: <1185319249.11093.1.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > Right. I'm guessing it will get there when we merge them. I wanted some > testing to be done on it. Pretty much the only serious testing that's going to happen is when it gets to trunk and people get it on the occasions that they update from there. SVN's branch management is sufficiently poor that I prefer to not have long lived general development branches that dilute testing of stuff that's gone into trunk. (now if we were using git, that would be another matter...) 
-- From benc at hawaga.org.uk Wed Jul 25 02:42:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 07:42:56 +0000 (GMT) Subject: [Swift-devel] Re: nightly tests changes In-Reply-To: <1185287647.16438.3.camel@blabla.mcs.anl.gov> References: <1185287647.16438.3.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > > r952: fix ls portion of file_counter nightly test - can't pass wildcards > > to ls as those are expanded by the shell, not by ls itself; and if ls > > finds no files it returns a failure code. Now use the root directory, on > > the assumption that this always has some files in it and is always > > readable. Looks like this fix worked. Tests now look greener, though not completely green - 5 of the 110 grid tests failed (at random?) with gridftp errors. -- From benc at hawaga.org.uk Wed Jul 25 05:30:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 10:30:22 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <1185302836.6949.5.camel@blabla.mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185302836.6949.5.camel@blabla.mcs.anl.gov> Message-ID: need to be careful a bit about casting between floating point and fixed precision types in the operator implementation. ints are small enough that they fit within a double such that no precision is lost; but using eg. a java long would cause a problem (see below code) public class casts { public static void main(String args[]) { long i = 9223372036854775784l; double d = (double) i; long i2 = (long)d; System.out.println(" i="+i); System.out.println(" d="+d); System.out.println(" i2="+i2); if(i != i2) System.out.println("Different"); } } From benc at hawaga.org.uk Wed Jul 25 08:04:52 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 13:04:52 +0000 (GMT) Subject: [Swift-devel] airsn and ROI mappers Message-ID: Hi. 
Are these two mappers used? If so, I need to make sure some code changes I want to make to AbstractFileMapper don't break those. If not, I'm less concerned. -- From bugzilla-daemon at mcs.anl.gov Wed Jul 25 08:33:46 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 25 Jul 2007 08:33:46 -0500 (CDT) Subject: [Swift-devel] [Bug 83] nested loops hung In-Reply-To: Message-ID: <20070725133346.4DA6C164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83 nefedova at mcs.anl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |blocker From yongzh at cs.uchicago.edu Wed Jul 25 09:08:53 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Wed, 25 Jul 2007 09:08:53 -0500 (CDT) Subject: [Swift-devel] airsn and ROI mappers In-Reply-To: References: Message-ID: The airsn mapper is critical for the fMRI workflows; ROIMapper was developed for the RADGrid workflow. Yong. On Wed, 25 Jul 2007, Ben Clifford wrote: > > Hi. > > Are these two mappers used? > > If so, I need to make sure some code changes I want to make to > AbstractFileMapper don't break those. If not, I'm less concerned. > > -- > > From hategan at mcs.anl.gov Wed Jul 25 09:32:52 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 09:32:52 -0500 Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185302836.6949.5.camel@blabla.mcs.anl.gov> Message-ID: <1185373972.12444.1.camel@blabla.mcs.anl.gov> Yep. But we're not using longs. On Wed, 2007-07-25 at 10:30 +0000, Ben Clifford wrote: > need to be careful a bit about casting between floating point and fixed > precision types in the operator implementation. > > ints are small enough that they fit within a double such that no precision > is lost; but using eg. a java long would cause a problem (see below code) > > public class casts { > public static void main(String args[]) { > > long i = 9223372036854775784l; > double d = (double) i; > long i2 = (long)d; > > System.out.println(" i="+i); > System.out.println(" d="+d); > System.out.println(" i2="+i2); > if(i != i2) System.out.println("Different"); > } > } > From benc at hawaga.org.uk Wed Jul 25 11:13:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 16:13:02 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <469BE095.4010608@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> Message-ID: On Mon, 16 Jul 2007, Ioan Raicu wrote: > Hey Ben, > Here is the latest Falkon code base, including all compiled classes, scripts, > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > etc... it's the entire branch that is needed containing all the different > Falkon components. I would have preferred to clean things up a bit, but here > it is, and I'll do the clean-up later... > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz I just imported this into the vdl2 subversion repo. Type: svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon to get the checkout.
I removed the embedded JRE (putting aside issues of whether we should put big binaries like that in the SVN, a quick glance at the JRE redistribution licence looked like it was not something acceptable) If you edit files, you can commit them with: svn commit which will require you to feed in your CI password. Type svn update in the root directory of your checkout to pull down changes that other people have made since your last checkout/update (probably you'll find me making a bunch of those to tidy some things up) If you add files, you will need to: svn add myfile.java before committing it. This is the tarball as I received it, so it has lots of built cruft in there (.class files and things). I'll help work on tidying that up in the repository. Please commit any changes you have made since this tarball, and begin making your releases from committed SVN code rather than from your own private codebase - that way, people can talk about 'falkon built from r972' and then everyone can look at the exact code version from SVN. -- From benc at hawaga.org.uk Wed Jul 25 11:15:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 16:15:30 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: References: <469BE095.4010608@cs.uchicago.edu> Message-ID: btw, the import hasn't actually finished yet... I sent this mail by accident without waiting for it to finish. From hategan at mcs.anl.gov Wed Jul 25 12:01:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 12:01:33 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: References: <469BE095.4010608@cs.uchicago.edu> Message-ID: <1185382893.15519.0.camel@blabla.mcs.anl.gov> Aaargh! It's being imported into the root not the falkon directory!
On Wed, 2007-07-25 at 16:13 +0000, Ben Clifford wrote: > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > Hey Ben, > > Here is the latest Falkon code base, including all compiled classes, scripts, > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > etc... it's the entire branch that is needed containing all the different > > Falkon components. I would have preferred to clean things up a bit, but here > > it is, and I'll do the clean-up later... > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > I just imported this into the vdl2 subversion repo. > > Type: > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > to get the checkout. > > I removed the embedded JRE (putting aside issues of whether we should put big > binaries like that in the SVN, a quick glance at the JRE redistribution > licence looked like it was not something acceptable) > > If you edit files, you can commit them with: > > svn commit > > which will require you to feed in your CI password. > > Type svn update in the root directory of your checkout to pull down > changes that other people have made since your last checkout/update > (probably you'll find me making a bunch of those to tidy some things up) > > If you add files, you will need to: > > svn add myfile.java > > before committing it. > > This is the tarball as I received it, so has lots of built cruft in there > (.class files and things). > > I'll help work on tidying that up in the repository. > > Please commit any changes you have made since this tarball, and begin > making your releases from committed SVN code rather than from your own > private codebase - that way, people can talk about 'falkon built from > r972' and then everyone can look at the exact code version from SVN.
> From benc at hawaga.org.uk Wed Jul 25 12:03:20 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 17:03:20 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <1185382893.15519.0.camel@blabla.mcs.anl.gov> References: <469BE095.4010608@cs.uchicago.edu> <1185382893.15519.0.camel@blabla.mcs.anl.gov> Message-ID: ja I saw that. Easy to move, which I am doing now. Please wait. On Wed, 25 Jul 2007, Mihael Hategan wrote: > Aaargh! It's being imported into the root not the falkon directory! > > On Wed, 2007-07-25 at 16:13 +0000, Ben Clifford wrote: > > > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > > > Hey Ben, > > > Here is the latest Falkon code base, including all compiled classes, scripts, > > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > > etc... it's the entire branch that is needed containing all the different > > > Falkon components. I would have preferred to clean things up a bit, but here > > > it is, and I'll do the clean-up later... > > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > > > I just imported this into the vdl2 subversion repo. > > > > Type: > > > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > > > to get the checkout. > > > > I removed the embedded JRE (putting aside issues of whether we should put big > > binaries like that in the SVN, a quick glance at the JRE redistribution > > licence looked like it was not something acceptable) > > > > If you edit files, you can commit them with: > > > > svn commit > > > > which will require you to feed in your CI password. > > > > Type svn update in the root directory of your checkout to pull down > > changes that other people have made since your last checkout/update > > (probably you'll find me making a bunch of those to tidy some things up) > > > > If you add files, you will need to: > > > > svn add myfile.java > > > > before committing it.
> > > > This is the tarball as I received it, so has lots of built cruft in there > > (.class files and things). > > > > I'll help work on tidying that up in the repository. > > > > Please commit any changes you have made since this tarball, and begin > > making your releases from committed SVN code rather than from your own > > private codebase - that way, people can talk about 'falkon built from > > r972' and then everyone can look at the exact code version from SVN. > > > > From benc at hawaga.org.uk Wed Jul 25 13:27:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 25 Jul 2007 18:27:47 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: References: <469BE095.4010608@cs.uchicago.edu> <1185382893.15519.0.camel@blabla.mcs.anl.gov> Message-ID: On Wed, 25 Jul 2007, Ben Clifford wrote: > ja I saw that. Easy to move, which I am doing now. Please wait. hopefully all better as of r995. -- From hategan at mcs.anl.gov Wed Jul 25 19:03:08 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 19:03:08 -0500 Subject: [Swift-devel] dcache Message-ID: <1185408188.21980.0.camel@blabla.mcs.anl.gov> Does anyone know of an installation I can play with? Looking at the docs, I'm a bit reluctant to try to install it on my laptop. Mihael From iraicu at cs.uchicago.edu Wed Jul 25 19:10:28 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 25 Jul 2007 19:10:28 -0500 Subject: [Swift-devel] dcache In-Reply-To: <1185408188.21980.0.camel@blabla.mcs.anl.gov> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> Message-ID: <46A7E674.6060406@cs.uchicago.edu> I think dCache is installed on Tier3. http://twiki.mwt2.org/bin/view/UCTier3/WebHome Ioan Mihael Hategan wrote: > Does anyone know of an installation I can play with? > Looking at the docs, I'm a bit reluctant to try to install it on my > laptop. 
> > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From hategan at mcs.anl.gov Wed Jul 25 19:13:01 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 19:13:01 -0500 Subject: [Swift-devel] dcache In-Reply-To: <46A7E674.6060406@cs.uchicago.edu> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> <46A7E674.6060406@cs.uchicago.edu> Message-ID: <1185408781.22344.1.camel@blabla.mcs.anl.gov> Right, and how would I get access to that? On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote: > I think dCache is installed on Tier3. > http://twiki.mwt2.org/bin/view/UCTier3/WebHome > Ioan > > > Mihael Hategan wrote: > > Does anyone know of an installation I can play with? > > Looking at the docs, I'm a bit reluctant to try to install it on my > > laptop. 
> > > > Mihael From iraicu at cs.uchicago.edu Wed Jul 25 19:24:49 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 25 Jul 2007 19:24:49 -0500 Subject: [Swift-devel] dcache In-Reply-To: <1185408781.22344.1.camel@blabla.mcs.anl.gov> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> <46A7E674.6060406@cs.uchicago.edu> <1185408781.22344.1.camel@blabla.mcs.anl.gov> Message-ID: <46A7E9D1.1050605@cs.uchicago.edu> Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when I got my account for Tier3. You might have to write him, Mary, and/or double check the link below for further instructions. Ioan ========================= Hi Mary, Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer science students. They will need to use the Tier3 cluster for a test tomorrow morning. Yong, Ioan, the first step is to follow the instructions for UChicago users at: http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration then Mary will create twiki accounts for you on the UC Tier3 twiki which is not public. Then you'll go to: http://twiki.mwt2.org/bin/view/UCTier3/WebHome and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount. Rob Mihael Hategan wrote: > Right, and how would I get access to that? > > On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote: > >> I think dCache is installed on Tier3. >> http://twiki.mwt2.org/bin/view/UCTier3/WebHome >> Ioan >> >> >> Mihael Hategan wrote: >> >>> Does anyone know of an installation I can play with? >>> Looking at the docs, I'm a bit reluctant to try to install it on my >>> laptop.
>>> >>> Mihael From hategan at mcs.anl.gov Wed Jul 25 19:26:34 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 19:26:34 -0500 Subject: [Swift-devel] dcache In-Reply-To: <46A7E9D1.1050605@cs.uchicago.edu> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> <46A7E674.6060406@cs.uchicago.edu> <1185408781.22344.1.camel@blabla.mcs.anl.gov> <46A7E9D1.1050605@cs.uchicago.edu> Message-ID: <1185409594.23097.0.camel@blabla.mcs.anl.gov> Any simpler way? On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote: > Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when > I got my account for Tier3. You might have to write him, Mary, and/or > double check the link below for further instructions. > > Ioan > > ========================= > Hi Mary, > > > Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer > science students. They will > need to use the Tier3 cluster for a test tomorrow morning. > > > Yong, Ioan, the first step is to follow the instructions for UChicago > users at: > > > http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration > > > then Mary will create twiki accounts for you on the UC Tier3 twiki > which is not public.
Then you'll > go to: > > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome > > > and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount. > > > Rob > > > > > > > Mihael Hategan wrote: > > Right, and how would I get access to that? > > > > On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote: > > > > > I think dCache is installed on Tier3. > > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome > > > Ioan > > > > > > > > > Mihael Hategan wrote: > > > > > > > Does anyone know of an installation I can play with? > > > > Looking at the docs, I'm a bit reluctant to try to install it on my > > > > laptop. > > > > > > > > Mihael From iraicu at cs.uchicago.edu Wed Jul 25 19:29:40 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 25 Jul 2007 19:29:40 -0500 Subject: [Swift-devel] dcache In-Reply-To: <1185409594.23097.0.camel@blabla.mcs.anl.gov> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> <46A7E674.6060406@cs.uchicago.edu> <1185408781.22344.1.camel@blabla.mcs.anl.gov> <46A7E9D1.1050605@cs.uchicago.edu> <1185409594.23097.0.camel@blabla.mcs.anl.gov> Message-ID: <46A7EAF4.5060608@cs.uchicago.edu> You asked how, I told you how Yong and I got accounts on Tier3, which also has dCache installed.
They actually have a really nice testbed, some 20 compute nodes with 8GB of memory and 4 cores on each node, and some 50TB of disk managed by dCache. I don't know of any other install of dCache around here, such as TeraPort or TeraGrid. Ioan Mihael Hategan wrote: > Any simpler way? > > On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote: > >> Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when >> I got my account for Tier3. You might have to write him, Mary, and/or >> double check the link below for further instructions. >> >> Ioan >> >> ========================= >> Hi Mary, >> >> >> Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer >> science students. They will >> need to use the Tier3 cluster for a test tomorrow morning. >> >> >> Yong, Ioan, the first step is to follow the instructions for UChicago >> users at: >> >> >> http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration >> >> >> then Mary will create twiki accounts for you on the UC Tier3 twiki >> which is not public. Then you'll >> go to: >> >> >> http://twiki.mwt2.org/bin/view/UCTier3/WebHome >> >> >> and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount. >> >> >> Rob >> >> >> >> >> >> >> Mihael Hategan wrote: >> >>> Right, and how would I get access to that? >>> >>> On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote: >>> >>> >>>> I think dCache is installed on Tier3. >>>> http://twiki.mwt2.org/bin/view/UCTier3/WebHome >>>> Ioan >>>> >>>> >>>> Mihael Hategan wrote: >>>> >>>> >>>>> Does anyone know of an installation I can play with? >>>>> Looking at the docs, I'm a bit reluctant to try to install it on my >>>>> laptop. >>>>> >>>>> Mihael
From hategan at mcs.anl.gov Wed Jul 25 19:32:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 Jul 2007 19:32:45 -0500 Subject: [Swift-devel] dcache In-Reply-To: <46A7EAF4.5060608@cs.uchicago.edu> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> <46A7E674.6060406@cs.uchicago.edu> <1185408781.22344.1.camel@blabla.mcs.anl.gov> <46A7E9D1.1050605@cs.uchicago.edu> <1185409594.23097.0.camel@blabla.mcs.anl.gov> <46A7EAF4.5060608@cs.uchicago.edu> Message-ID: <1185409965.23644.0.camel@blabla.mcs.anl.gov> None that you know of, I gather. Thanks. Mihael On Wed, 2007-07-25 at 19:29 -0500, Ioan Raicu wrote: > You asked how, I told you how Yong and I got accounts on Tier3, which > also has dCache installed. They actually have a really nice testbed, > some 20 compute nodes with 8GB of memory and 4 cores on each node, and > some 50TB of disk managed by dCache. I don't know of any other
I don't know of any other > install of dCache around here, such as TeraPort or TeraGrid. > > Ioan > > Mihael Hategan wrote: > > Any simpler way? > > > > On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote: > > > > > Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when > > > I got my account for Tier3. You might have to write him, Mary, and/or > > > double check the link below for further instructions. > > > > > > Ioan > > > > > > ========================= > > > Hi Mary, > > > > > > > > > Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer > > > science students. They will > > > need to use the Tier3 cluster for a test tomorrow morning. > > > > > > > > > Yong, Ioan, the first step is to follow the instructions for UChicago > > > users at: > > > > > > > > > http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration > > > > > > > > > then Mary will create twiki accounts for you on the UC Tier3 twiki > > > which is not public. Then you'll > > > go to: > > > > > > > > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome > > > > > > > > > and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount. > > > > > > > > > Rob > > > > > > > > > > > > > > > > > > > > > Mihael Hategan wrote: > > > > > > > Right, and how would I get access to that? > > > > > > > > On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote: > > > > > > > > > > > > > I think dCache is installed on Tier3. > > > > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome > > > > > Ioan > > > > > > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > > > > > > > Does anyone know of an installation I can play with? > > > > > > Looking at the docs, I'm a bit reluctant to try to install it on my > > > > > > laptop. 
> > > > > > > > > > > > Mihael From benc at hawaga.org.uk Thu Jul 26 04:38:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 26 Jul 2007 09:38:25 +0000 (GMT) Subject: [Swift-devel] dcache In-Reply-To: <1185408188.21980.0.camel@blabla.mcs.anl.gov> References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> Message-ID: On Wed, 25 Jul 2007, Mihael Hategan wrote: > Does anyone know of an installation I can play with? > Looking at the docs, I'm a bit reluctant to try to install it on my > laptop.
I have access to one at fermi - it was pretty straightforward to get access, as I already had a fermi account. -- From benc at hawaga.org.uk Thu Jul 26 06:07:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 26 Jul 2007 11:07:56 +0000 (GMT) Subject: [Swift-devel] dcache In-Reply-To: References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> Message-ID: an extremely crude implementation of dcache-in-swift could be told which subtrees of the local posix filesystem namespace are actually dCache; and then the swift stage in and stage out code would have an additional step which would dccp the file to submit-side storage before sending it from there to the remote site. -- From benc at hawaga.org.uk Thu Jul 26 09:01:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 26 Jul 2007 14:01:25 +0000 (GMT) Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: <1185302836.6949.5.camel@blabla.mcs.anl.gov> References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185302836.6949.5.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 24 Jul 2007, Mihael Hategan wrote: > I'm thinking we should have two division operators: > div - integer division (int, int -> int) > / - floating point division ( [int|float], [int|float] -> float ) So a patch I have (not yet committed) makes / be floating point division, %/ be integer division and %% be mod (rather than %). (the % prefix on %/ and %% because those two operators are strongly related). No other particularly nice symbols spring to mind. With that in place, the operator changes you committed work without too much change to the language both against the XML development stuff and also against the trunk code. I'd be happy for those to go into trunk now, ahead of the big XML expression work. 
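The proposed operator split, together with Ben's earlier casting caveat, can be sketched in Java, the language of the `casts` example above. This is a hypothetical model, not the Swift implementation. The class and method names are invented for illustration. The sketch also assumes that `%/` truncates toward zero and that `%%` takes the sign of the dividend, as Java's `int` `/` and `%` do. Swift's actual behaviour for negative operands is not settled in this thread.

```java
// Hypothetical model of the proposed SwiftScript numeric operators.
// Assumption: %/ and %% follow Java's truncating int semantics; the real
// Swift implementation may treat negative operands differently.
class SwiftNumericSketch {

    // "/" : floating-point division; int arguments widen to double.
    static double floatDiv(double a, double b) {
        return a / b;
    }

    // "%/" : integer division (int, int -> int), truncating toward zero.
    static int intDiv(int a, int b) {
        return a / b;
    }

    // "%%" : remainder; in Java the result takes the sign of the dividend.
    static int mod(int a, int b) {
        return a % b;
    }

    // True if the value is unchanged by a cast to double and back.
    static boolean survivesDouble(long value) {
        double d = (double) value;
        return (long) d == value;
    }

    public static void main(String[] args) {
        System.out.println("7 / 2   = " + floatDiv(7, 2)); // 3.5
        System.out.println("7 %/ 2  = " + intDiv(7, 2));   // 3
        System.out.println("7 %% 2  = " + mod(7, 2));      // 1
        System.out.println("-7 %/ 2 = " + intDiv(-7, 2));  // -3 (truncation)
        System.out.println("-7 %% 2 = " + mod(-7, 2));     // -1

        // Every 32-bit int fits exactly in a double's 53-bit significand...
        System.out.println(survivesDouble(Integer.MAX_VALUE));      // true
        System.out.println(survivesDouble(Integer.MIN_VALUE));      // true
        // ...but longs above 2^53 can lose precision.
        System.out.println(survivesDouble((1L << 53) + 1));         // false
        System.out.println(survivesDouble(9223372036854775784L));   // false
    }
}
```

The last check uses the value from Ben's example. It rounds to 2^63 as a double, and the cast back saturates to `Long.MAX_VALUE`. This is why keeping Swift's integers `int`-sized, as Mihael notes, avoids the problem.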
-- From bugzilla-daemon at mcs.anl.gov Thu Jul 26 09:52:31 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 26 Jul 2007 09:52:31 -0500 (CDT) Subject: [Swift-devel] [Bug 22] configurable remote filesystem layout In-Reply-To: Message-ID: <20070726145231.EB473164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=22 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|v0.2 |v0.3 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From hategan at mcs.anl.gov Thu Jul 26 10:03:37 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 26 Jul 2007 10:03:37 -0500 Subject: [Swift-devel] numeric type(s) in swift. In-Reply-To: References: <46A0D0D1.6070407@mcs.anl.gov> <46A11286.7080807@mcs.anl.gov> <1184964549.26024.0.camel@blabla.mcs.anl.gov> <1185302836.6949.5.camel@blabla.mcs.anl.gov> Message-ID: <1185462217.1578.11.camel@blabla.mcs.anl.gov> On Thu, 2007-07-26 at 14:01 +0000, Ben Clifford wrote: > On Tue, 24 Jul 2007, Mihael Hategan wrote: > > > I'm thinking we should have two division operators: > > div - integer division (int, int -> int) > > / - floating point division ( [int|float], [int|float] -> float ) > > So a patch I have (not yet committed) makes / be floating point division, > %/ be integer division and %% be mod (rather than %). > > (the % prefix on %/ and %% because those two operators are strongly > related). > > No other particularly nice symbols spring to mind. 'div' and 'mod'? > > With that in place, the operator changes you committed work without too > much change to the language both against the XML development stuff and > also against the trunk code. 
> > I'd be happy for those to go into trunk now, ahead of the big XML > expression work. > From hategan at mcs.anl.gov Thu Jul 26 10:04:11 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 26 Jul 2007 10:04:11 -0500 Subject: [Swift-devel] dcache In-Reply-To: References: <1185408188.21980.0.camel@blabla.mcs.anl.gov> Message-ID: <1185462251.1578.12.camel@blabla.mcs.anl.gov> On Thu, 2007-07-26 at 09:38 +0000, Ben Clifford wrote: > > On Wed, 25 Jul 2007, Mihael Hategan wrote: > > > Does anyone know of an installation I can play with? > > Looking at the docs, I'm a bit reluctant to try to install it on my > > laptop. > > I have access to one at fermi - it was pretty straightforward to get > access, as I already had a fermi account. Right. That's what I thought I would do, unless we have something really close. > From nefedova at mcs.anl.gov Fri Jul 27 10:22:52 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 10:22:52 -0500 Subject: [Swift-devel] loops and strings Message-ID: I am not sure if it's possible to do string operations inside a loop in Swift? I have a very simple test code that doesn't work no matter what. Obviously, I am missing something. This is the code:
file fls[];
string wham_string = "#";
foreach prt_file in fls
{
    wham_string = @strcat (wham_string, ", wham");
    print (wham_string);
}
print (wham_string);
Basically I expect to have this as an output: #,wham,wham,wham,wham,...
(its a test code (-;) instead I have these errors: wham_string is already assigned with a value of # wham_string is already assigned with a value of # vdl:assign @ test.kml, line: 46 vdl:mains @ test.kml, line: 39 Caused by: java.lang.IllegalArgumentException: wham_string is already assigned with a value of # at org.griphyn.vdl.mapping.AbstractDataNode.setValue (AbstractDataNode.java:255) at org.griphyn.vdl.karajan.lib.Assign.function(Assign.java:70) In any case -- if I can't construct the string by using the loop - how else could it be done? I use the constructed string then to map an array (I understand I can't map individual array elements): file whamfiles_$s[] ; //it was in the wrapper script before) Nika From hategan at mcs.anl.gov Fri Jul 27 10:46:06 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 10:46:06 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: References: Message-ID: <1185551166.17961.2.camel@blabla.mcs.anl.gov> Variables in swift are single assignment. You can't assign to a variable twice. What, in your opinion, should the error message be instead of the current one? On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > I am not sure if its possible to do string operations inside the loop > in swift? > I have a versy simple test code that doesn't work no matter what. > Obviously, I am missing something. > This is the code: > > file fls[]; > string wham_string = "#"; > foreach prt_file in fls > { > wham_string = @strcat (wham_string, ", wham"); > print (wham_string); > } > print (wham_string); > > > basically I expect to have this as an output: > #,wham,wham,wham,wham,... 
(its a test code (-;) > > instead I have these errors: > > wham_string is already assigned with a value of # > wham_string is already assigned with a value of # > vdl:assign @ test.kml, line: 46 > vdl:mains @ test.kml, line: 39 > Caused by: java.lang.IllegalArgumentException: wham_string is already > assigned with a value of # > at org.griphyn.vdl.mapping.AbstractDataNode.setValue > (AbstractDataNode.java:255) > at org.griphyn.vdl.karajan.lib.Assign.function(Assign.java:70) > > > > In any case -- if I can't construct the string by using the loop - > how else could it be done? > > I use the constructed string then to map an array (I understand I > can't map individual array elements): > > file whamfiles_$s[] ; //it > was in the wrapper script before) > > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Fri Jul 27 10:50:51 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 10:50:51 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <1185551166.17961.2.camel@blabla.mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> Message-ID: So how else then I construct a string in swift ? On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > Variables in swift are single assignment. You can't assign to a > variable > twice. What, in your opinion, should the error message be instead > of the > current one? > > On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >> I am not sure if its possible to do string operations inside the loop >> in swift? >> I have a versy simple test code that doesn't work no matter what. >> Obviously, I am missing something. 
>> This is the code: >> >> file fls[]; >> string wham_string = "#"; >> foreach prt_file in fls >> { >> wham_string = @strcat (wham_string, ", wham"); >> print (wham_string); >> } >> print (wham_string); >> >> >> basically I expect to have this as an output: >> #,wham,wham,wham,wham,... (its a test code (-;) >> >> instead I have these errors: >> >> wham_string is already assigned with a value of # >> wham_string is already assigned with a value of # >> vdl:assign @ test.kml, line: 46 >> vdl:mains @ test.kml, line: 39 >> Caused by: java.lang.IllegalArgumentException: wham_string is already >> assigned with a value of # >> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >> (AbstractDataNode.java:255) >> at org.griphyn.vdl.karajan.lib.Assign.function >> (Assign.java:70) >> >> >> >> In any case -- if I can't construct the string by using the loop - >> how else could it be done? >> >> I use the constructed string then to map an array (I understand I >> can't map individual array elements): >> >> file whamfiles_$s[] ; //it >> was in the wrapper script before) >> >> >> Nika >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > From hategan at mcs.anl.gov Fri Jul 27 11:01:59 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 11:01:59 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> Message-ID: <1185552119.18583.4.camel@blabla.mcs.anl.gov> wham_string2 = @strcat(wham_string, ", wham"); print(wham_string2); Variables are not variables. They are labels that are used to direct the data flow. Loops (in the sense of data looping around the same node - picture this as a data flow graph) make no sense. On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > So how else then I construct a string in swift ? 
> > > On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > > > Variables in swift are single assignment. You can't assign to a > > variable > > twice. What, in your opinion, should the error message be instead > > of the > > current one? > > > > On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > >> I am not sure if its possible to do string operations inside the loop > >> in swift? > >> I have a versy simple test code that doesn't work no matter what. > >> Obviously, I am missing something. > >> This is the code: > >> > >> file fls[]; > >> string wham_string = "#"; > >> foreach prt_file in fls > >> { > >> wham_string = @strcat (wham_string, ", wham"); > >> print (wham_string); > >> } > >> print (wham_string); > >> > >> > >> basically I expect to have this as an output: > >> #,wham,wham,wham,wham,... (its a test code (-;) > >> > >> instead I have these errors: > >> > >> wham_string is already assigned with a value of # > >> wham_string is already assigned with a value of # > >> vdl:assign @ test.kml, line: 46 > >> vdl:mains @ test.kml, line: 39 > >> Caused by: java.lang.IllegalArgumentException: wham_string is already > >> assigned with a value of # > >> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > >> (AbstractDataNode.java:255) > >> at org.griphyn.vdl.karajan.lib.Assign.function > >> (Assign.java:70) > >> > >> > >> > >> In any case -- if I can't construct the string by using the loop - > >> how else could it be done? 
> >> > >> I use the constructed string then to map an array (I understand I > >> can't map individual array elements): > >> > >> file whamfiles_$s[] ; //it > >> was in the wrapper script before) > >> > >> > >> Nika > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > From nefedova at mcs.anl.gov Fri Jul 27 11:09:19 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 11:09:19 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <1185552119.18583.4.camel@blabla.mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> Message-ID: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> I need to 'cat' together an unknown number of strings to form a string, thats why I was attempting to do it inside the loop. And even if I knew the number of loop cycles (say, its 68) -- are you suggesting to do it 'by hand' ? Anyway - my main goal is not to create this string, but to map an array: file whamfiles_$s[] ; Do you see a solution here? Thanks, Nika On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > wham_string2 = @strcat(wham_string, ", wham"); > print(wham_string2); > > Variables are not variables. They are labels that are used to > direct the > data flow. Loops (in the sense of data looping around the same node - > picture this as a data flow graph) make no sense. > > On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >> So how else then I construct a string in swift ? >> >> >> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >> >>> Variables in swift are single assignment. You can't assign to a >>> variable >>> twice. What, in your opinion, should the error message be instead >>> of the >>> current one? 
>>> >>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>> I am not sure if its possible to do string operations inside the >>>> loop >>>> in swift? >>>> I have a versy simple test code that doesn't work no matter what. >>>> Obviously, I am missing something. >>>> This is the code: >>>> >>>> file fls[]; >>>> string wham_string = "#"; >>>> foreach prt_file in fls >>>> { >>>> wham_string = @strcat (wham_string, ", wham"); >>>> print (wham_string); >>>> } >>>> print (wham_string); >>>> >>>> >>>> basically I expect to have this as an output: >>>> #,wham,wham,wham,wham,... (its a test code (-;) >>>> >>>> instead I have these errors: >>>> >>>> wham_string is already assigned with a value of # >>>> wham_string is already assigned with a value of # >>>> vdl:assign @ test.kml, line: 46 >>>> vdl:mains @ test.kml, line: 39 >>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>> already >>>> assigned with a value of # >>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>> (AbstractDataNode.java:255) >>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>> (Assign.java:70) >>>> >>>> >>>> >>>> In any case -- if I can't construct the string by using the loop - >>>> how else could it be done? 
>>>> >>>> I use the constructed string then to map an array (I understand I >>>> can't map individual array elements): >>>> >>>> file whamfiles_$s[] ; //it >>>> was in the wrapper script before) >>>> >>>> >>>> Nika >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >> > From hategan at mcs.anl.gov Fri Jul 27 13:24:34 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 13:24:34 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> Message-ID: <1185560674.19922.7.camel@blabla.mcs.anl.gov> I see we're getting back to the same old story of the conflict between writing a mapper and hacking one directly in swift. This is an issue we really need to deal with. It has produced more discussions and hacks than any other single Swift issue. You could use an array, or we could provide a folding operator/function, or even a join function. We could also let fixed_array_mapper accept an array as a parameter, so you would build an array with the file names and then pass it to the mapper. On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: > I need to 'cat' together an unknown number of strings to form a > string, thats why I was attempting to do it inside the loop. And even > if I knew the number of loop cycles (say, its 68) -- are you > suggesting to do it 'by hand' ? > > > Anyway - my main goal is not to create this string, but to map an array: > file whamfiles_$s[] ; > > Do you see a solution here? > > Thanks, > > Nika > > > On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > > > wham_string2 = @strcat(wham_string, ", wham"); > > print(wham_string2); > > > > Variables are not variables. 
They are labels that are used to > > direct the > > data flow. Loops (in the sense of data looping around the same node - > > picture this as a data flow graph) make no sense. > > > > On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > >> So how else then I construct a string in swift ? > >> > >> > >> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > >> > >>> Variables in swift are single assignment. You can't assign to a > >>> variable > >>> twice. What, in your opinion, should the error message be instead > >>> of the > >>> current one? > >>> > >>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > >>>> I am not sure if its possible to do string operations inside the > >>>> loop > >>>> in swift? > >>>> I have a versy simple test code that doesn't work no matter what. > >>>> Obviously, I am missing something. > >>>> This is the code: > >>>> > >>>> file fls[]; > >>>> string wham_string = "#"; > >>>> foreach prt_file in fls > >>>> { > >>>> wham_string = @strcat (wham_string, ", wham"); > >>>> print (wham_string); > >>>> } > >>>> print (wham_string); > >>>> > >>>> > >>>> basically I expect to have this as an output: > >>>> #,wham,wham,wham,wham,... (its a test code (-;) > >>>> > >>>> instead I have these errors: > >>>> > >>>> wham_string is already assigned with a value of # > >>>> wham_string is already assigned with a value of # > >>>> vdl:assign @ test.kml, line: 46 > >>>> vdl:mains @ test.kml, line: 39 > >>>> Caused by: java.lang.IllegalArgumentException: wham_string is > >>>> already > >>>> assigned with a value of # > >>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > >>>> (AbstractDataNode.java:255) > >>>> at org.griphyn.vdl.karajan.lib.Assign.function > >>>> (Assign.java:70) > >>>> > >>>> > >>>> > >>>> In any case -- if I can't construct the string by using the loop - > >>>> how else could it be done? 
> >>>> > >>>> I use the constructed string then to map an array (I understand I > >>>> can't map individual array elements): > >>>> > >>>> file whamfiles_$s[] ; //it > >>>> was in the wrapper script before) > >>>> > >>>> > >>>> Nika > >>>> > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>> > >> > > > From hategan at mcs.anl.gov Fri Jul 27 13:36:22 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 13:36:22 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <1185560674.19922.7.camel@blabla.mcs.anl.gov> <410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry> Message-ID: <1185561383.21161.2.camel@blabla.mcs.anl.gov> I wish. I think we all need to think about it. On Fri, 2007-07-27 at 18:32 +0000, Ian Foster wrote: > Can you propose a general solution? > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: Mihael Hategan > > Date: Fri, 27 Jul 2007 13:24:34 > To:Veronika Nefedova > Cc:swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] loops and strings > > > I see we're getting back to the same old story of the conflict between > writing a mapper and hacking one directly in swift. > > This is an issue we really need to deal with. It has produced more > discussions and hacks than any other single Swift issue. > > You could use an array, or we could provide a folding operator/function, > or even a join function. 
> We could also let fixed_array_mapper accept an array as a parameter, so > you would build an array with the file names and then pass it to the > mapper. > > On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: > > I need to 'cat' together an unknown number of strings to form a > > string, thats why I was attempting to do it inside the loop. And even > > if I knew the number of loop cycles (say, its 68) -- are you > > suggesting to do it 'by hand' ? > > > > > > Anyway - my main goal is not to create this string, but to map an array: > > file whamfiles_$s[] ; > > > > Do you see a solution here? > > > > Thanks, > > > > Nika > > > > > > On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > > > > > wham_string2 = @strcat(wham_string, ", wham"); > > > print(wham_string2); > > > > > > Variables are not variables. They are labels that are used to > > > direct the > > > data flow. Loops (in the sense of data looping around the same node - > > > picture this as a data flow graph) make no sense. > > > > > > On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > > >> So how else then I construct a string in swift ? > > >> > > >> > > >> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > > >> > > >>> Variables in swift are single assignment. You can't assign to a > > >>> variable > > >>> twice. What, in your opinion, should the error message be instead > > >>> of the > > >>> current one? > > >>> > > >>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > > >>>> I am not sure if its possible to do string operations inside the > > >>>> loop > > >>>> in swift? > > >>>> I have a versy simple test code that doesn't work no matter what. > > >>>> Obviously, I am missing something. 
> > >>>> This is the code: > > >>>> > > >>>> file fls[]; > > >>>> string wham_string = "#"; > > >>>> foreach prt_file in fls > > >>>> { > > >>>> wham_string = @strcat (wham_string, ", wham"); > > >>>> print (wham_string); > > >>>> } > > >>>> print (wham_string); > > >>>> > > >>>> > > >>>> basically I expect to have this as an output: > > >>>> #,wham,wham,wham,wham,... (its a test code (-;) > > >>>> > > >>>> instead I have these errors: > > >>>> > > >>>> wham_string is already assigned with a value of # > > >>>> wham_string is already assigned with a value of # > > >>>> vdl:assign @ test.kml, line: 46 > > >>>> vdl:mains @ test.kml, line: 39 > > >>>> Caused by: java.lang.IllegalArgumentException: wham_string is > > >>>> already > > >>>> assigned with a value of # > > >>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > > >>>> (AbstractDataNode.java:255) > > >>>> at org.griphyn.vdl.karajan.lib.Assign.function > > >>>> (Assign.java:70) > > >>>> > > >>>> > > >>>> > > >>>> In any case -- if I can't construct the string by using the loop - > > >>>> how else could it be done? 
> > >>>> > > >>>> I use the constructed string then to map an array (I understand I > > >>>> can't map individual array elements): > > >>>> > > >>>> file whamfiles_$s[] ; //it > > >>>> was in the wrapper script before) > > >>>> > > >>>> > > >>>> Nika > > >>>> > > >>>> _______________________________________________ > > >>>> Swift-devel mailing list > > >>>> Swift-devel at ci.uchicago.edu > > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>>> > > >>> > > >> > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From itf at mcs.anl.gov Fri Jul 27 13:32:16 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Fri, 27 Jul 2007 18:32:16 +0000 Subject: [Swift-devel] loops and strings In-Reply-To: <1185560674.19922.7.camel@blabla.mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov><1185560674.19922.7.camel@blabla.mcs.anl.gov> Message-ID: <410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry> Can you propose a general solution? Sent via BlackBerry from T-Mobile -----Original Message----- From: Mihael Hategan Date: Fri, 27 Jul 2007 13:24:34 To:Veronika Nefedova Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] loops and strings I see we're getting back to the same old story of the conflict between writing a mapper and hacking one directly in swift. This is an issue we really need to deal with. It has produced more discussions and hacks than any other single Swift issue. You could use an array, or we could provide a folding operator/function, or even a join function. We could also let fixed_array_mapper accept an array as a parameter, so you would build an array with the file names and then pass it to the mapper.
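The fold and join ideas proposed here avoid reassignment by computing the whole string from the collection in a single expression. Neither existed as a Swift builtin at this point in the thread, so the following is only a Python sketch of the semantics. `build_with_fold`, `build_with_join`, and the file names are hypothetical.

```python
from functools import reduce


def build_with_fold(items, seed="#"):
    # Fold: each step derives a new string from the previous one; nothing is reassigned.
    return reduce(lambda acc, _item: acc + ", wham", items, seed)


def build_with_join(items, seed="#"):
    # Join: one operation over the whole collection gives the same result.
    return ", ".join([seed] + ["wham" for _ in items])


prt_files = ["a.prt", "b.prt", "c.prt"]  # hypothetical stand-in for fls[]
print(build_with_fold(prt_files))  # #, wham, wham, wham
print(build_with_join(prt_files))  # #, wham, wham, wham
```

Both versions work for any number of elements, which addresses the "unknown number of strings" concern. Neither needs a mutable accumulator.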
On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: > I need to 'cat' together an unknown number of strings to form a > string, thats why I was attempting to do it inside the loop. And even > if I knew the number of loop cycles (say, its 68) -- are you > suggesting to do it 'by hand' ? > > > Anyway - my main goal is not to create this string, but to map an array: > file whamfiles_$s[] ; > > Do you see a solution here? > > Thanks, > > Nika > > > On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > > > wham_string2 = @strcat(wham_string, ", wham"); > > print(wham_string2); > > > > Variables are not variables. They are labels that are used to > > direct the > > data flow. Loops (in the sense of data looping around the same node - > > picture this as a data flow graph) make no sense. > > > > On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > >> So how else then I construct a string in swift ? > >> > >> > >> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > >> > >>> Variables in swift are single assignment. You can't assign to a > >>> variable > >>> twice. What, in your opinion, should the error message be instead > >>> of the > >>> current one? > >>> > >>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > >>>> I am not sure if its possible to do string operations inside the > >>>> loop > >>>> in swift? > >>>> I have a versy simple test code that doesn't work no matter what. > >>>> Obviously, I am missing something. > >>>> This is the code: > >>>> > >>>> file fls[]; > >>>> string wham_string = "#"; > >>>> foreach prt_file in fls > >>>> { > >>>> wham_string = @strcat (wham_string, ", wham"); > >>>> print (wham_string); > >>>> } > >>>> print (wham_string); > >>>> > >>>> > >>>> basically I expect to have this as an output: > >>>> #,wham,wham,wham,wham,... 
(its a test code (-;) > >>>> > >>>> instead I have these errors: > >>>> > >>>> wham_string is already assigned with a value of # > >>>> wham_string is already assigned with a value of # > >>>> vdl:assign @ test.kml, line: 46 > >>>> vdl:mains @ test.kml, line: 39 > >>>> Caused by: java.lang.IllegalArgumentException: wham_string is > >>>> already > >>>> assigned with a value of # > >>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > >>>> (AbstractDataNode.java:255) > >>>> at org.griphyn.vdl.karajan.lib.Assign.function > >>>> (Assign.java:70) > >>>> > >>>> > >>>> > >>>> In any case -- if I can't construct the string by using the loop - > >>>> how else could it be done? > >>>> > >>>> I use the constructed string then to map an array (I understand I > >>>> can't map individual array elements): > >>>> > >>>> file whamfiles_$s[] ; //it > >>>> was in the wrapper script before) > >>>> > >>>> > >>>> Nika > >>>> > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>> > >> > > > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From nefedova at mcs.anl.gov Fri Jul 27 14:01:52 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 14:01:52 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <1185560674.19922.7.camel@blabla.mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <1185560674.19922.7.camel@blabla.mcs.anl.gov> Message-ID: <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov> will allowing multiple assignments to the same variable be a really impossible thing to have in swift? 
Nika On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote: > I see we're getting back to the same old story of the conflict between > writing a mapper and hacking one directly in swift. > > This is an issue we really need to deal with. It has produced more > discussions and hacks than any other single Swift issue. > > You could use an array, or we could provide a folding operator/ > function, > or even a join function. > We could also let fixed_array_mapper accept an array as a > parameter, so > you would build an array with the file names and then pass it to the > mapper. > > On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: >> I need to 'cat' together an unknown number of strings to form a >> string, thats why I was attempting to do it inside the loop. And even >> if I knew the number of loop cycles (say, its 68) -- are you >> suggesting to do it 'by hand' ? >> >> >> Anyway - my main goal is not to create this string, but to map an >> array: >> file whamfiles_$s[] ; >> >> Do you see a solution here? >> >> Thanks, >> >> Nika >> >> >> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: >> >>> wham_string2 = @strcat(wham_string, ", wham"); >>> print(wham_string2); >>> >>> Variables are not variables. They are labels that are used to >>> direct the >>> data flow. Loops (in the sense of data looping around the same >>> node - >>> picture this as a data flow graph) make no sense. >>> >>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >>>> So how else then I construct a string in swift ? >>>> >>>> >>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >>>> >>>>> Variables in swift are single assignment. You can't assign to a >>>>> variable >>>>> twice. What, in your opinion, should the error message be instead >>>>> of the >>>>> current one? >>>>> >>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>>>> I am not sure if its possible to do string operations inside the >>>>>> loop >>>>>> in swift? 
>>>>>> I have a versy simple test code that doesn't work no matter what. >>>>>> Obviously, I am missing something. >>>>>> This is the code: >>>>>> >>>>>> file fls[]; >>>>>> string wham_string = "#"; >>>>>> foreach prt_file in fls >>>>>> { >>>>>> wham_string = @strcat (wham_string, ", wham"); >>>>>> print (wham_string); >>>>>> } >>>>>> print (wham_string); >>>>>> >>>>>> >>>>>> basically I expect to have this as an output: >>>>>> #,wham,wham,wham,wham,... (its a test code (-;) >>>>>> >>>>>> instead I have these errors: >>>>>> >>>>>> wham_string is already assigned with a value of # >>>>>> wham_string is already assigned with a value of # >>>>>> vdl:assign @ test.kml, line: 46 >>>>>> vdl:mains @ test.kml, line: 39 >>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>>>> already >>>>>> assigned with a value of # >>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>>>> (AbstractDataNode.java:255) >>>>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>>>> (Assign.java:70) >>>>>> >>>>>> >>>>>> >>>>>> In any case -- if I can't construct the string by using the >>>>>> loop - >>>>>> how else could it be done? 
>>>>>> >>>>>> I use the constructed string then to map an array (I understand I >>>>>> can't map individual array elements): >>>>>> >>>>>> file whamfiles_$s[] >>>>>> ; //it >>>>>> was in the wrapper script before) >>>>>> >>>>>> >>>>>> Nika >>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>>> >>> >> > From hategan at mcs.anl.gov Fri Jul 27 14:11:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 14:11:09 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <1185560674.19922.7.camel@blabla.mcs.anl.gov> <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov> Message-ID: <1185563469.22752.7.camel@blabla.mcs.anl.gov> On Fri, 2007-07-27 at 14:01 -0500, Veronika Nefedova wrote: > will allowing multiple assignments to the same variable be a really > impossible thing to have in swift? With what we currently have as "Swift", yes. > > Nika > > On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote: > > I see we're getting back to the same old story of the conflict between > > writing a mapper and hacking one directly in swift. > > > > This is an issue we really need to deal with. It has produced more > > discussions and hacks than any other single Swift issue. > > > > You could use an array, or we could provide a folding operator/ > > function, > > or even a join function. > > We could also let fixed_array_mapper accept an array as a > > parameter, so > > you would build an array with the file names and then pass it to the > > mapper. 
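Letting the mapper accept an array, as suggested here, would remove the need to build a comma-separated string at all. The Python sketch below contrasts the two routes. `map_from_string` and `map_from_list` are hypothetical stand-ins, not the real `fixed_array_mapper`, whose parameter parsing is not described in this thread.

```python
def map_from_string(files_param):
    """Hypothetical mapper that must parse one comma-separated string of file names."""
    return [p.strip() for p in files_param.split(",") if p.strip()]


def map_from_list(paths):
    """Hypothetical mapper that accepts an array of file names directly."""
    return list(paths)


names = ["wham1.dat", "wham2.dat", "wham3.dat"]  # hypothetical file names

# String route: join the names into one string, then have the mapper split it again.
via_string = map_from_string(", ".join(names))
# Array route: no intermediate string is needed.
via_list = map_from_list(names)

assert via_string == via_list == names
```

The array route skips the join-then-split round trip, which is the step the single-assignment rule made awkward in the first place.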
> > > > On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: > >> I need to 'cat' together an unknown number of strings to form a > >> string, thats why I was attempting to do it inside the loop. And even > >> if I knew the number of loop cycles (say, its 68) -- are you > >> suggesting to do it 'by hand' ? > >> > >> > >> Anyway - my main goal is not to create this string, but to map an > >> array: > >> file whamfiles_$s[] ; > >> > >> Do you see a solution here? > >> > >> Thanks, > >> > >> Nika > >> > >> > >> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > >> > >>> wham_string2 = @strcat(wham_string, ", wham"); > >>> print(wham_string2); > >>> > >>> Variables are not variables. They are labels that are used to > >>> direct the > >>> data flow. Loops (in the sense of data looping around the same > >>> node - > >>> picture this as a data flow graph) make no sense. > >>> > >>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > >>>> So how else then I construct a string in swift ? > >>>> > >>>> > >>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > >>>> > >>>>> Variables in swift are single assignment. You can't assign to a > >>>>> variable > >>>>> twice. What, in your opinion, should the error message be instead > >>>>> of the > >>>>> current one? > >>>>> > >>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > >>>>>> I am not sure if its possible to do string operations inside the > >>>>>> loop > >>>>>> in swift? > >>>>>> I have a versy simple test code that doesn't work no matter what. > >>>>>> Obviously, I am missing something. > >>>>>> This is the code: > >>>>>> > >>>>>> file fls[]; > >>>>>> string wham_string = "#"; > >>>>>> foreach prt_file in fls > >>>>>> { > >>>>>> wham_string = @strcat (wham_string, ", wham"); > >>>>>> print (wham_string); > >>>>>> } > >>>>>> print (wham_string); > >>>>>> > >>>>>> > >>>>>> basically I expect to have this as an output: > >>>>>> #,wham,wham,wham,wham,... 
(its a test code (-;) > >>>>>> > >>>>>> instead I have these errors: > >>>>>> > >>>>>> wham_string is already assigned with a value of # > >>>>>> wham_string is already assigned with a value of # > >>>>>> vdl:assign @ test.kml, line: 46 > >>>>>> vdl:mains @ test.kml, line: 39 > >>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is > >>>>>> already > >>>>>> assigned with a value of # > >>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > >>>>>> (AbstractDataNode.java:255) > >>>>>> at org.griphyn.vdl.karajan.lib.Assign.function > >>>>>> (Assign.java:70) > >>>>>> > >>>>>> > >>>>>> > >>>>>> In any case -- if I can't construct the string by using the > >>>>>> loop - > >>>>>> how else could it be done? > >>>>>> > >>>>>> I use the constructed string then to map an array (I understand I > >>>>>> can't map individual array elements): > >>>>>> > >>>>>> file whamfiles_$s[] > >>>>>> ; //it > >>>>>> was in the wrapper script before) > >>>>>> > >>>>>> > >>>>>> Nika > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Swift-devel mailing list > >>>>>> Swift-devel at ci.uchicago.edu > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>> > >>>>> > >>>> > >>> > >> > > > From itf at mcs.anl.gov Fri Jul 27 14:20:11 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Fri, 27 Jul 2007 19:20:11 +0000 Subject: [Swift-devel] loops and strings In-Reply-To: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> Message-ID: <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry> Could you not handle the "cat a set of strings" case via a call to a shell script or other program that does this?
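Ian's suggestion moves the concatenation into an external program, which can accept any number of arguments. Below is a minimal Python sketch of such a helper. The script name `cat_strings.py` and its command-line interface are assumptions, and the sketch does not cover how its output would be fed back into the workflow.

```python
import sys


def cat_strings(parts, sep=", "):
    # Join however many strings the caller passed; the count need not be known in advance.
    return sep.join(parts)


if __name__ == "__main__":
    # Hypothetical usage: python cat_strings.py '#' wham wham  ->  prints "#, wham, wham"
    print(cat_strings(sys.argv[1:]))
```

Veronika's later reply notes that wiring such a helper's output back into an array mapper declaration proved cumbersome. That is the main cost of handling the join outside Swift.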
Ian Sent via BlackBerry from T-Mobile -----Original Message----- From: Veronika Nefedova Date: Fri, 27 Jul 2007 11:09:19 To:Mihael Hategan Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] loops and strings I need to 'cat' together an unknown number of strings to form a string, thats why I was attempting to do it inside the loop. And even if I knew the number of loop cycles (say, its 68) -- are you suggesting to do it 'by hand' ? Anyway - my main goal is not to create this string, but to map an array: file whamfiles_$s[] ; Do you see a solution here? Thanks, Nika On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > wham_string2 = @strcat(wham_string, ", wham"); > print(wham_string2); > > Variables are not variables. They are labels that are used to > direct the > data flow. Loops (in the sense of data looping around the same node - > picture this as a data flow graph) make no sense. > > On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >> So how else then I construct a string in swift ? >> >> >> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >> >>> Variables in swift are single assignment. You can't assign to a >>> variable >>> twice. What, in your opinion, should the error message be instead >>> of the >>> current one? >>> >>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>> I am not sure if its possible to do string operations inside the >>>> loop >>>> in swift? >>>> I have a versy simple test code that doesn't work no matter what. >>>> Obviously, I am missing something. >>>> This is the code: >>>> >>>> file fls[]; >>>> string wham_string = "#"; >>>> foreach prt_file in fls >>>> { >>>> wham_string = @strcat (wham_string, ", wham"); >>>> print (wham_string); >>>> } >>>> print (wham_string); >>>> >>>> >>>> basically I expect to have this as an output: >>>> #,wham,wham,wham,wham,... 
(its a test code (-;) >>>> >>>> instead I have these errors: >>>> >>>> wham_string is already assigned with a value of # >>>> wham_string is already assigned with a value of # >>>> vdl:assign @ test.kml, line: 46 >>>> vdl:mains @ test.kml, line: 39 >>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>> already >>>> assigned with a value of # >>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>> (AbstractDataNode.java:255) >>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>> (Assign.java:70) >>>> >>>> >>>> >>>> In any case -- if I can't construct the string by using the loop - >>>> how else could it be done? >>>> >>>> I use the constructed string then to map an array (I understand I >>>> can't map individual array elements): >>>> >>>> file whamfiles_$s[] ; //it >>>> was in the wrapper script before) >>>> >>>> >>>> Nika >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >> > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From nefedova at mcs.anl.gov Fri Jul 27 14:26:36 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 14:26:36 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <1185563469.22752.7.camel@blabla.mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <1185560674.19922.7.camel@blabla.mcs.anl.gov> <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov> <1185563469.22752.7.camel@blabla.mcs.anl.gov> Message-ID: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov> I guess I am still missing something. I *can* have multiple assignments to the same variable inside the loop. 
Here, this code assigns different values to "name" at every loop step: file fls[]; foreach prt_file in fls { string name = @strcut (@prt_file, "\.\/(.*)\.prt"); print (name); } Or "name" considered to be a new variable every time since I have a type declaration next to it? Nika On Jul 27, 2007, at 2:11 PM, Mihael Hategan wrote: > On Fri, 2007-07-27 at 14:01 -0500, Veronika Nefedova wrote: >> will allowing multiple assignments to the same variable be a really >> impossible thing to have in swift? > > With what we currently have as "Swift", yes. > >> >> Nika >> >> On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote: >>> I see we're getting back to the same old story of the conflict >>> between >>> writing a mapper and hacking one directly in swift. >>> >>> This is an issue we really need to deal with. It has produced more >>> discussions and hacks than any other single Swift issue. >>> >>> You could use an array, or we could provide a folding operator/ >>> function, >>> or even a join function. >>> We could also let fixed_array_mapper accept an array as a >>> parameter, so >>> you would build an array with the file names and then pass it to the >>> mapper. >>> >>> On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: >>>> I need to 'cat' together an unknown number of strings to form a >>>> string, thats why I was attempting to do it inside the loop. And >>>> even >>>> if I knew the number of loop cycles (say, its 68) -- are you >>>> suggesting to do it 'by hand' ? >>>> >>>> >>>> Anyway - my main goal is not to create this string, but to map an >>>> array: >>>> file whamfiles_$s[] ; >>>> >>>> Do you see a solution here? >>>> >>>> Thanks, >>>> >>>> Nika >>>> >>>> >>>> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: >>>> >>>>> wham_string2 = @strcat(wham_string, ", wham"); >>>>> print(wham_string2); >>>>> >>>>> Variables are not variables. They are labels that are used to >>>>> direct the >>>>> data flow. 
Loops (in the sense of data looping around the same >>>>> node - >>>>> picture this as a data flow graph) make no sense. >>>>> >>>>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >>>>>> So how else then I construct a string in swift ? >>>>>> >>>>>> >>>>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >>>>>> >>>>>>> Variables in swift are single assignment. You can't assign to a >>>>>>> variable >>>>>>> twice. What, in your opinion, should the error message be >>>>>>> instead >>>>>>> of the >>>>>>> current one? >>>>>>> >>>>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>>>>>> I am not sure if its possible to do string operations inside >>>>>>>> the >>>>>>>> loop >>>>>>>> in swift? >>>>>>>> I have a versy simple test code that doesn't work no matter >>>>>>>> what. >>>>>>>> Obviously, I am missing something. >>>>>>>> This is the code: >>>>>>>> >>>>>>>> file fls[]; >>>>>>>> string wham_string = "#"; >>>>>>>> foreach prt_file in fls >>>>>>>> { >>>>>>>> wham_string = @strcat (wham_string, ", wham"); >>>>>>>> print (wham_string); >>>>>>>> } >>>>>>>> print (wham_string); >>>>>>>> >>>>>>>> >>>>>>>> basically I expect to have this as an output: >>>>>>>> #,wham,wham,wham,wham,... (its a test code (-;) >>>>>>>> >>>>>>>> instead I have these errors: >>>>>>>> >>>>>>>> wham_string is already assigned with a value of # >>>>>>>> wham_string is already assigned with a value of # >>>>>>>> vdl:assign @ test.kml, line: 46 >>>>>>>> vdl:mains @ test.kml, line: 39 >>>>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>>>>>> already >>>>>>>> assigned with a value of # >>>>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>>>>>> (AbstractDataNode.java:255) >>>>>>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>>>>>> (Assign.java:70) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> In any case -- if I can't construct the string by using the >>>>>>>> loop - >>>>>>>> how else could it be done? 
>>>>>>>> >>>>>>>> I use the constructed string then to map an array (I >>>>>>>> understand I >>>>>>>> can't map individual array elements): >>>>>>>> >>>>>>>> file whamfiles_$s[] >>>>>>>> ; //it >>>>>>>> was in the wrapper script before) >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > From nefedova at mcs.anl.gov Fri Jul 27 14:39:12 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 14:39:12 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry> Message-ID: <11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov> This proves a bit cumbersome to have this combination of swift and the wrapper. This array declaration has to be inside another loop, i.e. depend on the loop variable, yet being generated by shell script... I am still testing various possibilities. Although generating the string inside swift would've been much easier. On Jul 27, 2007, at 2:20 PM, Ian Foster wrote: > Could you not handle the "cat a set of strings" case via a call to > a shell script or other program that does this? 
> > Ian > > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: Veronika Nefedova > > Date: Fri, 27 Jul 2007 11:09:19 > To:Mihael Hategan > Cc:swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] loops and strings > > > I need to 'cat' together an unknown number of strings to form a > string, thats why I was attempting to do it inside the loop. And even > if I knew the number of loop cycles (say, its 68) -- are you > suggesting to do it 'by hand' ? > > > Anyway - my main goal is not to create this string, but to map an > array: > file whamfiles_$s[] ; > > Do you see a solution here? > > Thanks, > > Nika > > > On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > >> wham_string2 = @strcat(wham_string, ", wham"); >> print(wham_string2); >> >> Variables are not variables. They are labels that are used to >> direct the >> data flow. Loops (in the sense of data looping around the same node - >> picture this as a data flow graph) make no sense. >> >> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >>> So how else then I construct a string in swift ? >>> >>> >>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >>> >>>> Variables in swift are single assignment. You can't assign to a >>>> variable >>>> twice. What, in your opinion, should the error message be instead >>>> of the >>>> current one? >>>> >>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>>> I am not sure if its possible to do string operations inside the >>>>> loop >>>>> in swift? >>>>> I have a versy simple test code that doesn't work no matter what. >>>>> Obviously, I am missing something. >>>>> This is the code: >>>>> >>>>> file fls[]; >>>>> string wham_string = "#"; >>>>> foreach prt_file in fls >>>>> { >>>>> wham_string = @strcat (wham_string, ", wham"); >>>>> print (wham_string); >>>>> } >>>>> print (wham_string); >>>>> >>>>> >>>>> basically I expect to have this as an output: >>>>> #,wham,wham,wham,wham,... 
(its a test code (-;) >>>>> >>>>> instead I have these errors: >>>>> >>>>> wham_string is already assigned with a value of # >>>>> wham_string is already assigned with a value of # >>>>> vdl:assign @ test.kml, line: 46 >>>>> vdl:mains @ test.kml, line: 39 >>>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>>> already >>>>> assigned with a value of # >>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>>> (AbstractDataNode.java:255) >>>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>>> (Assign.java:70) >>>>> >>>>> >>>>> >>>>> In any case -- if I can't construct the string by using the loop - >>>>> how else could it be done? >>>>> >>>>> I use the constructed string then to map an array (I understand I >>>>> can't map individual array elements): >>>>> >>>>> file whamfiles_$s[] >>>>> ; //it >>>>> was in the wrapper script before) >>>>> >>>>> >>>>> Nika >>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>> >>> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From itf at mcs.anl.gov Fri Jul 27 14:59:44 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Fri, 27 Jul 2007 19:59:44 +0000 Subject: [Swift-devel] loops and strings In-Reply-To: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov><1185560674.19922.7.camel@blabla.mcs.anl.gov><4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov><1185563469.22752.7.camel@blabla.mcs.anl.gov><68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov> Message-ID: <793163636-1185566391-cardhu_decombobulator_blackberry.rim.net-663918437-@bxe009.bisx.prod.on.blackberry> That has local
scope and so each time around the loop is a different variable Sent via BlackBerry from T-Mobile -----Original Message----- From: Veronika Nefedova Date: Fri, 27 Jul 2007 14:26:36 To:Mihael Hategan Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] loops and strings I guess I am still missing something. I *can* have multiple assignments to the same variable inside the loop. Here, this code assigns different values to "name" at every loop step: file fls[]; foreach prt_file in fls { string name = @strcut (@prt_file, "\.\/(.*)\.prt"); print (name); } Or "name" considered to be a new variable every time since I have a type declaration next to it? Nika On Jul 27, 2007, at 2:11 PM, Mihael Hategan wrote: > On Fri, 2007-07-27 at 14:01 -0500, Veronika Nefedova wrote: >> will allowing multiple assignments to the same variable be a really >> impossible thing to have in swift? > > With what we currently have as "Swift", yes. > >> >> Nika >> >> On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote: >>> I see we're getting back to the same old story of the conflict >>> between >>> writing a mapper and hacking one directly in swift. >>> >>> This is an issue we really need to deal with. It has produced more >>> discussions and hacks than any other single Swift issue. >>> >>> You could use an array, or we could provide a folding operator/ >>> function, >>> or even a join function. >>> We could also let fixed_array_mapper accept an array as a >>> parameter, so >>> you would build an array with the file names and then pass it to the >>> mapper. >>> >>> On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote: >>>> I need to 'cat' together an unknown number of strings to form a >>>> string, thats why I was attempting to do it inside the loop. And >>>> even >>>> if I knew the number of loop cycles (say, its 68) -- are you >>>> suggesting to do it 'by hand' ? 
>>>> >>>> >>>> Anyway - my main goal is not to create this string, but to map an >>>> array: >>>> file whamfiles_$s[] ; >>>> >>>> Do you see a solution here? >>>> >>>> Thanks, >>>> >>>> Nika >>>> >>>> >>>> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: >>>> >>>>> wham_string2 = @strcat(wham_string, ", wham"); >>>>> print(wham_string2); >>>>> >>>>> Variables are not variables. They are labels that are used to >>>>> direct the >>>>> data flow. Loops (in the sense of data looping around the same >>>>> node - >>>>> picture this as a data flow graph) make no sense. >>>>> >>>>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >>>>>> So how else then I construct a string in swift ? >>>>>> >>>>>> >>>>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >>>>>> >>>>>>> Variables in swift are single assignment. You can't assign to a >>>>>>> variable >>>>>>> twice. What, in your opinion, should the error message be >>>>>>> instead >>>>>>> of the >>>>>>> current one? >>>>>>> >>>>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>>>>>> I am not sure if its possible to do string operations inside >>>>>>>> the >>>>>>>> loop >>>>>>>> in swift? >>>>>>>> I have a versy simple test code that doesn't work no matter >>>>>>>> what. >>>>>>>> Obviously, I am missing something. >>>>>>>> This is the code: >>>>>>>> >>>>>>>> file fls[]; >>>>>>>> string wham_string = "#"; >>>>>>>> foreach prt_file in fls >>>>>>>> { >>>>>>>> wham_string = @strcat (wham_string, ", wham"); >>>>>>>> print (wham_string); >>>>>>>> } >>>>>>>> print (wham_string); >>>>>>>> >>>>>>>> >>>>>>>> basically I expect to have this as an output: >>>>>>>> #,wham,wham,wham,wham,... 
(its a test code (-;) >>>>>>>> >>>>>>>> instead I have these errors: >>>>>>>> >>>>>>>> wham_string is already assigned with a value of # >>>>>>>> wham_string is already assigned with a value of # >>>>>>>> vdl:assign @ test.kml, line: 46 >>>>>>>> vdl:mains @ test.kml, line: 39 >>>>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>>>>>> already >>>>>>>> assigned with a value of # >>>>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>>>>>> (AbstractDataNode.java:255) >>>>>>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>>>>>> (Assign.java:70) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> In any case -- if I can't construct the string by using the >>>>>>>> loop - >>>>>>>> how else could it be done? >>>>>>>> >>>>>>>> I use the constructed string then to map an array (I >>>>>>>> understand I >>>>>>>> can't map individual array elements): >>>>>>>> >>>>>>>> file whamfiles_$s[] >>>>>>>> ; //it >>>>>>>> was in the wrapper script before) >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From nefedova at mcs.anl.gov Fri Jul 27 15:13:20 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Fri, 27 Jul 2007 15:13:20 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: <11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry> <11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov> Message-ID: ok, here is the 
problem I do not see how to bypass. I have an outer loop:

foreach f in files {
    string S = "bla";
}

I need to have this array declared, and if I generate the string in the shell script, it has to be declared explicitly:

foreach f in files {
    string S = "bla";
    file whamfiles [] ;
}

and it has to be "S", not its value, since it's all inside the loop. But for swift to recognize S as its own variable (and substitute its value on every loop step) I need to use strcat: @strcat("file1_", S), @strcat("file2_", S), etc. for each of the string's elements. I do not see a way of doing that so far without being able to construct a string in swift... There are 68 elements in that string, but there could be any number. Does anybody have any suggestions? Nika > This proves a bit cumbersome to have this combination of swift and > the wrapper. This array declaration has to be inside another loop, > i.e. depend on the loop variable, yet being generated by shell > script... I am still testing various possibilities. Although > generating the string inside swift would've been much easier. > > On Jul 27, 2007, at 2:20 PM, Ian Foster wrote: > >> Could you not handle the "cat a set of strings" case via a call to >> a shell script or other program that does this? >> >> Ian >> >> >> Sent via BlackBerry from T-Mobile >> >> -----Original Message----- >> From: Veronika Nefedova >> >> Date: Fri, 27 Jul 2007 11:09:19 >> To:Mihael Hategan >> Cc:swift-devel at ci.uchicago.edu >> Subject: Re: [Swift-devel] loops and strings >> >> >> I need to 'cat' together an unknown number of strings to form a >> string, thats why I was attempting to do it inside the loop. And even >> if I knew the number of loop cycles (say, its 68) -- are you >> suggesting to do it 'by hand' ? >> >> >> Anyway - my main goal is not to create this string, but to map an >> array: >> file whamfiles_$s[] ; >> >> Do you see a solution here?
>> >> Thanks, >> >> Nika >> >> >> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: >> >>> wham_string2 = @strcat(wham_string, ", wham"); >>> print(wham_string2); >>> >>> Variables are not variables. They are labels that are used to >>> direct the >>> data flow. Loops (in the sense of data looping around the same >>> node - >>> picture this as a data flow graph) make no sense. >>> >>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: >>>> So how else then I construct a string in swift ? >>>> >>>> >>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: >>>> >>>>> Variables in swift are single assignment. You can't assign to a >>>>> variable >>>>> twice. What, in your opinion, should the error message be instead >>>>> of the >>>>> current one? >>>>> >>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: >>>>>> I am not sure if its possible to do string operations inside the >>>>>> loop >>>>>> in swift? >>>>>> I have a versy simple test code that doesn't work no matter what. >>>>>> Obviously, I am missing something. >>>>>> This is the code: >>>>>> >>>>>> file fls[]; >>>>>> string wham_string = "#"; >>>>>> foreach prt_file in fls >>>>>> { >>>>>> wham_string = @strcat (wham_string, ", wham"); >>>>>> print (wham_string); >>>>>> } >>>>>> print (wham_string); >>>>>> >>>>>> >>>>>> basically I expect to have this as an output: >>>>>> #,wham,wham,wham,wham,... 
(its a test code (-;) >>>>>> >>>>>> instead I have these errors: >>>>>> >>>>>> wham_string is already assigned with a value of # >>>>>> wham_string is already assigned with a value of # >>>>>> vdl:assign @ test.kml, line: 46 >>>>>> vdl:mains @ test.kml, line: 39 >>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is >>>>>> already >>>>>> assigned with a value of # >>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue >>>>>> (AbstractDataNode.java:255) >>>>>> at org.griphyn.vdl.karajan.lib.Assign.function >>>>>> (Assign.java:70) >>>>>> >>>>>> >>>>>> >>>>>> In any case -- if I can't construct the string by using the >>>>>> loop - >>>>>> how else could it be done? >>>>>> >>>>>> I use the constructed string then to map an array (I understand I >>>>>> can't map individual array elements): >>>>>> >>>>>> file whamfiles_$s[] >>>>>> ; //it >>>>>> was in the wrapper script before) >>>>>> >>>>>> >>>>>> Nika >>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Jul 27 15:30:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jul 2007 15:30:12 -0500 Subject: [Swift-devel] loops and strings In-Reply-To: References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry> 
<11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov> Message-ID: <1185568212.26509.3.camel@blabla.mcs.anl.gov> Seriously now. Having a mapper would save you lots of time. I'll help you out. Take a look at AirsnMapper.java and ROIMapper.java. Mihael On Fri, 2007-07-27 at 15:13 -0500, Veronika Nefedova wrote: > ok, here is the problem I do not see how to bypass. > > I have an outer loop: > > foreach f in files { > string S = "bla" > } > > I need to have this array declared, and if I generate the string in > the shell script, it has to be declared explicitly: > > foreach f in files { > string S = "bla" > file whamfiles [] file3_S">; > } > > and it has to be "S", not its value since its all inside the loop. > But for swift to recognize S as its own variable (and substitute its > value on every loop step) I need to use strcat: > @strcat("file1_", S), @strcat("file2_", S), etc for each of the > string's element -- I do not see a way for doing it so far without > being able to construct a string in swift... There are 68 elements in > that string but could be any number. > > Does anybody have any suggestions? > > Nika > > > This proves a bit cumbersome to have this combination of swift and > > the wrapper. This array declaration has to be inside another loop, > > i.e. depend on the loop variable, yet being generated by shell > > script... I am still testing various possibilities. Although > > generating the string inside swift would've been much easier. > > > > On Jul 27, 2007, at 2:20 PM, Ian Foster wrote: > > > >> Could you not handle the "cat a set of strings" case via a call to > >> a shell script or other program that does this? 
> >> > >> Ian > >> > >> > >> Sent via BlackBerry from T-Mobile > >> > >> -----Original Message----- > >> From: Veronika Nefedova > >> > >> Date: Fri, 27 Jul 2007 11:09:19 > >> To:Mihael Hategan > >> Cc:swift-devel at ci.uchicago.edu > >> Subject: Re: [Swift-devel] loops and strings > >> > >> > >> I need to 'cat' together an unknown number of strings to form a > >> string, thats why I was attempting to do it inside the loop. And even > >> if I knew the number of loop cycles (say, its 68) -- are you > >> suggesting to do it 'by hand' ? > >> > >> > >> Anyway - my main goal is not to create this string, but to map an > >> array: > >> file whamfiles_$s[] ; > >> > >> Do you see a solution here? > >> > >> Thanks, > >> > >> Nika > >> > >> > >> On Jul 27, 2007, at 11:01 AM, Mihael Hategan wrote: > >> > >>> wham_string2 = @strcat(wham_string, ", wham"); > >>> print(wham_string2); > >>> > >>> Variables are not variables. They are labels that are used to > >>> direct the > >>> data flow. Loops (in the sense of data looping around the same > >>> node - > >>> picture this as a data flow graph) make no sense. > >>> > >>> On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote: > >>>> So how else then I construct a string in swift ? > >>>> > >>>> > >>>> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote: > >>>> > >>>>> Variables in swift are single assignment. You can't assign to a > >>>>> variable > >>>>> twice. What, in your opinion, should the error message be instead > >>>>> of the > >>>>> current one? > >>>>> > >>>>> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote: > >>>>>> I am not sure if its possible to do string operations inside the > >>>>>> loop > >>>>>> in swift? > >>>>>> I have a versy simple test code that doesn't work no matter what. > >>>>>> Obviously, I am missing something. 
> >>>>>> This is the code: > >>>>>> > >>>>>> file fls[]; > >>>>>> string wham_string = "#"; > >>>>>> foreach prt_file in fls > >>>>>> { > >>>>>> wham_string = @strcat (wham_string, ", wham"); > >>>>>> print (wham_string); > >>>>>> } > >>>>>> print (wham_string); > >>>>>> > >>>>>> > >>>>>> basically I expect to have this as an output: > >>>>>> #,wham,wham,wham,wham,... (its a test code (-;) > >>>>>> > >>>>>> instead I have these errors: > >>>>>> > >>>>>> wham_string is already assigned with a value of # > >>>>>> wham_string is already assigned with a value of # > >>>>>> vdl:assign @ test.kml, line: 46 > >>>>>> vdl:mains @ test.kml, line: 39 > >>>>>> Caused by: java.lang.IllegalArgumentException: wham_string is > >>>>>> already > >>>>>> assigned with a value of # > >>>>>> at org.griphyn.vdl.mapping.AbstractDataNode.setValue > >>>>>> (AbstractDataNode.java:255) > >>>>>> at org.griphyn.vdl.karajan.lib.Assign.function > >>>>>> (Assign.java:70) > >>>>>> > >>>>>> > >>>>>> > >>>>>> In any case -- if I can't construct the string by using the > >>>>>> loop - > >>>>>> how else could it be done? 
> >>>>>> > >>>>>> I use the constructed string then to map an array (I understand I > >>>>>> can't map individual array elements): > >>>>>> > >>>>>> file whamfiles_$s[] > >>>>>> ; //it > >>>>>> was in the wrapper script before) > >>>>>> > >>>>>> > >>>>>> Nika > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Swift-devel mailing list > >>>>>> Swift-devel at ci.uchicago.edu > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>> > >>>>> > >>>> > >>> > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From bugzilla-daemon at mcs.anl.gov Fri Jul 27 18:46:03 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 27 Jul 2007 18:46:03 -0500 (CDT) Subject: [Swift-devel] [Bug 84] New: switch does not work with variable parameter Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 Summary: switch does not work with variable parameter Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu The below code executes the default case, rather than the 8 case. Replacing the switch with: switch(8) { with the selector value a hard-coded constant causes the 8 case to run. 
type messagefile {}

(messagefile t) greeting(string m) {
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile <"091-case.out">;

int selector = 8;
print(selector);

string message;

switch(selector) {
    case 3:
        message="first message";
        break;
    case 8:
        message="eighth message";
        break;
    case 57:
        message="last message";
        break;
    default:
        message="no message at all...";
        break;
}

outfile = greeting(message);

From bugzilla-daemon at mcs.anl.gov Fri Jul 27 18:48:44 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 27 Jul 2007 18:48:44 -0500 (CDT) Subject: [Swift-devel] [Bug 85] New: break statements in switch/case have no effect. Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85 Summary: break statements in switch/case have no effect. Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu

Break statements in case/switch have no effect: code behaves the same whether there's a break or not, and executes only the code directly attached to the matching case. For example, the code below executes only the 8th case, rather than also executing the code associated with the cases lower down (which should then fail with a multiple-assignment error). The likely easiest course is to remove break from the language.
type messagefile {}

(messagefile t) greeting(string m) {
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile <"092-case-duffs-device.out">;

string message;

switch(8) {
    case 3:
        message="first message";
    case 8:
        message="eighth message";
    default:
        message="no message at all...";
    case 57:
        message="last message";
}

From bugzilla-daemon at mcs.anl.gov Fri Jul 27 20:09:28 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 27 Jul 2007 20:09:28 -0500 (CDT) Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter In-Reply-To: Message-ID: <20070728010928.83157164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 ------- Comment #1 from hategan at mcs.anl.gov 2007-07-27 20:09 ------- Comparison broken? Can you add a test case in SVN?

From bugzilla-daemon at mcs.anl.gov Fri Jul 27 20:15:05 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Fri, 27 Jul 2007 20:15:05 -0500 (CDT) Subject: [Swift-devel] [Bug 85] break statements in switch/case have no effect. In-Reply-To: Message-ID: <20070728011505.3B346164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85 ------- Comment #1 from hategan at mcs.anl.gov 2007-07-27 20:15 ------- I think the C behavio(u)r here is something we may want to avoid. In fact, we could drop the switch statement altogether. In C it fulfills the important role of having multiple if tests compiled into (more or less) one indirect jump. In Swift, it looks more like a liability.
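Bug 85 says that, under C semantics, the fall-through cases in the test above should fail with a multiple-assignment error. The standalone Java sketch below illustrates why, since Java's switch falls through the same way C's does. The `SingleAssignment` class is a hypothetical stand-in, not Swift's actual `AbstractDataNode`; its error text is only modeled on the "is already assigned with a value of" message quoted in the "loops and strings" thread. `SwitchDemo`, `cStyle` and `noFallThrough` are likewise illustrative names.

    import java.util.Objects;

    // Hypothetical single-assignment cell (an assumption for illustration,
    // not Swift's AbstractDataNode): a second set() throws, mirroring the
    // "already assigned" error seen in the loops-and-strings thread.
    final class SingleAssignment<T> {
        private final String name;
        private T value;
        private boolean assigned;

        SingleAssignment(String name) {
            this.name = Objects.requireNonNull(name);
        }

        void set(T v) {
            if (assigned) {
                throw new IllegalArgumentException(
                        name + " is already assigned with a value of " + value);
            }
            value = v;
            assigned = true;
        }

        T get() {
            if (!assigned) {
                throw new IllegalStateException(name + " has not been assigned");
            }
            return value;
        }
    }

    final class SwitchDemo {
        // The bug 85 test case run with C-style fall-through (no break):
        // execution continues into every label below the one that matched.
        static String cStyle(int selector) {
            SingleAssignment<String> message = new SingleAssignment<>("message");
            switch (selector) {
                case 3:
                    message.set("first message");
                case 8:
                    message.set("eighth message");
                default:
                    message.set("no message at all...");
                case 57:
                    message.set("last message");
            }
            return message.get();
        }

        // The behaviour observed in bug 85: only the matching case body runs.
        static String noFallThrough(int selector) {
            SingleAssignment<String> message = new SingleAssignment<>("message");
            switch (selector) {
                case 3:
                    message.set("first message");
                    break;
                case 8:
                    message.set("eighth message");
                    break;
                case 57:
                    message.set("last message");
                    break;
                default:
                    message.set("no message at all...");
                    break;
            }
            return message.get();
        }
    }

With fall-through, selector 8 runs `case 8` and then `default`, so the second `set` throws; only `case 57`, the last label, is safe. Without fall-through each case writes `message` exactly once, which is the property a single-assignment variable needs, and is consistent with dropping `break` (and C-style fall-through) rather than implementing it.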
From benc at hawaga.org.uk Fri Jul 27 22:42:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 28 Jul 2007 03:42:38 +0000 (GMT) Subject: [Swift-devel] Re: [Bug 84] switch does not work with variable parameter In-Reply-To: <20070728010928.9A88816502@foxtrot.mcs.anl.gov> References: <20070728010928.9A88816502@foxtrot.mcs.anl.gov> Message-ID: On Fri, 27 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote: > ------- Comment #1 from hategan at mcs.anl.gov 2007-07-27 20:09 ------- > Comparison broken? Can you add a test case in SVN? yes, it turns out it is - I hadn't thought about that being a cause. The code below fails (i.e. writes 'false' to cmp3.out). I have a bunch of tests related to this; I will commit them tomorrow when I'm more awake.

type messagefile {}

(messagefile t) greeting(boolean b) {
    app {
        echo b stdout=@filename(t);
    }
}

messagefile outfile <"cmp3.out">;

int i = 2;
boolean r = i==2;
outfile = greeting(r);

--

From benc at hawaga.org.uk Sat Jul 28 07:47:55 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 28 Jul 2007 12:47:55 +0000 (GMT) Subject: [Swift-devel] loops and strings In-Reply-To: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov> References: <1185551166.17961.2.camel@blabla.mcs.anl.gov> <1185552119.18583.4.camel@blabla.mcs.anl.gov> <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov> <1185560674.19922.7.camel@blabla.mcs.anl.gov> <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov> <1185563469.22752.7.camel@blabla.mcs.anl.gov> <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov> Message-ID: On Fri, 27 Jul 2007, Veronika Nefedova wrote: > Or "name" considered to be a new variable every time since I have a type declaration next to it?
Pretty much, yes - it's declared inside the loop, so every time that loop code is run a new variable comes into existence. If it's declared in an outer loop, then a new one comes into existence every time the outer loop runs. If it's declared at the top level of your swift code, then a new one comes into existence every time you run a new workflow. --

From bugzilla-daemon at mcs.anl.gov Sat Jul 28 08:02:08 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 28 Jul 2007 08:02:08 -0500 (CDT) Subject: [Swift-devel] [Bug 85] break statements in switch/case have no effect. In-Reply-To: Message-ID: <20070728130208.2A286164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85 ------- Comment #2 from benc at hawaga.org.uk 2007-07-28 08:02 ------- r1001 removes break from the language.

From bugzilla-daemon at mcs.anl.gov Sat Jul 28 08:18:34 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 28 Jul 2007 08:18:34 -0500 (CDT) Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter In-Reply-To: Message-ID: <20070728131834.BBE4E164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 ------- Comment #2 from benc at hawaga.org.uk 2007-07-28 08:18 ------- Yes, there seems to be a problem with numerical comparison. In r1002, I added three tests to language-behaviour: 100-comparison.swift, which works, and broken/bug84*.swift, which don't work.
(To run the ones in the broken subdir, you need to be in the broken subdirectory, so type something like this:

cd broken/
../run bug84-comparisons2.swift
)

From bugzilla-daemon at mcs.anl.gov Sat Jul 28 15:32:04 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 28 Jul 2007 15:32:04 -0500 (CDT) Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter In-Reply-To: Message-ID: <20070728203204.7C3F0164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 ------- Comment #3 from hategan at mcs.anl.gov 2007-07-28 15:32 ------- It's comparing "2" with 2. Either we switch to the new expression stuff, where numbers are numbers, or equals() is changed to equalsNumeric(), which does some type conversion before doing the comparison.

From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:29:20 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 30 Jul 2007 16:29:20 -0500 (CDT) Subject: [Swift-devel] [Bug 85] break statements in switch/case have no effect. In-Reply-To: Message-ID: <20070730212920.60818164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85 benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
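Comment #3 on bug 84 attributes the failure to comparing the string "2" with the number 2, and suggests an `equalsNumeric()` that converts types before comparing. The Java sketch below shows one way such a helper could behave. The class name `Compare`, the method signatures, and the use of `BigDecimal` for the conversion are assumptions for illustration, not Swift's actual code. The other option mentioned in the comment (switching to the new expression code, where numbers are numbers) is not sketched.

    import java.math.BigDecimal;

    // Hypothetical helper illustrating the equalsNumeric() idea from bug 84.
    final class Compare {
        // Converts a Number or a numeric String to BigDecimal; returns null when
        // the value is not numeric (e.g. "abc", NaN or Infinity).
        static BigDecimal toNumber(Object o) {
            if (o instanceof BigDecimal) {
                return (BigDecimal) o;
            }
            if (o instanceof Number || o instanceof String) {
                try {
                    return new BigDecimal(o.toString().trim());
                } catch (NumberFormatException e) {
                    return null;
                }
            }
            return null;
        }

        // Compares numerically when both operands look like numbers, so the
        // string "2" equals the integer 2; otherwise falls back to equals().
        static boolean equalsNumeric(Object a, Object b) {
            BigDecimal x = toNumber(a);
            BigDecimal y = toNumber(b);
            if (x != null && y != null) {
                // compareTo ignores scale, so 2 and 2.0 are equal here.
                return x.compareTo(y) == 0;
            }
            return a == null ? b == null : a.equals(b);
        }
    }

Using `compareTo` rather than `BigDecimal.equals` matters here: `equals` also compares scale, so `new BigDecimal("2").equals(new BigDecimal("2.0"))` is false.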
From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:35:38 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 30 Jul 2007 16:35:38 -0500 (CDT) Subject: [Swift-devel] [Bug 2] Diamond and file_counter tests are failing. In-Reply-To: Message-ID: <20070730213538.C0BBB164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=2 hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from hategan at mcs.anl.gov 2007-07-30 16:35 ------- Looks like this has been solved by Ben's updates (http://www.ci.uchicago.edu/trac/swift/changeset/952)

From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:40:53 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 30 Jul 2007 16:40:53 -0500 (CDT) Subject: [Swift-devel] [Bug 16] failing job behaviour varies depending on -debug or not In-Reply-To: Message-ID: <20070730214053.0C358164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=16 hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from hategan at mcs.anl.gov 2007-07-30 16:40 ------- Closing this up since it seems solved. Reopen if necessary.
From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:43:29 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:43:29 -0500 (CDT)
Subject: [Swift-devel] [Bug 19] @stdin doesn't seem to work properly
Message-ID: <20070730214329.6FF8E164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=19

hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:51:14 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:51:14 -0500 (CDT)
Subject: [Swift-devel] [Bug 37] PATH being printed out when workflow runs
Message-ID: <20070730215114.83FCD16505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=37

hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from hategan at mcs.anl.gov 2007-07-30 16:51 -------
Bug seems to have mysteriously disappeared. Reopen if needed.
From bugzilla-daemon at mcs.anl.gov Mon Jul 30 16:53:52 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:53:52 -0500 (CDT)
Subject: [Swift-devel] [Bug 37] PATH being printed out when workflow runs
Message-ID: <20070730215352.648C116505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=37

------- Comment #6 from nefedova at mcs.anl.gov 2007-07-30 16:53 -------
nope, it's still there (r999). It prints $PATH at every job invocation.

From foster at mcs.anl.gov Tue Jul 31 08:23:37 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 31 Jul 2007 08:23:37 -0500
Subject: [Swift-devel] Q about MolDyn
Message-ID: <46AF37D9.7000301@mcs.anl.gov>

Hi,

I am curious whether we found out why those two jobs (?) were failing at the end of the big MolDyn run?

Ian.

From benc at hawaga.org.uk Tue Jul 31 14:14:04 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 31 Jul 2007 19:14:04 +0000 (GMT)
Subject: [Swift-devel] kilo-commit

Commit r1024 just went into SVN...

--