From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 00:09:12 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 00:09:12 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701050912.2A2BC16505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #7 from iraicu at cs.uchicago.edu  2007-07-01 00:09 -------
(In reply to comment #6)
> (In reply to comment #4)
> > Hi again,
> > Here is an update of yesterday's 244 molecule run.  The experiment ran further
> > than before, but it still did not complete.  There were 240 molecules that
> > completed successfully (in the previous run, no molecule finished), but 4
> > molecules still did not finish. 
> > 
> 
> Actually it looks like the tasks worked fine:
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ubmitted"|wc
>   24309  243090 2806214
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ailed"|wc
>    3614   36140  405816
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ompleted"|wc
>   20695  206950 2389556
> 
> All tasks are accounted for. It may be that some jobs failed 3 times in a row.
> From the logs it looks like the workflow almost finished and it got to the
> point where the error reporting was to be done. Perhaps the stack overflow that
> you saw occurred there, and perhaps the impossible size of the workflow might
> have something to do with it.
> 
The same machine (tg-v024) that we had trouble with before acted up again; I
should have removed it before we started the experiment.  If this is the
consensus, we can certainly try again and make sure this machine is not in
the resource pool.  Another idea is to increase the retry limit from 3 to
something higher, maybe 10 or 30.  Jobs can be resubmitted relatively fast with
Falkon, so retrying many times is not a big overhead, except that it takes
longer for Swift to give up!

Ioan


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 00:47:49 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 00:47:49 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701054749.8E478164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #8 from iraicu at cs.uchicago.edu  2007-07-01 00:47 -------
(In reply to comment #5)
> First of all, can you commit the changes to SVN?
> 
Yong made the changes, I am sure he will commit them the first chance he gets!

> (In reply to comment #4)
> > We fixed the potential synchronization issue
> > Mihael pointed out.
> 
> There were two.
> 
I meant to say "issues"... from the discussion I had with Yong, I believe he
addressed both of them.
> > We also fixed a badly handled exception we had in the
> > Falkon provider, that would give up very easily and exit the Falkon provider
> > thread in case of an exception, even if it wasn't a fatal one.  This time
> > around, we changed the logic to simply print the exception, if there were any,
> > and not exit the Falkon provider, just continue.  Personally, I think this
> > logic on handling exceptions in the Falkon provider was causing the Falkon
> > provider to exit prematurely, and hence not send any more tasks to Falkon...
> 
> I can't seem to find anything that would fit that profile in the provider code.
> Can you be more specific? If the provider was setting the status of the task to
> failed, then it doesn't matter. Swift retries failed things.
> 
Sure.  Double check file SubmissionThread.java, notice that the thread will
live as long as exit is not set...
Line 54:    public void run() {
        while(!exit) {

exit is initially set to false, but as soon as anything sets it to true, the
submission thread will exit.

Notice the setStatus(Executable[]) function at the end of the file:
Line 98:    public void setStatus (Executable execs[]) {
        try {
            for (int i=0; i<execs.length; i++) {
                Task task = rp.removeTask(execs[i].getId());
                task.setStatus(Status.FAILED);
                System.out.println("*****************************SUPER_DEBUG: setStatus(execs): " + i);
            }
        } catch (Exception e) {
            //no-op
            e.printStackTrace();
        }
        //this.exit = true;
    }

Notice the line that used to set exit to true (commented out above).  This
setStatus function is called in a single place in that file:
Line 91:            } catch (Exception e) {
                setStatus(execs);
                e.printStackTrace();
            } 

So, before the change, any exception here would essentially kill the
submission thread.

Also, check StatusThread.java:
...
Line 66:            } catch (Exception e) {
                logger.debug("Error removing tasks");
                e.printStackTrace();
                //exit = true;
            } 
An exception here would have caused the StatusThread to exit, meaning
that no new notifications would be received.

Both of these exception handlers have been modified so that they no longer
shut down their respective threads, simply by no longer changing the exit
value from false to true.
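
As a minimal sketch of the pattern described above (not the actual Falkon
provider code), here is a loop guarded by an exit flag in which a failure on
one item is logged and the loop carries on. The class and method names
(ResilientLoop, processAll, handle) are hypothetical and only serve the
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Falkon provider's SubmissionThread.
class ResilientLoop {
    private volatile boolean exit = false;

    // Processes every item. A failing item is recorded and skipped,
    // and the loop keeps going instead of setting exit = true.
    List<String> processAll(List<String> items) {
        List<String> failed = new ArrayList<>();
        for (String item : items) {
            if (exit) {
                break;
            }
            try {
                handle(item);
            } catch (Exception e) {
                failed.add(item);
                System.err.println("non-fatal error on " + item + ": " + e.getMessage());
                // exit is deliberately left unchanged here
            }
        }
        return failed;
    }

    // Stand-in for real work: rejects items whose name starts with "bad".
    private void handle(String item) throws Exception {
        if (item.startsWith("bad")) {
            throw new Exception("cannot handle " + item);
        }
    }

    public static void main(String[] args) {
        ResilientLoop loop = new ResilientLoop();
        List<String> failed = loop.processAll(Arrays.asList("a", "bad-1", "b", "bad-2", "c"));
        System.out.println("failed items: " + failed);
    }
}
```

The trade-off is that a genuinely fatal condition now has to be detected some
other way, since the handler no longer stops the thread.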

We'll dig through the Falkon provider logs to find out the exceptions that were
thrown throughout the application run (assuming that some were thrown like in
the past), so we can better understand why those exceptions were happening in
the first place, and hopefully find a solution so they do not happen in the
future!

> > note that Swift was setting the set status of submitted tasks to the Falkon
> > provider in a separate thread,
> 
> Swift does not set status of tasks. That's what the provider is supposed to do.
> 
OK, there are several separate threads, one that sets the status of the task
for Swift, another that performs the submit, another that receives
notifications, etc.  The common data structure between the set status thread
and the submit thread is a queue; if the submission thread dies, the queue is
still valid, and the set status thread could still insert tasks into the queue
and set the status to submitted, although there would be no submission thread
alive to perform the submission itself to Falkon.
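
The failure mode described above can be sketched roughly as follows. This is
an assumption-laden toy, not the provider's code: the OrphanedQueue class and
the strandedTasks method are made-up names, and the consumer thread simply
terminates at once to stand in for a submission thread killed by an exception.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration; the names do not come from the Falkon provider.
class OrphanedQueue {
    // Returns how many tasks are left sitting in the queue after the
    // consumer thread has died and the producer has kept enqueueing.
    static int strandedTasks(int produced) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Consumer that terminates right away, standing in for a
        // submission thread that exited after an exception.
        Thread consumer = new Thread(() -> { });
        consumer.start();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }

        // The producer cannot tell the consumer is gone unless it checks
        // isAlive(); each task would be reported as "submitted" here.
        for (int i = 0; i < produced; i++) {
            queue.add("task-" + i);
        }
        return consumer.isAlive() ? 0 : queue.size();
    }

    public static void main(String[] args) {
        System.out.println("stranded tasks: " + strandedTasks(5));
    }
}
```

Checking the consumer's liveness before marking a task as submitted would be
one way to surface the problem early.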

> > which was not necessarily exiting when the Falkon
> > provider was, and hence we had the scenario in which Swift thought it sent out
> > more tasks than Falkon really saw. 
> 
> Can you be more specific? If there is a problem in Swift, we need to fix it,
> but your comment is too vague.
> 
> > 
> > Now, the issue that I think stopped this experiment.  On the console of Swift,
> > the last thing that it printed was a "stack overflow error"; I don't think this
> > printed in the logs, just on the console.
> 
> Without the stack trace, the information is not very useful.
> 
Nika said it was simply a message printed on the console.  This was the same as
the case we saw on Thursday.  This was not a regular exception that Swift or
the Falkon provider handled, so there was no printed stack trace to go along
with it.  As far as I could tell, it was an error from the JVM, and it
was not accompanied by any stack trace.  If you don't even know where to start
looking, let's run some quick synthetic runs of 20K jobs together on Monday;
hopefully we can reproduce the stack overflow error, and you can see it in
person!

Ioan




From hategan at mcs.anl.gov  Sun Jul  1 01:53:43 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 01 Jul 2007 01:53:43 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry>
References: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
	<20070630225207.B70D916506@foxtrot.mcs.anl.gov>
	<1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry>
Message-ID: <1183272823.21185.5.camel@blabla.mcs.anl.gov>

On Sun, 2007-07-01 at 02:10 +0000, Ian Foster wrote:
> Why do you say the workflow's size was "impossible"? It doesn't seem that large to me. We'd like to run larger ones!

Most certainly so. However, we want to make use of loops rather than
generating large swift files.

> 
> 
> Sent via BlackBerry from T-Mobile
> 
> -----Original Message-----
> From: bugzilla-daemon at mcs.anl.gov
> 
> Date: Sat, 30 Jun 2007 17:52:07 
> To:swift-devel at ci.uchicago.edu
> Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
> 
> 
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
> 
> 
> 
> 
> 
> ------- Comment #6 from hategan at mcs.anl.gov  2007-06-30 17:52 -------
> (In reply to comment #4)
> > Hi again,
> > Here is an update of yesterday's 244 molecule run.  The experiment ran further
> > than before, but it still did not complete.  There were 240 molecules that
> > completed successfully (in the previous run, no molecule finished), but 4
> > molecules still did not finish. 
> > 
> 
> Actually it looks like the tasks worked fine:
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ubmitted"|wc
>   24309  243090 2806214
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ailed"|wc
>    3614   36140  405816
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ompleted"|wc
>   20695  206950 2389556
> 
> All tasks are accounted for. It may be that some jobs failed 3 times in a row.
> From the logs it looks like the workflow almost finished and it got to the
> point where the error reporting was to be done. Perhaps the stack overflow that
> you saw occurred there, and perhaps the impossible size of the workflow might
> have something to do with it.
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel


From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 01:56:28 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 01:56:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701065628.0C73216506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #9 from hategan at mcs.anl.gov  2007-07-01 01:56 -------
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> The same machine (tg-v024) that we had trouble with before acted up again, I
> should have removed it before we started the experiment.  If this is the
> consensus, we can certainly try it again, and make sure this machine is not in
> the resource pool.  Another idea is to increase the retry # from 3 to something
> higher, maybe 10, 30, etc?

Not a good idea in the general case, since many times the error may not be
something temporary. The swift scheduler takes bad machines into account and
attempts to avoid submitting to them.





From hategan at mcs.anl.gov  Sun Jul  1 02:11:49 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 01 Jul 2007 02:11:49 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1183272823.21185.5.camel@blabla.mcs.anl.gov>
References: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
	<20070630225207.B70D916506@foxtrot.mcs.anl.gov>
	<1702663950-1183255865-cardhu_decombobulator_blackberry.rim.net-1244943269-@bxe006.bisx.prod.on.blackberry>
	<1183272823.21185.5.camel@blabla.mcs.anl.gov>
Message-ID: <1183273909.21185.11.camel@blabla.mcs.anl.gov>

On Sun, 2007-07-01 at 01:53 -0500, Mihael Hategan wrote:
> On Sun, 2007-07-01 at 02:10 +0000, Ian Foster wrote:
> > Why do you say the workflow's size was "impossible"? It doesn't seem that large to me. We'd like to run larger ones!
> 
> Most certainly so. However, we want to make use of loops rather than
> generating large swift files.

Ok. I see. I meant impossible size of the source file. We clearly want
to be running workflows with that many jobs smoothly. I just don't think
large source files (whether Swift or Karajan) are a good way to do it.
I'm quite (pleasantly) surprised that Swift/Karajan can load and run XML
files with 1M+ lines.

Of course, that doesn't mean we shouldn't try to fix the problems that
might arise with large source files if possible.



From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 02:15:35 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 02:15:35 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701071535.1AE3416506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #10 from hategan at mcs.anl.gov  2007-07-01 02:15 -------
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> > 
> > There were two.
> > 
> I meant to say "issues"... from the discussion I had with Yong, I believe he
> addressed both of them.

Ok. Got confused.

> > > We also fixed a badly handled exception we had [...]
> > Can you be more specific? [...]
> > 
> Sure.  Double check file SubmissionThread.java, notice that the thread will
> live as long as exit is not set...
> Also, check the StatusThread.java, 

Right. Missed that.

> 
> > > note that Swift was setting the set status of submitted tasks to the Falkon
> > > provider in a separate thread,
> > 
> > Swift does not set status of tasks. That's what the provider is supposed to do.
> > 
> OK, there are several separate threads, one that sets the status of the task
> for Swift, another that performs the submit, another that receives
> notifications, etc.  The common data structure between the set status thread
> and the submit thread is a queue; if the submission thread dies, the queue is
> still valid, and the set status thread could still insert tasks into the queue
> and set the status to submitted, although there would be no submission thread
> alive to perform the submission itself to Falkon.

That sounds like the provider, not Swift. Maybe I misunderstood something?




From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 02:18:46 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 02:18:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <bug-76-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76


------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
This would require a data file pointer store (VDC like thing) which can record
where intermediate files are instead of assuming they are always available on
the submit host.




From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 10:48:09 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 10:48:09 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701154809.1AFB5164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #11 from iraicu at cs.uchicago.edu  2007-07-01 10:48 -------
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > The same machine (tg-v024) that we had trouble with before acted up again, I
> > should have removed it before we started the experiment.  If this is the
> > consensus, we can certainly try it again, and make sure this machine is not in
> > the resource pool.  Another idea is to increase the retry # from 3 to something
> > higher, maybe 10, 30, etc?
> 
> Not a good idea in the general case, since many times the error may not be
> something temporary. The swift scheduler takes bad machines into account and
> attempts to avoid submitting to them.
>
Yes, but in this case, Falkon was the only set of resources available
to Swift, so giving up early means giving up on the entire workflow.  If the
number of failures did indeed reach the maximum of 3, and that is why
the workflow didn't complete, I would argue that it would be worthwhile to
increase this ceiling, at least when running solely with Falkon, or at
the very least for this experiment, to see the 244 mol run succeed.  Remember
that Falkon is much faster than GRAM/PBS, so if errors happen quickly, as in the
case of this tg-v024 node, where a failure takes <50 ms, then 1000s of errors
can happen in a matter of seconds to minutes.  I am not sure what the correct
solution is, but it is something to consider, as the dynamics of the problem
are now different from what they were before Falkon.

Ioan 




From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 10:49:46 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 10:49:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701154946.60D4C164DB@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #12 from iraicu at cs.uchicago.edu  2007-07-01 10:49 -------
(In reply to comment #10)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > > 
> > > There were two.
> > > 
> > I meant to say "issues"... from the discussion I had with Yong, I believe he
> > addressed both of them.
> 
> Ok. Got confused.
> 
> > > > We also fixed a badly handled exception we had [...]
> > > Can you be more specific? [...]
> > > 
> > Sure.  Double check file SubmissionThread.java, notice that the thread will
> > live as long as exit is not set...
> > Also, check the StatusThread.java, 
> 
> Right. Missed that.
> 
> > 
> > > > note that Swift was setting the set status of submitted tasks to the Falkon
> > > > provider in a separate thread,
> > > 
> > > Swift does not set status of tasks. That's what the provider is supposed to do.
> > > 
> > OK, there are several separate threads, one that sets the status of the task
> > for Swift, another that performs the submit, another that receives
> > notifications, etc.  The common data structure between the set status thread
> > and the submit thread is a queue; if the submission thread dies, the queue is
> > still valid, and the set status thread could still insert tasks into the queue
> > and set the status to submitted, although there would be no submission thread
> > alive to perform the submission itself to Falkon.
> 
> That sounds like the provider, not Swift. Maybe I misunderstood something?
> 

Right, the provider has multiple threads, and if any one of them exits
prematurely, then it cannot function correctly.  
Ioan




From bugzilla-daemon at mcs.anl.gov  Sun Jul  1 11:36:30 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun,  1 Jul 2007 11:36:30 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070701163630.E977916506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #13 from hategan at mcs.anl.gov  2007-07-01 11:36 -------
(In reply to comment #11)
> (In reply to comment #9)
> > (In reply to comment #7)
> > > (In reply to comment #6)
> > > > (In reply to comment #4)
> > > The same machine (tg-v024) that we had trouble with before acted up again, I
> > > should have removed it before we started the experiment.  If this is the
> > > consensus, we can certainly try it again, and make sure this machine is not in
> > > the resource pool.  Another idea is to increase the retry # from 3 to something
> > > higher, maybe 10, 30, etc?
> > 
> > Not a good idea in the general case, since many times the error may not be
> > something temporary. The swift scheduler takes bad machines into account and
> > attempts to avoid submitting to them.
> >
> Yes, but in this case, Falkon was the only set of resources that were available
> to Swift, so giving up early means giving up on the entire workflow.  If it was
> indeed that the # of failures reached up to the maximum of 3 and that is why
> the workflow didn't complete, I would argue that it would be worthwhile to
> increase this upper ceiling.... at least when running solely with Falkon, or at
> the very least, for this experiment to see the 244 mol run succeed.  Remember
> that Falkon is much faster than GRAM/PBS, so if errors happen quick, as in the
> case on this tg-v024 node, where it happens in <50 ms, then 1000s of errors can
> happen in a matter of seconds to minutes....  I am not sure what the correct
> solution is, but something to consider as the dynamics of the problem is now
> different than it was before prior to Falkon.

By themselves retries don't solve the problem. There must be a reasonable
chance that a job will finish. If you have 999 busy workers and 1 bad worker,
restarting 100 times will still cause the workflow to fail, and the fact that
restarts will happen fast is not exactly helping. 

While I am a bit reluctant to add more options, I guess the number of restarts
could become one in the future.
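
To make the 999-busy/1-bad scenario concrete, here is a toy model. It is only
a sketch under stated assumptions: the dispatcher always hands the job to the
first idle worker, a bad worker fails every job and is idle again at once, and
the class and method names (RetrySketch, failedAttempts) are made up. It is
not Swift or Falkon code.

```java
import java.util.Arrays;

// Hypothetical model of the scenario above; not Swift or Falkon code.
class RetrySketch {
    // busy[i] == true means worker i cannot take the job;
    // bad[i] == true means any job sent to worker i fails.
    // Returns the number of failed attempts before the job either runs
    // on a healthy worker or maxRetries is exhausted.
    static int failedAttempts(boolean[] busy, boolean[] bad, int maxRetries) {
        int failures = 0;
        while (failures < maxRetries) {
            int chosen = -1;
            for (int i = 0; i < busy.length; i++) {
                if (!busy[i]) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0 || !bad[chosen]) {
                // Simplification: stop when nothing is idle, or when the
                // job landed on a healthy worker.
                break;
            }
            failures++; // the bad worker fails fast and is idle again
        }
        return failures;
    }

    public static void main(String[] args) {
        boolean[] busy = new boolean[1000];
        boolean[] bad = new boolean[1000];
        Arrays.fill(busy, true);
        busy[999] = false; // the only idle worker...
        bad[999] = true;   // ...is the bad one
        int failures = failedAttempts(busy, bad, 100);
        // At up to 50 ms per failure, the retries are used up very quickly.
        System.out.println(failures + " failures, at most " + (failures * 50) + " ms");
    }
}
```

In this model a higher retry limit only changes how long it takes to give up;
the job still never reaches a healthy worker while the others stay busy.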





From bugzilla-daemon at mcs.anl.gov  Mon Jul  2 07:58:28 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon,  2 Jul 2007 07:58:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070702125828.05B7B16506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #14 from nefedova at mcs.anl.gov  2007-07-02 07:58 -------
This is what I had on stdout (stack overflow error). The last line was printed
over and over again 100s of times.

<Snip - normal stdout output>
*****************************SUPER_DEBUG: waiting for notification...
chrm_long completed
Exception in thread "Worker 3" java.lang.StackOverflowError
        at java.util.ArrayList.addAll(ArrayList.java:472)
        at org.globus.cog.karajan.arguments.VariableArgumentsImpl.appendAll(VariableArgumentsImpl.java:79)
        at org.globus.cog.karajan.workflow.futures.FutureVariableArguments.appendAll(FutureVariableArguments.java:40)
        at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.flushBuffer(OrderedParallelVariableArguments.java:67)
        at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:73)
        at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
        at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)




From bugzilla-daemon at mcs.anl.gov  Mon Jul  2 09:17:11 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon,  2 Jul 2007 09:17:11 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070702141711.DC3B516506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #15 from hategan at mcs.anl.gov  2007-07-02 09:17 -------
(In reply to comment #14)
> Exception in thread "Worker 3" java.lang.StackOverflowError
> [...]
>         at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
>         at org.globus.cog.karajan.arguments.OrderedParallelVariableArguments.prevClosed(OrderedParallelVariableArguments.java:78)
> 
> (repeat ad nauseam)

Fix to Karajan committed. Needs testing since it's in a delicate place.
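
For readers unfamiliar with this kind of failure: the trace shows prevClosed
calling itself at the same line over and over. The sketch below is a generic
illustration of how a notification passed recursively along a very long chain
can exhaust the stack, and how an iterative walk avoids that. It is not the
Karajan code and not the committed fix; ChainSketch, Node and both close
methods are hypothetical names.

```java
// Hypothetical illustration; this is not the Karajan code or its fix.
class ChainSketch {
    static final class Node {
        Node next;
        boolean closed;
    }

    // Builds a singly linked chain of the given length and returns its head.
    static Node chain(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            Node n = new Node();
            n.next = head;
            head = n;
        }
        return head;
    }

    // Recursive propagation: one stack frame per node, so a long enough
    // chain ends in StackOverflowError.
    static void closeRecursive(Node n) {
        if (n == null) {
            return;
        }
        n.closed = true;
        closeRecursive(n.next);
    }

    // Iterative propagation: constant stack depth regardless of length.
    // Returns the number of nodes visited.
    static int closeIterative(Node n) {
        int count = 0;
        while (n != null) {
            n.closed = true;
            count++;
            n = n.next;
        }
        return count;
    }

    public static void main(String[] args) {
        Node head = chain(1_000_000);
        try {
            closeRecursive(head);
            System.out.println("recursive walk finished (large thread stack?)");
        } catch (StackOverflowError e) {
            System.out.println("recursive walk overflowed the stack");
        }
        System.out.println("iterative walk visited " + closeIterative(head) + " nodes");
    }
}
```

Whether the recursive walk overflows depends on the thread stack size (for
example the -Xss setting), which is why the sketch catches the error rather
than assuming it.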




From benc at hawaga.org.uk  Mon Jul  2 13:42:30 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 00:12:30 +0530 (IST)
Subject: [Swift-devel] @strcut
Message-ID: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>


r881 makes a quick-and-dirty regexp function, @strcut, available. It 
doesn't handle errors nicely (or at all), but I've put it in so Nika can 
experiment with it a bit in an attempt to reduce her SwiftScript code 
size.

If it's useful, I'll tidy it up; otherwise I'll back it out.
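
For illustration only, a function of this kind might look like the Java
sketch below. The semantics are an assumption, since the message does not
spell them out: here the function returns whatever the first capture group of
the regular expression matches. The class name StrCutSketch, the error
handling and the sample file name are all made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in; the real @strcut semantics are not given here.
class StrCutSketch {
    // Returns the text captured by the first group of regex within input.
    // Throws IllegalArgumentException when the pattern has no capture group
    // or does not match, instead of failing silently.
    static String strcut(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.groupCount() < 1) {
            throw new IllegalArgumentException("pattern needs a capture group: " + regex);
        }
        if (!m.find()) {
            throw new IllegalArgumentException("no match for " + regex + " in " + input);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // Made-up file name: pull the numeric id out of it.
        System.out.println(strcut("mol-0042.pdb", "mol-([0-9]+)"));
    }
}
```

A helper like this lets a loop derive per-molecule names from a single input
string rather than spelling each one out by hand.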

-- 


From foster at mcs.anl.gov  Mon Jul  2 14:05:54 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 02 Jul 2007 14:05:54 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
Message-ID: <46894C92.6090602@mcs.anl.gov>

Hi,

I am curious--is this the only reason why the MolDyn program is so 
large, or are there other things that can be done to reduce code size?

Ian.

Ben Clifford wrote:
> r881 makes a quick-and-dirty regexp function, @strcut, available. It 
> doesn't handle errors nicely (or at all), but I've put it in so Nika can 
> experiment with it a bit in an attempt to reduce her SwiftScript code 
> size.
>
> If its useful, I'll tidy it up, otherwise I'll back it out.
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From nefedova at mcs.anl.gov  Mon Jul  2 14:16:44 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Mon, 2 Jul 2007 14:16:44 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <46894C92.6090602@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
	<46894C92.6090602@mcs.anl.gov>
Message-ID: <33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>

This is the main thing that prevented me from using loops. Once I rewrite
it with loops, the size of the code will be reduced dramatically.

On Jul 2, 2007, at 2:05 PM, Ian Foster wrote:

> Hi,
>
> I am curious--is this the only reason why the MolDyn program is so  
> large, or are there other things that can be done to reduce code size?
>
> Ian.
>
> Ben Clifford wrote:
>> r881 makes a quick-and-dirty regexp function, @strcut, available.  
>> It doesn't handle errors nicely (or at all), but I've put it in so  
>> Nika can experiment with it a bit in an attempt to reduce her  
>> SwiftScript code size.
>>
>> If its useful, I'll tidy it up, otherwise I'll back it out.
>>
>>
>


From foster at mcs.anl.gov  Mon Jul  2 14:17:11 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 02 Jul 2007 14:17:11 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
	<46894C92.6090602@mcs.anl.gov>
	<33F67BC7-F2CD-4402-89F9-C27AF8162A6F@mcs.anl.gov>
Message-ID: <46894F37.5000506@mcs.anl.gov>

cool ...

Veronika Nefedova wrote:
> this is the main thing that prevented me from using loops. Once I 
> re-write it with loops, the size of the code would be reduced 
> dramatically.
>
> On Jul 2, 2007, at 2:05 PM, Ian Foster wrote:
>
>> Hi,
>>
>> I am curious--is this the only reason why the MolDyn program is so 
>> large, or are there other things that can be done to reduce code size?
>>
>> Ian.
>>
>> Ben Clifford wrote:
>>> r881 makes a quick-and-dirty regexp function, @strcut, available. It 
>>> doesn't handle errors nicely (or at all), but I've put it in so Nika 
>>> can experiment with it a bit in an attempt to reduce her SwiftScript 
>>> code size.
>>>
>>> If it's useful, I'll tidy it up; otherwise I'll back it out.
>>>
>>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From hategan at mcs.anl.gov  Mon Jul  2 14:47:26 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 02 Jul 2007 14:47:26 -0500
Subject: [Swift-devel] @strcut
In-Reply-To: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030011010.18135@soju.hawaga.org.uk>
Message-ID: <1183405646.21420.1.camel@blabla.mcs.anl.gov>

Something like that should be added to the Karajan system library too.

On Tue, 2007-07-03 at 00:12 +0530, Ben Clifford wrote:
> r881 makes a quick-and-dirty regexp function, @strcut, available. It 
> doesn't handle errors nicely (or at all), but I've put it in so Nika can 
> experiment with it a bit in an attempt to reduce her SwiftScript code 
> size.
> 
> If it's useful, I'll tidy it up; otherwise I'll back it out.
> 
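
For readers unfamiliar with the idea, here is a minimal Python sketch of what a
regexp "cut" helper of this kind might do, assuming it returns the first
capture group of a pattern matched against a string. The thread does not spell
out the semantics of @strcut, so the function name strcut, its argument order,
and the error handling below are assumptions for illustration, not the r881
implementation.

```python
import re


def strcut(text, pattern):
    """Return the first capture group of `pattern` found in `text`.

    Hypothetical stand-in for a regexp 'cut' helper. Unlike a
    quick-and-dirty version, it raises ValueError when the pattern is
    malformed, has no capture group, or does not match.
    """
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"bad pattern {pattern!r}: {exc}") from exc
    if compiled.groups < 1:
        raise ValueError("pattern needs at least one capture group")
    match = compiled.search(text)
    if match is None:
        raise ValueError(f"{pattern!r} does not match {text!r}")
    return match.group(1)
```

With a helper like this, a loop over input files could derive each molecule
identifier from its file name instead of spelling every name out by hand.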


From benc at hawaga.org.uk  Mon Jul  2 20:08:56 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 06:38:56 +0530 (IST)
Subject: [Swift-devel] recent karajan changes causing trouble
Message-ID: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>


I get the below when I try to run a hello world workflow 
(examples/tutorial/q1.swift).

I think Nika also saw something that looks similar, with a different 
workflow.

This is with cog r1655.

I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello 
world runs ok (r1650 being before the most recent set of cog commits).


$ swift -debug q1.swift 
Recompilation suppressed.

null
        kernel:cache @ sys.xml, line: 3
Caused by: java.lang.UnsupportedOperationException
        at java.util.AbstractMap.put(AbstractMap.java:228)
        at org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153)
        at org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85)
        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
        at org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
        at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
        at org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40)
        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)


-- 


From hategan at mcs.anl.gov  Mon Jul  2 21:23:37 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 02 Jul 2007 21:23:37 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
Message-ID: <1183429417.16404.0.camel@blabla.mcs.anl.gov>

Yup. Try now.

On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote:
> I get the below when I try to run a hello world workflow 
> (examples/tutorial/q1.swift).
> 
> I think Nika also saw something that looks similar, with a different 
> workflow.
> 
> This is with cog r1655.
> 
> I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello 
> world runs ok (r1650 being before the most recent set of cog commits).
> 
> 
> $ swift -debug q1.swift 
> Recompilation suppressed.
> 
> null
>         kernel:cache @ sys.xml, line: 3
> Caused by: java.lang.UnsupportedOperationException
>         at java.util.AbstractMap.put(AbstractMap.java:228)
>         at org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153)
>         at org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77)
>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85)
>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>         at org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111)
>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>         at org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40)
>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
> 
> 


From benc at hawaga.org.uk  Mon Jul  2 22:40:11 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 03:40:11 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183429417.16404.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707030339440.7513@dildano.hawaga.org.uk>


On Mon, 2 Jul 2007, Mihael Hategan wrote:

> Yup. Try now.
> 

works 

-- 


From benc at hawaga.org.uk  Tue Jul  3 12:53:30 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 23:23:30 +0530 (IST)
Subject: [Swift-devel] mapper syntax
Message-ID: <Pine.OSX.4.64.0707032321100.6578@soju.hawaga.org.uk>


The syntax:

  imagefiles if[] <my_mapper;foo=@strcat(filename,blah),otherparam=true,moreparams=false>;

is rather noisy all on one line.

A syntax change could be to express the above as:

  imagefiles if[] map my_mapper {
    foo = @strcat(filename,blah);
    otherparam = true;
    moreparams = false;
  };
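
To make the correspondence between the two forms concrete, here is a rough
Python sketch that rewrites the one-line form into the block form. It is only
an illustration of the proposed mapping, not part of the SwiftScript parser;
the helper names split_top_level and to_block_form are made up here, and the
sketch assumes parameter values contain no '<' or '>' characters.

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def to_block_form(declaration):
    """Rewrite 'type name <mapper;k=v,...>;' as the proposed block form."""
    head, _, rest = declaration.partition("<")
    inner, _, _ = rest.rpartition(">")          # drop the closing '>;'
    mapper, _, params = inner.partition(";")    # mapper name, then parameters
    lines = [f"{head.strip()} map {mapper.strip()} {{"]
    for param in split_top_level(params):
        if not param.strip():
            continue                            # mapper with no parameters
        key, _, value = param.partition("=")
        lines.append(f"  {key.strip()} = {value.strip()};")
    lines.append("};")
    return "\n".join(lines)
```

The comma inside @strcat(filename,blah) is the awkward case: a plain split on
commas would break the parameter in two, which is part of why the one-line
form is hard to read.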


-- 


From benc at hawaga.org.uk  Tue Jul  3 12:50:35 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 23:20:35 +0530 (IST)
Subject: [Swift-devel] xml tc.data format
Message-ID: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>


I'd like to make tc.data be formatted as XML:

i) the present tab-delimited format has usability issues (much the same 
ones that Makefiles have). Tabs are used for a reason (I think because 
some fields in the file can have spaces in them, or something like that).

ii) it would be more consistent with the sites.xml format.

-- 


From hategan at mcs.anl.gov  Tue Jul  3 13:01:47 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Jul 2007 13:01:47 -0500
Subject: [Swift-devel] mapper syntax
In-Reply-To: <Pine.OSX.4.64.0707032321100.6578@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707032321100.6578@soju.hawaga.org.uk>
Message-ID: <1183485707.17547.2.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-03 at 23:23 +0530, Ben Clifford wrote:
> The syntax:
> 
>   imagefiles if[] <my_mapper;foo=@strcat(filename,blah),otherparam=true,moreparams=false>;
> 
> is rather noisy all on one line.
> 
> A syntax change could be to express the above as:
> 
>   imagefiles if[] map my_mapper {

What if "map" were replaced by some operator (":", "~", "#")?

>     foo = @strcat(filename,blah);
>     otherparam = true;
>     moreparams = false;
>   };

The semicolon should not be required after a '}'.

> 
> 


From hategan at mcs.anl.gov  Tue Jul  3 13:04:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Jul 2007 13:04:09 -0500
Subject: [Swift-devel] xml tc.data format
In-Reply-To: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>
Message-ID: <1183485849.17547.5.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-03 at 23:20 +0530, Ben Clifford wrote:
> I'd like to make tc.data be formatted as XML:
> 
> i) the present tab-delimited format has usability issues (much the same 
> ones that Makefiles have). Tabs are used for a reason (I think because 
> some fields in the file can have spaces in them, or something like that).

1. Having named args (attributes) would make it easier to skip some of
them instead of writing NULL (or was it null?).

2. The code for parsing it would be much simpler, and we could probably
remove the dependency on vds.

> 
> ii) it would be more consistent with the sites.xml format.
> 
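
As a rough illustration of the trade-off, the Python sketch below reads the
same record from a tab-delimited line and from an XML element with named
attributes. The column names, the entry element, and its attribute names are
invented for this example; the actual tc.data columns are not listed in this
thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical column names, in file order; not the real tc.data layout.
COLUMNS = ["site", "name", "path", "status", "platform", "profile"]


def parse_tab_line(line):
    """Split one tab-delimited record; fields may hold spaces but not tabs."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} tab-separated fields, got {len(fields)}"
        )
    # Every column must be present, so unused ones carry a 'null' placeholder.
    return {
        column: (None if field.lower() == "null" else field)
        for column, field in zip(COLUMNS, fields)
    }


def parse_xml_entry(text):
    """Read the same record from named attributes; unused ones are omitted."""
    attributes = ET.fromstring(text).attrib
    return {column: attributes.get(column) for column in COLUMNS}
```

In the tab form a stray space in place of a tab silently changes the field
count, whereas in the attribute form the unused fields are simply left out.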


From benc at hawaga.org.uk  Tue Jul  3 14:33:51 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 3 Jul 2007 19:33:51 +0000 (GMT)
Subject: [Swift-devel] xml tc.data format
In-Reply-To: <1183485849.17547.5.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>
	<1183485849.17547.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707031928300.7513@dildano.hawaga.org.uk>

Another thing I was thinking about for the config file formats is to 
change profile specification from:

 <profile namespace="globus" key="joblimit">5</profile>

which is how profiles are represented in the VDS1-style sites.xml

to a more document-like(?) form such as:

  <globus:joblimit> 5 </globus:joblimit>

This makes better use of XML structure, but I don't know how it would fit 
(perhaps quite badly) into the present way in which the swift code reads 
in sites.xml.
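
For comparison, here is a small Python sketch that reads both forms into the
same (namespace, key) dictionary. It is not how the swift code reads
sites.xml; the enclosing pool element, the namespace URI urn:example:globus,
and the function names are assumptions made so that the fragments are
well-formed XML.

```python
import xml.etree.ElementTree as ET

# Attribute form, as in the VDS1-style sites.xml (wrapper element assumed).
GENERIC = '<pool><profile namespace="globus" key="joblimit">5</profile></pool>'

# Element form; the namespace URI is a placeholder, not a real schema.
ELEMENT = (
    '<pool xmlns:globus="urn:example:globus">'
    "<globus:joblimit> 5 </globus:joblimit>"
    "</pool>"
)


def read_generic(xml_text):
    """Collect <profile namespace=... key=...> entries, whatever the key."""
    root = ET.fromstring(xml_text)
    return {
        (p.get("namespace"), p.get("key")): (p.text or "").strip()
        for p in root.iter("profile")
    }


def read_element_form(xml_text, known_namespaces):
    """Collect namespaced elements; only URIs in `known_namespaces` are read."""
    root = ET.fromstring(xml_text)
    profiles = {}
    for element in root.iter():
        if not element.tag.startswith("{"):
            continue                      # un-namespaced element, not a profile
        uri, local_name = element.tag[1:].split("}", 1)
        if uri in known_namespaces:
            profiles[(known_namespaces[uri], local_name)] = (element.text or "").strip()
    return profiles
```

Note that the element-form reader has to be told which namespaces count as
profile namespaces, while the attribute-form reader accepts any
namespace/key pair it finds.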

-- 


From hategan at mcs.anl.gov  Tue Jul  3 14:37:20 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Jul 2007 14:37:20 -0500
Subject: [Swift-devel] xml tc.data format
In-Reply-To: <Pine.LNX.4.64.0707031928300.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>
	<1183485849.17547.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707031928300.7513@dildano.hawaga.org.uk>
Message-ID: <1183491440.24728.1.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-03 at 19:33 +0000, Ben Clifford wrote:
> another thing i was thinking about for the config file formats, is to 
> change profile specification from:
> 
>  <profile namespace="globus" key="joblimit">5</profile>
> 
> which is how profiles are represented in the VDS1-style sites.xml
> 
> to a more document-like(?) form such as:
> 
>   <globus:joblimit> 5 </globus:joblimit>
> 
> This makes better use of XML structure, but I don't know how it would fit 
> (perhaps quite badly) into the present way in which the swift code reads 
> in sites.xml.

You'd have to pre-define things, so you won't get the flexibility of
dynamic properties.

> 


From benc at hawaga.org.uk  Tue Jul  3 23:34:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 4 Jul 2007 10:04:25 +0530 (IST)
Subject: [Swift-devel] xml tc.data format
In-Reply-To: <1183485849.17547.5.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707032317260.6578@soju.hawaga.org.uk>
	<1183485849.17547.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.OSX.4.64.0707041002310.1364@soju.hawaga.org.uk>


On Tue, 3 Jul 2007, Mihael Hategan wrote:

> 2. The code for parsing it would be much simpler, and we could probably 
> remove the dependency on vds.

VDSScheduler uses the RoundRobin site selector from VDS1.

But people aren't using VDSScheduler so that can probably go away too.

-- 


From benc at hawaga.org.uk  Wed Jul  4 02:02:23 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 4 Jul 2007 12:32:23 +0530 (IST)
Subject: [Swift-devel] license
Message-ID: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk>


There's a jar file in lib/ called: jug-lgpl-2.0.0.jar

The filename might suggest that this is subject to the LGPL.

Does anyone know?

-- 


From benc at hawaga.org.uk  Wed Jul  4 00:37:39 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 4 Jul 2007 11:07:39 +0530 (IST)
Subject: [Swift-devel] dot files by default
Message-ID: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>

Does anyone have a preference about whether .dot graphviz files are 
generated by default or not?

I find them a bit annoying inasmuch as they double the number of run 
files in my working directories to no immediate benefit.

-- 


From hategan at mcs.anl.gov  Wed Jul  4 22:57:28 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 04 Jul 2007 22:57:28 -0500
Subject: [Swift-devel] license
In-Reply-To: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk>
Message-ID: <1183607848.3638.1.camel@blabla.mcs.anl.gov>

Yes. It's actually available under two licenses.
http://jug.safehaus.org/Download

On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote:
> There's a jar file in lib/ called: jug-lgpl-2.0.0.jar
> 
> The filename might suggest that this is subject to the LGPL.
> 
> Does anyone know?
> 


From benc at hawaga.org.uk  Wed Jul  4 23:02:14 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 04:02:14 +0000 (GMT)
Subject: [Swift-devel] license
In-Reply-To: <1183607848.3638.1.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk>
	<1183607848.3638.1.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707050401010.7513@dildano.hawaga.org.uk>


OK, cool. I suspect the dev.globus incubator gods will be happier with the 
ASL one. Funny that they have separate jar files for each license.

On Wed, 4 Jul 2007, Mihael Hategan wrote:

> Yes. It's actually available in 2 licenses.
> http://jug.safehaus.org/Download
> 
> On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote:
> > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar
> > 
> > The filename might suggest that this is subject to the LGPL.
> > 
> > Does anyone know?
> > 
> 
> 


From hategan at mcs.anl.gov  Wed Jul  4 23:09:46 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 04 Jul 2007 23:09:46 -0500
Subject: [Swift-devel] license
In-Reply-To: <Pine.LNX.4.64.0707050401010.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk>
	<1183607848.3638.1.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707050401010.7513@dildano.hawaga.org.uk>
Message-ID: <1183608586.4172.0.camel@blabla.mcs.anl.gov>

What's wrong with LGPL now?

On Thu, 2007-07-05 at 04:02 +0000, Ben Clifford wrote:
> ok cool. I suspect the dev.globus incubatorgods will be happier with the 
> ASL one. Funny that they have separate jar files for each license.
> 
> On Wed, 4 Jul 2007, Mihael Hategan wrote:
> 
> > Yes. It's actually available in 2 licenses.
> > http://jug.safehaus.org/Download
> > 
> > On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote:
> > > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar
> > > 
> > > The filename might suggest that this is subject to the LGPL.
> > > 
> > > Does anyone know?
> > > 
> > 
> > 
> 


From benc at hawaga.org.uk  Wed Jul  4 23:13:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 04:13:29 +0000 (GMT)
Subject: [Swift-devel] license
In-Reply-To: <1183608586.4172.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707041231270.20053@soju.hawaga.org.uk> 
	<1183607848.3638.1.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707050401010.7513@dildano.hawaga.org.uk>
	<1183608586.4172.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707050413230.7513@dildano.hawaga.org.uk>


paranoid lawyers?

On Wed, 4 Jul 2007, Mihael Hategan wrote:

> What's wrong with LGPL now?
> 
> On Thu, 2007-07-05 at 04:02 +0000, Ben Clifford wrote:
> > ok cool. I suspect the dev.globus incubatorgods will be happier with the 
> > ASL one. Funny that they have separate jar files for each license.
> > 
> > On Wed, 4 Jul 2007, Mihael Hategan wrote:
> > 
> > > Yes. It's actually available in 2 licenses.
> > > http://jug.safehaus.org/Download
> > > 
> > > On Wed, 2007-07-04 at 12:32 +0530, Ben Clifford wrote:
> > > > There's a jar file in lib/ called: jug-lgpl-2.0.0.jar
> > > > 
> > > > The filename might suggest that this is subject to the LGPL.
> > > > 
> > > > Does anyone know?
> > > > 
> > > 
> > > 
> > 
> 
> 


From benc at hawaga.org.uk  Wed Jul  4 23:41:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 04:41:38 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>


I don't think that's true.

If data files are labelled with URIs rather than 
paths-relative-to-submit-directory, then those URIs are understandable 
without a VDC-as-entity.

You don't need a separate VDC to tell you how to get at myfile here:

  file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;

The 'data file pointer store' exists already - it's the hierarchical 
namespace that is rooted in IANA's management of the URI and DNS space, 
continues to UC's management of DNS space and then down to my management 
of terminable's filesystem space and then down to whoever owns the foo 
directory.


On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:

> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> 
> 
> 
> 
> 
> ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> This would require a data file pointer store (VDC like thing) which can record
> where intermediate files are instead of assuming they are always available on
> the submit host.
> 
> 
> 


From benc at hawaga.org.uk  Thu Jul  5 01:18:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 11:48:10 +0530 (IST)
Subject: [Swift-devel] language behaviour tests
Message-ID: <Pine.OSX.4.64.0707051143190.13733@soju.hawaga.org.uk>


In r891 I put in some language behaviour tests in 
tests/language-behaviour/

These run a bunch of small SwiftScript programs locally and check that 
they output expected text - for example, checking that @strcat really does 
concatenate, that + really does add, and other such things.

I built them for testing various changes I've been playing with at the 
language parsing and compilation layer.

Previously I was using the tests in tests/language/ for testing parser 
changes.

The language/ tests check that input SwiftScript always produces the same 
.xml intermediate form, whilst these new tests check that the input 
SwiftScript always produces the same output (in a file) on execution, 
without regard to whether the .xml and .kml intermediate files take a 
different form or not.
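
The shape of such a behaviour check can be sketched in a few lines of Python.
This is not the script in tests/language-behaviour/; the function names are
made up, and the demo runs a tiny stand-in program through the Python
interpreter rather than invoking swift, so it only shows the
run-then-compare-output idea.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def check_behaviour(command, output_file, expected_text, workdir):
    """Run `command` in `workdir`, then compare the file it wrote."""
    result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    if result.returncode != 0:
        return False, f"command failed: {result.stderr.strip()}"
    output_path = Path(workdir) / output_file
    if not output_path.exists():
        return False, f"{output_file} was not produced"
    actual = output_path.read_text()
    if actual != expected_text:
        return False, f"expected {expected_text!r}, got {actual!r}"
    return True, "ok"


def demo():
    """Exercise the harness with a stand-in program instead of a swift run."""
    program = "open('out.txt', 'w').write('ab' + 'c')"
    with tempfile.TemporaryDirectory() as workdir:
        return check_behaviour(
            [sys.executable, "-c", program], "out.txt", "abc", workdir
        )
```

Only the final output file is compared, so a change in the intermediate
representation does not cause a failure as long as the program still behaves
the same.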

-- 


From nefedova at mcs.anl.gov  Thu Jul  5 08:34:39 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 08:34:39 -0500
Subject: [Swift-devel] dot files by default
In-Reply-To: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
Message-ID: <69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov>

It would be even better if these dot files were generated 
correctly. There is Bug #35 about it...

Nika

On Jul 4, 2007, at 12:37 AM, Ben Clifford wrote:

> does anyone have preference about whether .dot graphviz files are
> generated by default or not?
>
> I find them a bit annoying in as much as they double the number of run
> files in my working directories to no immediate benefit.
>
> -- 
>


From hategan at mcs.anl.gov  Thu Jul  5 08:55:31 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 08:55:31 -0500
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
Message-ID: <1183643731.5084.3.camel@blabla.mcs.anl.gov>

I think you're missing something. You need to remember where the files
are. The mapping information becomes insufficient. It tells you where
some initial files were, but it won't contain any site information. And
that's good, because the decision of where something is done is made at
run-time. But you still need some store (even though probably
memory-based and only persistent through one swift run).
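
A minimal sketch of what such a memory-based store could look like, in
Python. The class name, its methods, and the use of plain site-name strings
are assumptions for illustration only; nothing here reflects existing swift
or Karajan code.

```python
class FileLocationStore:
    """In-memory record of which sites hold a copy of each file.

    Hypothetical illustration: keys are logical file names, values are
    sets of site names chosen at run time. Nothing outlives one run.
    """

    def __init__(self):
        self._locations = {}

    def record(self, logical_name, site):
        """Note that `site` now holds a copy of `logical_name`."""
        self._locations.setdefault(logical_name, set()).add(site)

    def sites_for(self, logical_name):
        """Return the sites known to hold the file (possibly empty)."""
        return frozenset(self._locations.get(logical_name, ()))

    def needs_transfer(self, logical_name, target_site):
        """True if the file is known but has no copy on `target_site`."""
        sites = self.sites_for(logical_name)
        if not sites:
            raise KeyError(f"no recorded location for {logical_name!r}")
        return target_site not in sites
```

A job's output would be recorded against the site it ran on, and the next
job would consult the store to decide whether a transfer is needed at all.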

On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> I don't think that's true.
> 
> If data files are labelled with URIs rather than 
> paths-relative-to-submit-directory, then those URIs are understandable 
> without a VDC-as-entity.
> 
> You don't need a separate VDC to tell you how to get at myfile here:
> 
>   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> 
> The 'data file pointer store' exists already - its the hierarchical 
> namespace that is rooted in IANA's management of the URI and DNS space, 
> continues to UC's management of DNS space and then down to my management 
> of terminable's filesystem space and then down to whoever owns the foo 
> directory.
> 
> 
> On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> 
> > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > 
> > 
> > 
> > 
> > 
> > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > This would require a data file pointer store (VDC like thing) which can record
> > where intermediate files are instead of assuming they are always available on
> > the submit host.
> > 
> > 
> > 
> 


From hategan at mcs.anl.gov  Thu Jul  5 08:59:16 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 08:59:16 -0500
Subject: [Swift-devel] dot files by default
In-Reply-To: <69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
	<69D182A1-2658-4B6E-85E7-6B86ECB97A13@mcs.anl.gov>
Message-ID: <1183643956.5084.7.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 08:34 -0500, Veronika Nefedova wrote:
> It would've been even better if these dot files were generated  
> correctly. There is Bug #35 about it...

That's helpful ;)

> 
> Nika
> 
> On Jul 4, 2007, at 12:37 AM, Ben Clifford wrote:
> 
> > does anyone have preference about whether .dot graphviz files are
> > generated by default or not?
> >
> > I find them a bit annoying in as much as they double the number of run
> > files in my working directories to no immediate benefit.
> >
> > -- 
> >
> 
> 


From benc at hawaga.org.uk  Thu Jul  5 09:05:17 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 14:05:17 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <1183643731.5084.3.camel@blabla.mcs.anl.gov>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>


how does it know where files are now, between jobs?

On Thu, 5 Jul 2007, Mihael Hategan wrote:

> I think you're missing something. You need to remember where the files
> are. The mapping information becomes insufficient. It tells you where
> some initial files were, but it won't contain any site information. And
> that's good, because the decision of where something is done is made at
> run-time. But you still need some store (even though probably
> memory-based and only persistent through one swift run).
> 
> On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> > I don't think that's true.
> > 
> > If data files are labelled with URIs rather than 
> > paths-relative-to-submit-directory, then those URIs are understandable 
> > without a VDC-as-entity.
> > 
> > You don't need a separate VDC to tell you how to get at myfile here:
> > 
> >   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> > 
> > The 'data file pointer store' exists already - its the hierarchical 
> > namespace that is rooted in IANA's management of the URI and DNS space, 
> > continues to UC's management of DNS space and then down to my management 
> > of terminable's filesystem space and then down to whoever owns the foo 
> > directory.
> > 
> > 
> > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> > 
> > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > > 
> > > 
> > > 
> > > 
> > > 
> > > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > > This would require a data file pointer store (VDC like thing) which can record
> > > where intermediate files are instead of assuming they are always available on
> > > the submit host.
> > > 
> > > 
> > > 
> > 
> 
> 


From hategan at mcs.anl.gov  Thu Jul  5 09:10:07 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 09:10:07 -0500
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
Message-ID: <1183644607.5084.9.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote:
> how does it know where files are now, between jobs?

That's the thing. They're always on localhost.

> 
> On Thu, 5 Jul 2007, Mihael Hategan wrote:
> 
> > I think you're missing something. You need to remember where the files
> > are. The mapping information becomes insufficient. It tells you where
> > some initial files were, but it won't contain any site information. And
> > that's good, because the decision of where something is done is made at
> > run-time. But you still need some store (even though probably
> > memory-based and only persistent through one swift run).
> > 
> > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> > > I don't think that's true.
> > > 
> > > If data files are labelled with URIs rather than 
> > > paths-relative-to-submit-directory, then those URIs are understandable 
> > > without a VDC-as-entity.
> > > 
> > > You don't need a separate VDC to tell you how to get at myfile here:
> > > 
> > >   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> > > 
> > > The 'data file pointer store' exists already - its the hierarchical 
> > > namespace that is rooted in IANA's management of the URI and DNS space, 
> > > continues to UC's management of DNS space and then down to my management 
> > > of terminable's filesystem space and then down to whoever owns the foo 
> > > directory.
> > > 
> > > 
> > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> > > 
> > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > > > This would require a data file pointer store (VDC like thing) which can record
> > > > where intermediate files are instead of assuming they are always available on
> > > > the submit host.
> > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 


From benc at hawaga.org.uk  Thu Jul  5 11:25:06 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 16:25:06 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <1183644607.5084.9.camel@blabla.mcs.anl.gov>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051624520.7513@dildano.hawaga.org.uk>


They're always in the place that the path name says they are, whether it's 
a URI or a local relative path.

On Thu, 5 Jul 2007, Mihael Hategan wrote:

> On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote:
> > how does it know where files are now, between jobs?
> 
> That's the thing. They're always on localhost.
> 
> > 
> > On Thu, 5 Jul 2007, Mihael Hategan wrote:
> > 
> > > I think you're missing something. You need to remember where the files
> > > are. The mapping information becomes insufficient. It tells you where
> > > some initial files were, but it won't contain any site information. And
> > > that's good, because the decision of where something is done is made at
> > > run-time. But you still need some store (even though probably
> > > memory-based and only persistent through one swift run).
> > > 
> > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> > > > I don't think that's true.
> > > > 
> > > > If data files are labelled with URIs rather than 
> > > > paths-relative-to-submit-directory, then those URIs are understandable 
> > > > without a VDC-as-entity.
> > > > 
> > > > You don't need a separate VDC to tell you how to get at myfile here:
> > > > 
> > > >   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> > > > 
> > > > The 'data file pointer store' exists already - its the hierarchical 
> > > > namespace that is rooted in IANA's management of the URI and DNS space, 
> > > > continues to UC's management of DNS space and then down to my management 
> > > > of terminable's filesystem space and then down to whoever owns the foo 
> > > > directory.
> > > > 
> > > > 
> > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> > > > 
> > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > > > > This would require a data file pointer store (VDC like thing) which can record
> > > > > where intermediate files are instead of assuming they are always available on
> > > > > the submit host.
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > 
> 
> 


From hategan at mcs.anl.gov  Thu Jul  5 11:41:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 11:41:30 -0500
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707051624520.7513@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051624520.7513@dildano.hawaga.org.uk>
Message-ID: <1183653690.11132.2.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 16:25 +0000, Ben Clifford wrote:
> they're always in the place that the path name says they are. whether its 
> a URI or a local relative path.

Right, but whereas in the current scheme you can assume the site is
localhost, because files are always staged back to localhost, if you
don't do the stage-out, that assumption goes away. In that case, the
site information needs to be recorded.
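
A minimal sketch of what such a record could look like, written in Python
rather than in Swift's own code. The class name, method names, paths and the
site name "tg-uc" are all hypothetical and exist only for illustration. The
store is memory-based and lives for one run, and it falls back to localhost
when nothing was recorded, which is the assumption the current scheme makes
implicitly:

```python
# Hypothetical in-memory store; none of these names exist in Swift.
class LocationStore:
    def __init__(self, default_site="localhost"):
        self.default_site = default_site
        self._sites = {}  # path -> site where the file currently lives

    def record(self, path, site):
        """Remember that `path` was produced on (or copied to) `site`."""
        self._sites[path] = site

    def site_of(self, path):
        """Return the recorded site, or the default if nothing was recorded."""
        return self._sites.get(path, self.default_site)


store = LocationStore()
# Current scheme: every output is staged back, so nothing needs recording.
staged_back = store.site_of("run1/out.dat")
# Without stage-out: the run-time placement decision has to be written down.
store.record("run1/tmp.dat", "tg-uc")  # "tg-uc" is a made-up site name
left_on_site = store.site_of("run1/tmp.dat")
```

The point of the default is only that dropping the stage-out removes the one
case where a lookup without a prior record is still correct.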

> 
> On Thu, 5 Jul 2007, Mihael Hategan wrote:
> 
> > On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote:
> > > how does it know where files are now, between jobs?
> > 
> > That's the thing. They're always on localhost.
> > 
> > > 
> > > On Thu, 5 Jul 2007, Mihael Hategan wrote:
> > > 
> > > > I think you're missing something. You need to remember where the files
> > > > are. The mapping information becomes insufficient. It tells you where
> > > > some initial files were, but it won't contain any site information. And
> > > > that's good, because the decision of where something is done is made at
> > > > run-time. But you still need some store (even though probably
> > > > memory-based and only persistent through one swift run).
> > > > 
> > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> > > > > I don't think that's true.
> > > > > 
> > > > > If data files are labelled with URIs rather than 
> > > > > paths-relative-to-submit-directory, then those URIs are understandable 
> > > > > without a VDC-as-entity.
> > > > > 
> > > > > You don't need a separate VDC to tell you how to get at myfile here:
> > > > > 
> > > > >   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> > > > > 
> > > > > The 'data file pointer store' exists already - its the hierarchical 
> > > > > namespace that is rooted in IANA's management of the URI and DNS space, 
> > > > > continues to UC's management of DNS space and then down to my management 
> > > > > of terminable's filesystem space and then down to whoever owns the foo 
> > > > > directory.
> > > > > 
> > > > > 
> > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> > > > > 
> > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > > > > > This would require a data file pointer store (VDC like thing) which can record
> > > > > > where intermediate files are instead of assuming they are always available on
> > > > > > the submit host.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 


From benc at hawaga.org.uk  Thu Jul  5 11:47:45 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 16:47:45 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <1183653690.11132.2.camel@blabla.mcs.anl.gov>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707051624520.7513@dildano.hawaga.org.uk>
	<1183653690.11132.2.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051647410.7513@dildano.hawaga.org.uk>


right.

On Thu, 5 Jul 2007, Mihael Hategan wrote:

> On Thu, 2007-07-05 at 16:25 +0000, Ben Clifford wrote:
> > they're always in the place that the path name says they are. whether its 
> > a URI or a local relative path.
> 
> Right, but whereas in the current scheme you can assume the site is
> localhost, because files are always staged back to localhost, if you
> don't do the stage-out, that assumption goes away. In that case, the
> site information needs to be recorded.
> 
> > 
> > On Thu, 5 Jul 2007, Mihael Hategan wrote:
> > 
> > > On Thu, 2007-07-05 at 14:05 +0000, Ben Clifford wrote:
> > > > how does it know where files are now, between jobs?
> > > 
> > > That's the thing. They're always on localhost.
> > > 
> > > > 
> > > > On Thu, 5 Jul 2007, Mihael Hategan wrote:
> > > > 
> > > > > I think you're missing something. You need to remember where the files
> > > > > are. The mapping information becomes insufficient. It tells you where
> > > > > some initial files were, but it won't contain any site information. And
> > > > > that's good, because the decision of where something is done is made at
> > > > > run-time. But you still need some store (even though probably
> > > > > memory-based and only persistent through one swift run).
> > > > > 
> > > > > On Thu, 2007-07-05 at 04:41 +0000, Ben Clifford wrote:
> > > > > > I don't think that's true.
> > > > > > 
> > > > > > If data files are labelled with URIs rather than 
> > > > > > paths-relative-to-submit-directory, then those URIs are understandable 
> > > > > > without a VDC-as-entity.
> > > > > > 
> > > > > > You don't need a separate VDC to tell you how to get at myfile here:
> > > > > > 
> > > > > >   file myfile <"gsiftp://terminable.ci.uchicago.edu/scratch/foo/">;
> > > > > > 
> > > > > > The 'data file pointer store' exists already - its the hierarchical 
> > > > > > namespace that is rooted in IANA's management of the URI and DNS space, 
> > > > > > continues to UC's management of DNS space and then down to my management 
> > > > > > of terminable's filesystem space and then down to whoever owns the foo 
> > > > > > directory.
> > > > > > 
> > > > > > 
> > > > > > On Sun, 1 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:
> > > > > > 
> > > > > > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > ------- Comment #1 from hategan at mcs.anl.gov  2007-07-01 02:18 -------
> > > > > > > This would require a data file pointer store (VDC like thing) which can record
> > > > > > > where intermediate files are instead of assuming they are always available on
> > > > > > > the submit host.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > 
> 
> 


From benc at hawaga.org.uk  Thu Jul  5 11:55:26 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 16:55:26 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <1183644607.5084.9.camel@blabla.mcs.anl.gov>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>


so I was thinking the other day while poking through code.

'data' in SwiftScript terms is mostly represented by DSHandle objects.

such objects (which can have one of several implementing classes, and 
potentially more in future) have a number of properties, such as:

 .  value - what the 'value' is, for adding to other values, using @strcat 
on, performing array/member access using [] and .

 . submit-side location - what is extracted with @filename and used when 
that 'data' is passed to an application rather than being operated on by 
submit-side functions.

Neither of these is compulsory (and I think in practice at the moment it 
works out that you either have a filename or a value and never 
meaningfully both).

So a different model of mapping (which might work better when we want data 
that doesn't necessarily exist as discrete files or as in-memory values - 
the two examples that I've seen talked about are 'data from an sql 
database' and 'constants in a csv file') might be that mappers generate 
DSHandle trees (specifically a mapper generates a DSHandle, which might 
have descendants). Those DSHandles might have values, might have 
filenames, might have other attributes, might have ongoing annotation 
(which could include keeping track of where within-this-run copies have 
been made).
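
As a rough sketch of that model (Python, not the real Java DSHandle
interface; the class, its fields and the toy mapper are all hypothetical), a
handle could carry an optional value, an optional filename, free-form
annotations and named children:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Handle:
    """Hypothetical stand-in for a DSHandle node; not the real interface."""
    value: Optional[Any] = None       # in-memory value, if any
    filename: Optional[str] = None    # submit-side location, if any
    annotations: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "Handle"] = field(default_factory=dict)

    def note_copy(self, site: str) -> None:
        """Annotate that a within-this-run copy exists on `site`."""
        self.annotations.setdefault("copies", []).append(site)


def map_csv_constants(rows):
    """Toy mapper: one child handle per (name, value) row, values only."""
    root = Handle()
    for name, val in rows:
        root.children[name] = Handle(value=val)
    return root


constants = map_csv_constants([("temperature", 300), ("steps", 1000)])
image = Handle(filename="brain.img")   # a file with no in-memory value
image.note_copy("localhost")
```

Here the 'constants in a csv file' case yields handles with values and no
filenames, while the image handle has a filename, no value, and an
annotation tracking where copies exist.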

--


From yongzh at cs.uchicago.edu  Thu Jul  5 12:14:32 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Thu, 5 Jul 2007 12:14:32 -0500 (CDT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>

My original thinking about value/filename was that we don't distinguish
those at the logical level, essentially they could all just be values.
Then when we need to call mapper functions (getFilename, for instance), we
interpret the values differently. So in the case of getFilename, we can
interpret the value either as
1) the filename itself, returning the value directly
2) writing the value into a file, and returning an automatically generated
filename.
3) some other possibilities, e.g. a directory of files.
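
To make the first two readings concrete, here is a small Python sketch. The
function name mirrors getFilename, but the `mode` parameter and everything
else here are assumptions made for illustration, not the mapper API. The
third option (a directory of files) is left out:

```python
import os
import tempfile


def get_filename(value, mode):
    """Hypothetical mapper function; `mode` picks one of the readings above."""
    if mode == "name":
        # 1) the value already is the filename
        return value
    if mode == "content":
        # 2) write the value out and hand back a generated filename
        fd, path = tempfile.mkstemp(suffix=".dat")
        with os.fdopen(fd, "w") as f:
            f.write(str(value))
        return path
    raise ValueError("unknown mode: " + mode)


direct = get_filename("brain.img", "name")
generated = get_filename(42, "content")
with open(generated) as f:
    written = f.read()
os.remove(generated)  # clean up the generated file
```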

The current DSHandle interface does allow nested trees, so a mapper
could return a DSHandle tree as the implementation currently stands.

Yong.


On Thu, 5 Jul 2007, Ben Clifford wrote:

>
> so I was thinking the other day while poking through code.
>
> 'data' in SwiftScript terms is mostly represented by DSHandle objects.
>
> such objects (which can have one of several implementing classes, and
> potentially more in future) have a number of properties, such as:
>
>  .  value - what the 'value' is, for adding to other values, using @strcat
> on, performing array/member access using [] and .
>
>  . submit-side location - what is extracted with @filename and used when
> that 'data' is passed to an application rather than being operated on by
> submit-side functions.
>
> Neither of these are compulsory (and I think in practice at the moment it
> works out that you either have a filename or a value and never
> meaningfully both).
>
> So a different model of mapping (which might work better when we want data
> that doesn't necessarily exist as discrete files or as in-memory values -
> the two examples that I've seen talked about are 'data from an sql
> database' and 'constants in a csv file') might be that mappers generate
> DSHandle trees (specifically a mapper generates a DSHandle, which might
> have descendants). Those DSHandles might have values, might have
> filenames, might have other attributes, might have ongoing annotation
> (which could include keeping track of where within-this-run copies have
> been made).
>
> --
>


From benc at hawaga.org.uk  Thu Jul  5 12:22:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 17:22:25 +0000 (GMT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0707051715450.7513@dildano.hawaga.org.uk>


On Thu, 5 Jul 2007, Yong Zhao wrote:

> My original thinking about value/filename was that we don't distinguish
> those at the logical level, essentially they could all just be values.
> Then when we need to call mapper functions (getFilename, for instance), we
> interprete the values differently. So in the case of getFilename, we can
> interprete the value either as
> 1) the filename itself, returning the value directly

> 2) writing the value into a file, and returning an automatically generated
> filename.

> 3) some other possibilities, e.g. a directory of files.

option 1 goes against the strongly typed model - if I have a brain image, 
I don't want an access to that brain image to suddenly be the string 
"brain.img" - that isn't of type 'brainimage', it's of type 'string'.

but the other two options work, I think - that's what a mapper does - 
expresses how swiftscript data is interpreted in various different ways 
- as a (set of) file(s), as an in-memory value, in some other form.

But I don't think it will always be the case that each data object will be 
accessible in each form. For example, a brain scan doesn't make much sense 
being mapped into the karajan runtime at the moment - we have nothing to 
do interesting things with such.

> 
> The current DSHandle interface does allow nested trees, so a mapper
> could return a dshandle tree as the implementation currently stands.
> 
> Yong.
> 
> 
> On Thu, 5 Jul 2007, Ben Clifford wrote:
> 
> >
> > so I was thinking the other day while poking through code.
> >
> > 'data' in SwiftScript terms is mostly represented by DSHandle objects.
> >
> > such objects (which can have one of several implementing classes, and
> > potentially more in future) have a number of properties, such as:
> >
> >  .  value - what the 'value' is, for adding to other values, using @strcat
> > on, performing array/member access using [] and .
> >
> >  . submit-side location - what is extracted with @filename and used when
> > that 'data' is passed to an application rather than being operated on by
> > submit-side functions.
> >
> > Neither of these are compulsory (and I think in practice at the moment it
> > works out that you either have a filename or a value and never
> > meaningfully both).
> >
> > So a different model of mapping (which might work better when we want data
> > that doesn't necessarily exist as discrete files or as in-memory values -
> > the two examples that I've seen talked about are 'data from an sql
> > database' and 'constants in a csv file') might be that mappers generate
> > DSHandle trees (specifically a mapper generates a DSHandle, which might
> > have descendants). Those DSHandles might have values, might have
> > filenames, might have other attributes, might have ongoing annotation
> > (which could include keeping track of where within-this-run copies have
> > been made).
> >
> > --
> >
> 
> 


From yongzh at cs.uchicago.edu  Thu Jul  5 12:28:36 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Thu, 5 Jul 2007 12:28:36 -0500 (CDT)
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707051715450.7513@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov> 
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk> 
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0707051715450.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0707051226380.30143@classes.cs.uchicago.edu>

Option 1) does not say it is of type string; it is of type any, which
means it could be an opaque file whose content we are not interested in.
In that case a file name could stand in place of the content. It all
depends on how the mapper interprets the value.

Yong.

On Thu, 5 Jul 2007, Ben Clifford wrote:

>
>
> On Thu, 5 Jul 2007, Yong Zhao wrote:
>
> > My original thinking about value/filename was that we don't distinguish
> > those at the logical level, essentially they could all just be values.
> > Then when we need to call mapper functions (getFilename, for instance), we
> > interprete the values differently. So in the case of getFilename, we can
> > interprete the value either as
> > 1) the filename itself, returning the value directly
>
> > 2) writing the value into a file, and returning an automatically generated
> > filename.
>
> > 3) some other possibilities, e.g. a directory of files.
>
> option 1 goes against the strongly typed model - if I have a brain image,
> I dont want an access to that brain image to suddenly be the string
> "brain.img" - that isn't of type 'braingimage', its of type 'string'.
>
> but the other two options work, I think - that's what a mapper does -
> expresses how swiftscript data is interpreted in various different ways
> - as a (set of) file(s), as a in-memory value, in some other form.
>
> But I don't think it will always be the case that each data object will be
> accessible in each form. For example, a brain scan doesn't make much sense
> being mapepd into the karajan runtime at the moment - we have nothing to
> do interesting things with such.
>
>  >
> > The current DSHandle interface does allow nested trees, so a mapper
> > could return a dshandle tree as the implementation currently stands.
> >
> > Yong.
> >
> >
> > On Thu, 5 Jul 2007, Ben Clifford wrote:
> >
> > >
> > > so I was thinking the other day while poking through code.
> > >
> > > 'data' in SwiftScript terms is mostly represented by DSHandle objects.
> > >
> > > such objects (which can have one of several implementing classes, and
> > > potentially more in future) have a number of properties, such as:
> > >
> > >  .  value - what the 'value' is, for adding to other values, using @strcat
> > > on, performing array/member access using [] and .
> > >
> > >  . submit-side location - what is extracted with @filename and used when
> > > that 'data' is passed to an application rather than being operated on by
> > > submit-side functions.
> > >
> > > Neither of these are compulsory (and I think in practice at the moment it
> > > works out that you either have a filename or a value and never
> > > meaningfully both).
> > >
> > > So a different model of mapping (which might work better when we want data
> > > that doesn't necessarily exist as discrete files or as in-memory values -
> > > the two examples that I've seen talked about are 'data from an sql
> > > database' and 'constants in a csv file') might be that mappers generate
> > > DSHandle trees (specifically a mapper generates a DSHandle, which might
> > > have descendants). Those DSHandles might have values, might have
> > > filenames, might have other attributes, might have ongoing annotation
> > > (which could include keeping track of where within-this-run copies have
> > > been made).
> > >
> > > --
> > >
> >
> >
>


From hategan at mcs.anl.gov  Thu Jul  5 13:02:03 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 13:02:03 -0500
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
Message-ID: <1183658523.13928.3.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 16:55 +0000, Ben Clifford wrote:

> So a different model of mapping (which might work better when we want data 
> that doesn't necessarily exist as discrete files or as in-memory values - 
> the two examples that I've seen talked about are 'data from an sql 
> database' and 'constants in a csv file') might be that mappers generate 
> DSHandle trees (specifically a mapper generates a DSHandle, which might 
> have descendants). Those DSHandles might have values, might have 
> filenames, might have other attributes, might have ongoing annotation 
> (which could include keeping track of where within-this-run copies have 
> been made).

However, we should keep in mind that mapping is lazy. We want that to
achieve scalability, and at least in theory, infinite arrays (for that
we would need some form of garbage collection).

On the other hand, data itself is future-like. The difference being that
everything is computed as soon as possible, but access is delayed until
data is available.
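
A small Python sketch of the two ideas side by side, using only standard
library pieces. The file name pattern and the `simulate` function are made
up; this is an analogy, not how karajan implements either behaviour. The
generator stands for lazy mapping (elements exist only once asked for), the
executor for future-like data (work starts at once, access waits):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice


def lazy_mapper(prefix):
    """Yield file names on demand; nothing is produced until asked for."""
    for i in count():
        yield f"{prefix}{i:04d}.dat"


def simulate(name):
    """Stand-in for an application run that produces a result for `name`."""
    return len(name)


# Lazy: the array is conceptually unbounded, only three elements are mapped.
names = list(islice(lazy_mapper("mol"), 3))

# Future-like: work starts at submit time, result() blocks until it is ready.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(simulate, n) for n in names]
    results = [f.result() for f in futures]
```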

> 
> --
> 


From hategan at mcs.anl.gov  Thu Jul  5 13:05:45 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 13:05:45 -0500
Subject: [Swift-devel] [Bug 76] disable intermediate stageout of data
In-Reply-To: <Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>
References: <20070701071846.56FF016505@foxtrot.mcs.anl.gov>
	<Pine.LNX.4.64.0707050435520.10289@dildano.hawaga.org.uk>
	<1183643731.5084.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051404380.7513@dildano.hawaga.org.uk>
	<1183644607.5084.9.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707051637540.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707051207310.30143@classes.cs.uchicago.edu>
Message-ID: <1183658745.13928.8.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 12:14 -0500, Yong Zhao wrote:
> My original thinking about value/filename was that we don't distinguish
> those at the logical level, essentially they could all just be values.
> Then when we need to call mapper functions (getFilename, for instance), we
> interprete the values differently. So in the case of getFilename, we can
> interprete the value either as
> 1) the filename itself, returning the value directly
> 2) writing the value into a file, and returning an automatically generated
> filename.
> 3) some other possibilities, e.g. a directory of files.

This clearly conflicts with the ability to apply swift functions to data
in files or databases, as one would need, in the case of files, both a
file pointer and actual data.

I would rather follow a known model for this: pointers. There are
addresses (files, uris, db/table/column/row) and values, which are
stored at those addresses. What's missing from the scheme right now is
the ability of a mapper to fetch actual data from such locations when
needed.
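
Sketched in Python under the same caveat (the `FilePointer` class and its
`fetch` method are hypothetical, not part of any mapper interface), the
pointer model keeps the address and the value apart and adds the missing
dereference step:

```python
import os
import tempfile


class FilePointer:
    """Hypothetical address type: knows where data is and how to fetch it."""

    def __init__(self, path):
        self.path = path  # the address

    def fetch(self):
        """Dereference: read the value stored at the address."""
        with open(self.path) as f:
            return f.read()


# Set up a file so the example is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("300")

ptr = FilePointer(path)
address = ptr.path            # what an application would be handed
value = int(ptr.fetch()) + 1  # what a swift function would compute on
os.remove(path)
```

The same split would apply to a URI or a db/table/column/row address; only
the body of `fetch` would differ.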

> 
> The current DSHandle interface does allow nested trees, so a mapper
> could return a dshandle tree as the implementation currently stands.
> 
> Yong.
> 
> 
> On Thu, 5 Jul 2007, Ben Clifford wrote:
> 
> >
> > so I was thinking the other day while poking through code.
> >
> > 'data' in SwiftScript terms is mostly represented by DSHandle objects.
> >
> > such objects (which can have one of several implementing classes, and
> > potentially more in future) have a number of properties, such as:
> >
> >  .  value - what the 'value' is, for adding to other values, using @strcat
> > on, performing array/member access using [] and .
> >
> >  . submit-side location - what is extracted with @filename and used when
> > that 'data' is passed to an application rather than being operated on by
> > submit-side functions.
> >
> > Neither of these are compulsory (and I think in practice at the moment it
> > works out that you either have a filename or a value and never
> > meaningfully both).
> >
> > So a different model of mapping (which might work better when we want data
> > that doesn't necessarily exist as discrete files or as in-memory values -
> > the two examples that I've seen talked about are 'data from an sql
> > database' and 'constants in a csv file') might be that mappers generate
> > DSHandle trees (specifically a mapper generates a DSHandle, which might
> > have descendants). Those DSHandles might have values, might have
> > filenames, might have other attributes, might have ongoing annotation
> > (which could include keeping track of where within-this-run copies have
> > been made).
> >
> > --
> >
> 


From nefedova at mcs.anl.gov  Thu Jul  5 13:55:51 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 13:55:51 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183429417.16404.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
Message-ID: <C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>

My workflow doesn't work with the recent changes. It worked fine for 1
molecule, but fails for 244 (right after the compilation step, before
submitting it to the grid). These are the errors:

2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
         CHARMM3 @ MolDyn-244.kml, line: 209
         vdl:mains @ MolDyn-244.kml, line: 583910

         at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepareInstanceArguments(UserDefinedElement.java:196)
         at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBody(UserDefinedElement.java:170)
         at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.startBody(SequentialImplicitExecutionUDE.java:55)
         at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:82)
         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
         at org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent(Parallel.java:90)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
<snip>

A complete log is on terminable in ~nefedova/MolDyn-244-zvhy3me4scm61.log.
The MolDyn-244.* files are also there. Please note that this is
exactly the same file (dtm) that worked before.

Nika


On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote:

> Yup. Try now.
>
> On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote:
>> I get the below when I try to run a hello world workflow
>> (examples/tutorial/q1.swift).
>>
>> I think Nika also saw something that looks similar, with a different
>> workflow.
>>
>> This is with cog r1655.
>>
>> I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and  
>> hello
>> world runs ok (r1650 being before the most recent set of cog  
>> commits).
>>
>>
>> $ swift -debug q1.swift
>> Recompilation suppressed.
>>
>> null
>>         kernel:cache @ sys.xml, line: 3
>> Caused by: java.lang.UnsupportedOperationException
>>         at java.util.AbstractMap.put(AbstractMap.java:228)
>>         at org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153)
>>         at org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77)
>>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
>>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85)
>>         at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>         at org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111)
>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>>         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>>         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>>         at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>         at org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40)
>>         at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>         at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
>>
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From benc at hawaga.org.uk  Thu Jul  5 14:57:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 19:57:25 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>

Do you have your heap set on the 244-molecule workflow? I run out at the
compile stage with the default.
-- 


From nefedova at mcs.anl.gov  Thu Jul  5 15:01:06 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 15:01:06 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
Message-ID: <AD7B46A3-CE76-4ECC-AB3D-72E8472E430D@mcs.anl.gov>

Yep, it's set to the max:
OPTIONS="-Xms1536m -Xmx1536m"
(in bin/swift )
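
To double-check that the options above actually reach the JVM, a small
check along these lines can help. This is only a sketch: HeapCheck and
maxHeapMiB are made-up names, not part of Swift or Karajan, and the class
would have to be run with the same OPTIONS passed to java.

```java
public class HeapCheck {
    // Hypothetical helper (not part of Swift or Karajan): reports the
    // maximum heap the running JVM will try to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run as: java -Xms1536m -Xmx1536m HeapCheck
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

The reported value may be somewhat lower than the -Xmx setting, depending
on the JVM and garbage collector in use.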

On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:

> ou have your heap set on the 244 molecule workflow? I run out at the
> compile stage with default.
> -- 
>
>


From benc at hawaga.org.uk  Thu Jul  5 14:39:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 19:39:32 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707051939110.7513@dildano.hawaga.org.uk>

Did you touch both the 1-molecule and 244-molecule .swift files to cause
recompilation?

also, do you have the 1-molecule .swift, .xml and .kml files around?

On Thu, 5 Jul 2007, Veronika Nefedova wrote:

> my workflow doesn't work with recent changes. It worked fine for 1 molecule,
> but fails for 244 (right after compilation step, before submitting it to the
> grid). These are the errors:
> 
> 2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing argument s11 for
> sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8,
> s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
> Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3,
> in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
>        CHARMM3 @ MolDyn-244.kml, line: 209
>        vdl:mains @ MolDyn-244.kml, line: 583910
> 
>        at
> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepareInstanceArguments(UserDefinedElement.java:196)
>        at
> org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBody(UserDefinedElement.java:170)
>        at
> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.startBody(SequentialImplicitExecutionUDE.java:55)
>        at
> org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:82)
>        at
> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>        at
> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>        at
> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>        at
> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>        at
> org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>        at
> org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>        at
> org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>        at
> org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent(Parallel.java:90)
>        at
> org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
> <snip>
> 
> a complete log is on terminable in ~nefedova/MolDyn-244-zvhy3me4scm61.log
> the MolDyn-244.* files are also there. Please note that this is exactly the
> same file (dtm) that worked before.
> 
> Nika
> 
> 
> On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote:
> 
> > Yup. Try now.
> > 
> > On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote:
> > > I get the below when I try to run a hello world workflow
> > > (examples/tutorial/q1.swift).
> > > 
> > > I think Nika also saw something that looks similar, with a different
> > > workflow.
> > > 
> > > This is with cog r1655.
> > > 
> > > I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello
> > > world runs ok (r1650 being before the most recent set of cog commits).
> > > 
> > > 
> > > $ swift -debug q1.swift
> > > Recompilation suppressed.
> > > 
> > > null
> > >        kernel:cache @ sys.xml, line: 3
> > > Caused by: java.lang.UnsupportedOperationException
> > >        at java.util.AbstractMap.put(AbstractMap.java:228)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
> > >        at
> > > org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
> > >        at
> > > org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
> > >        at
> > > org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
> > > 
> > > 
> > 
> 


From nefedova at mcs.anl.gov  Thu Jul  5 15:03:28 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 15:03:28 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
Message-ID: <C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>

You can use my kml file that I compiled today with the latest karajan
(it's on terminable).

On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:

> ou have your heap set on the 244 molecule workflow? I run out at the
> compile stage with default.
> -- 
>
>


From nefedova at mcs.anl.gov  Thu Jul  5 15:06:07 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 15:06:07 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707051939110.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051939110.7513@dildano.hawaga.org.uk>
Message-ID: <3EF6AA08-FA79-430C-99BE-9CB8EF8CEF70@mcs.anl.gov>

yep, I "touched" them both.

I put the MolDyn-1.* files also in ~nefedova on terminable.
MolDyn-1.dtm ran successfully today.

Nika

On Jul 5, 2007, at 2:39 PM, Ben Clifford wrote:

> did you touch both the 1 molecule and 244 moleule .swift files to cause
> recompilation?
>
> also, do you have the 1-molecule .swift, .xml and .kml files around?
>
> On Thu, 5 Jul 2007, Veronika Nefedova wrote:
>
>> my workflow doesn't work with recent changes. It worked fine for 1 molecule,
>> but fails for 244 (right after compilation step, before submitting it to the
>> grid). These are the errors:
>>
>> 2007-07-05 13:37:51,294 DEBUG VDL2ExecutionContext Missing argument s11 for
>> sys:element(out1, out2, out3, out4, in1, in2, in3, in4, in5, in6, in7, in8,
>> s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
>> Missing argument s11 for sys:element(out1, out2, out3, out4, in1, in2, in3,
>> in4, in5, in6, in7, in8, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11)
>>        CHARMM3 @ MolDyn-244.kml, line: 209
>>        vdl:mains @ MolDyn-244.kml, line: 583910
>>
>>        at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.prepareInstanceArguments(UserDefinedElement.java:196)
>>        at org.globus.cog.karajan.workflow.nodes.user.UserDefinedElement.startBody(UserDefinedElement.java:170)
>>        at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.startBody(SequentialImplicitExecutionUDE.java:55)
>>        at org.globus.cog.karajan.workflow.nodes.user.SequentialImplicitExecutionUDE.childCompleted(SequentialImplicitExecutionUDE.java:82)
>>        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>>        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>>        at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>>        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>>        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>>        at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>>        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>        at org.globus.cog.karajan.workflow.nodes.Parallel.notificationEvent(Parallel.java:90)
>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>> <snip>
>>
>> a complete log is on terminable in ~nefedova/MolDyn-244-zvhy3me4scm61.log
>> the MolDyn-244.* files are also there. Please note that this is exactly the
>> same file (dtm) that worked before.
>>
>> Nika
>>
>>
>> On Jul 2, 2007, at 9:23 PM, Mihael Hategan wrote:
>>
>>> Yup. Try now.
>>>
>>> On Tue, 2007-07-03 at 06:38 +0530, Ben Clifford wrote:
>>>> I get the below when I try to run a hello world workflow
>>>> (examples/tutorial/q1.swift).
>>>>
>>>> I think Nika also saw something that looks similar, with a different
>>>> workflow.
>>>>
>>>> This is with cog r1655.
>>>>
>>>> I reverted my checkout to cog r1650 (svn merge -r1655:1650 .) and hello
>>>> world runs ok (r1650 being before the most recent set of cog commits).
>>>>
>>>>
>>>> $ swift -debug q1.swift
>>>> Recompilation suppressed.
>>>>
>>>> null
>>>>        kernel:cache @ sys.xml, line: 3
>>>> Caused by: java.lang.UnsupportedOperationException
>>>>        at java.util.AbstractMap.put(AbstractMap.java:228)
>>>>        at org.globus.cog.karajan.workflow.nodes.CacheNode.getTrackingArguments(CacheNode.java:153)
>>>>        at org.globus.cog.karajan.workflow.nodes.CacheNode.post(CacheNode.java:77)
>>>>        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>>>        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
>>>>        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.childCompleted(PartialArgumentsContainer.java:85)
>>>>        at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
>>>>        at org.globus.cog.karajan.workflow.nodes.CacheNode.notificationEvent(CacheNode.java:111)
>>>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:334)
>>>>        at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
>>>>        at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
>>>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:172)
>>>>        at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:298)
>>>>        at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
>>>>        at org.globus.cog.karajan.workflow.nodes.Namespace.post(Namespace.java:40)
>>>>        at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
>>>>        at org.globus.cog.karajan.workflow.nodes.PartialArgumentsContainer.nonArgChildCompleted(PartialArgumentsContainer.java:90)
>>>>
>>>>
>>>
>>
>


From benc at hawaga.org.uk  Thu Jul  5 15:13:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 20:13:02 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>


what karajan revision? and what swift revision?

(type svn info in the cog and vdsk directories...)

On Thu, 5 Jul 2007, Veronika Nefedova wrote:

> you can use my kml file that I compiled today with the latest karajan (its on
> terminable).
> 
> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
> 
> > ou have your heap set on the 244 molecule workflow? I run out at the
> > compile stage with default.
> > -- 
> > 
> > 
> 


From nefedova at mcs.anl.gov  Thu Jul  5 15:45:49 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 15:45:49 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
Message-ID: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>

1657 for Karajan and 887 for vdsk

On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote:

>
> what karajan revision? and what swift revision?
>
> (type svn info in the cog and dsk directories...)
>
> On Thu, 5 Jul 2007, Veronika Nefedova wrote:
>
>> you can use my kml file that I compiled today with the latest  
>> karajan (its on
>> terminable).
>>
>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
>>
>>> ou have your heap set on the 244 molecule workflow? I run out at the
>>> compile stage with default.
>>> -- 
>>>
>>>
>>
>


From hategan at mcs.anl.gov  Thu Jul  5 16:20:37 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 16:20:37 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
Message-ID: <1183670437.31476.0.camel@blabla.mcs.anl.gov>

I might know what it is. Stay tuned.

On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote:
> 1657 for Karajan and 887 for vdsk
> 
> On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote:
> 
> >
> > what karajan revision? and what swift revision?
> >
> > (type svn info in the cog and dsk directories...)
> >
> > On Thu, 5 Jul 2007, Veronika Nefedova wrote:
> >
> >> you can use my kml file that I compiled today with the latest  
> >> karajan (its on
> >> terminable).
> >>
> >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
> >>
> >>> ou have your heap set on the 244 molecule workflow? I run out at the
> >>> compile stage with default.
> >>> -- 
> >>>
> >>>
> >>
> >
> 


From hategan at mcs.anl.gov  Thu Jul  5 16:33:42 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 16:33:42 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183670437.31476.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
	<1183670437.31476.0.camel@blabla.mcs.anl.gov>
Message-ID: <1183671222.31476.4.camel@blabla.mcs.anl.gov>

In converting the recursive code that caused the stack overflow into an
iterative form, I ignored the fact that there was a lock on every object
in the recursion steps.

Tentative fix in SVN. I'm running tests to see if things hold.
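
For anyone following along, here is a rough sketch of the general pattern,
not the actual Karajan code. Node, visitRecursive, visitIterative and chain
are all made-up names for illustration. The recursive form takes each
object's lock as it descends; an iterative rewrite with an explicit stack
avoids the deep call chain but still has to take each object's lock when it
touches that object.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeWalk {
    // Hypothetical node type for illustration; not a Karajan class.
    static final class Node {
        final List<Node> children = new ArrayList<>();
        int visits = 0; // state guarded by this node's monitor

        void add(Node child) {
            synchronized (this) {
                children.add(child);
            }
        }
    }

    // Recursive form: one stack frame (and one held lock) per level,
    // so a very deep structure can overflow the thread stack.
    static int visitRecursive(Node n) {
        synchronized (n) {
            n.visits++;
            int count = 1;
            for (Node c : n.children) {
                count += visitRecursive(c);
            }
            return count;
        }
    }

    // Iterative form: an explicit stack replaces the call stack.
    // Each node is still read and updated under its own lock.
    static int visitIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int count = 0;
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            synchronized (n) {
                n.visits++;
                count++;
                for (Node c : n.children) {
                    pending.push(c);
                }
            }
        }
        return count;
    }

    // Builds a single chain of the given depth (a worst case for recursion).
    static Node chain(int depth) {
        Node root = new Node();
        Node cur = root;
        for (int i = 1; i < depth; i++) {
            Node next = new Node();
            cur.add(next);
            cur = next;
        }
        return root;
    }

    public static void main(String[] args) {
        // Deep enough that the recursive form would likely overflow
        // with a default thread stack size.
        System.out.println(visitIterative(chain(100_000)));
    }
}
```

One difference to keep in mind: the recursive form keeps the locks of all
ancestors held while it works on a descendant, whereas this iterative form
holds only one lock at a time. Whether that matters depends on what the
locks are protecting.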

On Thu, 2007-07-05 at 16:20 -0500, Mihael Hategan wrote:
> I might know what it is. Stay tuned.
> 
> On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote:
> > 1657 for Karajan and 887 for vdsk
> > 
> > On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote:
> > 
> > >
> > > what karajan revision? and what swift revision?
> > >
> > > (type svn info in the cog and dsk directories...)
> > >
> > > On Thu, 5 Jul 2007, Veronika Nefedova wrote:
> > >
> > >> you can use my kml file that I compiled today with the latest  
> > >> karajan (its on
> > >> terminable).
> > >>
> > >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
> > >>
> > >>> ou have your heap set on the 244 molecule workflow? I run out at the
> > >>> compile stage with default.
> > >>> -- 
> > >>>
> > >>>
> > >>
> > >
> > 
> 


From benc at hawaga.org.uk  Thu Jul  5 16:17:54 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 Jul 2007 21:17:54 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707052116260.7513@dildano.hawaga.org.uk>


try r1650 - that's the version of karajan that we've had for ages, before 
this week.

-- 


From nefedova at mcs.anl.gov  Thu Jul  5 17:05:43 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 5 Jul 2007 17:05:43 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707052116260.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
	<Pine.LNX.4.64.0707052116260.7513@dildano.hawaga.org.uk>
Message-ID: <8DF643EF-54CD-4900-B209-C3C0210D8E8E@mcs.anl.gov>

I know that r1650 works, but I need to use Mihael's fix to see if my
workflow could run successfully with Falkon (that's what his karajan
update is about).

On Jul 5, 2007, at 4:17 PM, Ben Clifford wrote:

>
> try r1650 - that's the version of karajan that we've had for ages, before
> this week.
>
> -- 
>


From hategan at mcs.anl.gov  Thu Jul  5 17:10:36 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 05 Jul 2007 17:10:36 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183671222.31476.4.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<74CD898E-7A60-4F25-BD15-C0219487AEC0@mcs.anl.gov>
	<1183670437.31476.0.camel@blabla.mcs.anl.gov>
	<1183671222.31476.4.camel@blabla.mcs.anl.gov>
Message-ID: <1183673436.9192.0.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-05 at 16:33 -0500, Mihael Hategan wrote:
> Tentative fix in SVN. I'm running tests to see if things hold.

Seems to work, as far as the karajan tests can tell.

> 
> On Thu, 2007-07-05 at 16:20 -0500, Mihael Hategan wrote:
> > I might know what it is. Stay tuned.
> > 
> > On Thu, 2007-07-05 at 15:45 -0500, Veronika Nefedova wrote:
> > > 1657 for Karajan and 887 for vdsk
> > > 
> > > On Jul 5, 2007, at 3:13 PM, Ben Clifford wrote:
> > > 
> > > >
> > > > what karajan revision? and what swift revision?
> > > >
> > > > (type svn info in the cog and dsk directories...)
> > > >
> > > > On Thu, 5 Jul 2007, Veronika Nefedova wrote:
> > > >
> > > >> you can use my kml file that I compiled today with the latest  
> > > >> karajan (its on
> > > >> terminable).
> > > >>
> > > >> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
> > > >>
> > > >>> ou have your heap set on the 244 molecule workflow? I run out at the
> > > >>> compile stage with default.
> > > >>> -- 
> > > >>>
> > > >>>
> > > >>
> > > >
> > > 
> > 
> 


From bugzilla-daemon at mcs.anl.gov  Fri Jul  6 09:16:51 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri,  6 Jul 2007 09:16:51 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070706141651.D911516502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #16 from nefedova at mcs.anl.gov  2007-07-06 09:16 -------
The latest Karajan fix seems to work (i.e., the workflow compiles). Falkon
is experiencing some problems. Ioan, please post the details of the current
problems here.

Nika




From benc at hawaga.org.uk  Fri Jul  6 09:49:33 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 6 Jul 2007 20:19:33 +0530 (IST)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>


On my machine that seems to take i) forever to compile (as in I gave up 
after it generated the 25 MB intermediate XML file but before it had made a
kml file), and ii) forever to get to the stage where it tries to execute 
anything (as in I gave up before it gave me an error about not being able 
to find the transformations to run).

What sort of times does it usually take for you to:

 i) compile

 ii) run the first executable

?

On Thu, 5 Jul 2007, Ben Clifford wrote:

> 
> what karajan revision? and what swift revision?
> 
> (type svn info in the cog and dsk directories...)
> 
> On Thu, 5 Jul 2007, Veronika Nefedova wrote:
> 
> > you can use my kml file that I compiled today with the latest karajan (its on
> > terminable).
> > 
> > On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
> > 
> > > ou have your heap set on the 244 molecule workflow? I run out at the
> > > compile stage with default.
> > > -- 
> > > 
> > > 
> > 
> 
> 


From nefedova at mcs.anl.gov  Fri Jul  6 10:13:28 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 10:13:28 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
Message-ID: <B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>

With the old version of the script (all loops unrolled) it would take
about 1.5 hours to compile (244 molecules). Once compiled, it would
start the execution within a minute.
A new Swift script (with the main loop done in 'foreach') is under way
(I am testing it right now).

Nika

On Jul 6, 2007, at 9:49 AM, Ben Clifford wrote:

>
> On my machine that seems to take i) forever to compile (as in I gave up
> after it generated the 25mb intermediate xml file but before it had made a
> kml file), and ii) forever to get to the stage where it tries to execute
> anything (as in I gave up before it gave me an error about not being able
> to find the transformations to run).
>
> What sort of times does it usually take for you to:
>
>  i) compile
>
>  ii) run the first executable
>
> ?
>
> On Thu, 5 Jul 2007, Ben Clifford wrote:
>
>>
>> what karajan revision? and what swift revision?
>>
>> (type svn info in the cog and dsk directories...)
>>
>> On Thu, 5 Jul 2007, Veronika Nefedova wrote:
>>
>>> you can use my kml file that I compiled today with the latest  
>>> karajan (its on
>>> terminable).
>>>
>>> On Jul 5, 2007, at 2:57 PM, Ben Clifford wrote:
>>>
>>>> ou have your heap set on the 244 molecule workflow? I run out at  
>>>> the
>>>> compile stage with default.
>>>> -- 
>>>>
>>>>
>>>
>>
>>
>


From hategan at mcs.anl.gov  Fri Jul  6 10:16:43 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 10:16:43 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
Message-ID: <1183735003.9663.0.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote:
> with the old version of the script (all loops unrolled) it would take  
> about 1.5 hours to compile (244 molecules). Once compiled it would  
> start the execution within a minute.

How can you tell when it's done compiling?

> A new swift code (with the main loop done in 'foreach' is under way  
> (I am testing it right now).
> 
> Nika
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 


From nefedova at mcs.anl.gov  Fri Jul  6 10:22:23 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 10:22:23 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183735003.9663.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
Message-ID: <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>


On Jul 6, 2007, at 10:16 AM, Mihael Hategan wrote:

> On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote:
>> with the old version of the script (all loops unrolled) it would take
>> about 1.5 hours to compile (244 molecules). Once compiled it would
>> start the execution within a minute.
>
> How can you tell when it's done compiling?
>

When it's done compiling, it starts execution. You're right, it's hard
to tell when it's all done in one step. But when you already have the
compiled code and start execution, it takes less than a minute (30
seconds?) to send the first task out.
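
One way to separate the two numbers being discussed (compile time versus time to the first task) is to time each phase on its own. The following is only a minimal sketch: `compile_workflow` and `dispatch_first_task` are hypothetical stand-ins, not Swift or Karajan calls, and would have to be replaced by whatever actually runs each phase.

```python
import time


def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


# Hypothetical stand-ins for the two phases (assumptions, not real APIs).
def compile_workflow(source):
    return source.upper()


def dispatch_first_task(compiled):
    return compiled[:1]


compiled, compile_secs = timed("compile", compile_workflow, "workflow source")
first, startup_secs = timed("time to first task", dispatch_first_task, compiled)
```

Timing the phases separately avoids guessing where compilation ends when both happen in a single invocation.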

Nika

>> A new swift code (with the main loop done in 'foreach' is under way
>> (I am testing it right now).
>>
>> Nika
>>


From hategan at mcs.anl.gov  Fri Jul  6 10:25:32 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 10:25:32 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
Message-ID: <1183735532.10139.0.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-06 at 10:22 -0500, Veronika Nefedova wrote:
> On Jul 6, 2007, at 10:16 AM, Mihael Hategan wrote:
> 
> > On Fri, 2007-07-06 at 10:13 -0500, Veronika Nefedova wrote:
> >> with the old version of the script (all loops unrolled) it would take
> >> about 1.5 hours to compile (244 molecules). Once compiled it would
> >> start the execution within a minute.
> >
> > How can you tell when it's done compiling?
> >
> 
> When its done compiling, it starts execution  - you right, its hard  
> to tell when its all done in one step. But when you already have the  
> compiled code and start execution - it takes less then a minute (30  
> seconds?) to send the first task out.

That makes sense. We need to speed up compilation?

> 
> Nika
> 
> >> A new swift code (with the main loop done in 'foreach' is under way
> >> (I am testing it right now).
> >>
> >> Nika
> >>


From benc at hawaga.org.uk  Fri Jul  6 11:02:01 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 6 Jul 2007 16:02:01 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183735532.10139.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk> 
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk> 
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov> 
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk> 
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk> 
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>


On Fri, 6 Jul 2007, Mihael Hategan wrote:

> That makes sense. We need to speed up compilation?

I think it is more important to concentrate on the language features necessary
to have smaller source files.

I'm working with Nika on that at the moment.

-- 


From hategan at mcs.anl.gov  Fri Jul  6 11:05:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 11:05:30 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
Message-ID: <1183737930.15085.0.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote:
> 
> On Fri, 6 Jul 2007, Mihael Hategan wrote:
> 
> > That makes sense. We need to speed up compilation?
> 
> I think more important is concentrating on the langauge features necessary 
> to have smaller source files.

Yes, of course. But the compilation time is still ridiculous,
considering that it doesn't do much fancy stuff?

> 
> I'm working with Nika on that at the moment.
> 


From yongzh at cs.uchicago.edu  Fri Jul  6 11:14:18 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 6 Jul 2007 11:14:18 -0500 (CDT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183737930.15085.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
	<1183737930.15085.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>

I don't think the compilation takes that much time. It is the starting
time from loading the kml file to dispatching the first job that takes a
long time (for 20k jobs).

Yong.

On Fri, 6 Jul 2007, Mihael Hategan wrote:

> On Fri, 2007-07-06 at 16:02 +0000, Ben Clifford wrote:
> >
> > On Fri, 6 Jul 2007, Mihael Hategan wrote:
> >
> > > That makes sense. We need to speed up compilation?
> >
> > I think more important is concentrating on the langauge features necessary
> > to have smaller source files.
>
> Yes, of course. But the compilation time is still ridiculous,
> considering that it doesn't do much fancy stuff?
>
> >
> > I'm working with Nika on that at the moment.
> >
>


From benc at hawaga.org.uk  Fri Jul  6 11:16:28 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 6 Jul 2007 16:16:28 +0000 (GMT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
	<1183737930.15085.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0707061615500.7513@dildano.hawaga.org.uk>


The xml->kml conversion took a long time when I tried to compile Nika's
.swift file.

I can leave it running overnight and see if it ends...

On Fri, 6 Jul 2007, Yong Zhao wrote:

> I don't think the compilation takes that much time. It is the starting
> time from loading the kml file to dispatching the first job that takes a
> long time (for 20k jobs).
> 
> Yong.
> 


From yongzh at cs.uchicago.edu  Fri Jul  6 11:19:33 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 6 Jul 2007 11:19:33 -0500 (CDT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.64.0707061615500.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
	<1183737930.15085.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0707061615500.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0707061118050.5551@classes.cs.uchicago.edu>

Really? That is a bit strange. I only tried compiling the 100 molecule
file on viper, and it went through quite fast.

Is it always like this for different versions, and what is the config of
your machine?

Yong.

On Fri, 6 Jul 2007, Ben Clifford wrote:

>
> the xml->kml conversion took long time when I tried to compile nika's
> .swift file.
>
> I can leave it running overnight and see if it ends...
>
> On Fri, 6 Jul 2007, Yong Zhao wrote:
>
> > I don't think the compilation takes that much time. It is the starting
> > time from loading the kml file to dispatching the first job that takes a
> > long time (for 20k jobs).
> >
> > Yong.
> >


From hategan at mcs.anl.gov  Fri Jul  6 11:30:31 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 11:30:31 -0500
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk>
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk>
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov>
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk>
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk>
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov>
	<1183735532.10139.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk>
	<1183737930.15085.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
Message-ID: <1183739431.18292.0.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-06 at 11:14 -0500, Yong Zhao wrote:
> I don't think the compilation takes that much time. It is the starting
> time from loading the kml file to dispatching the first job that takes a
> long time (for 20k jobs).

Apparently Nika just mentioned that starting an already compiled .kml
takes less than one minute.

> 
> Yong.
> 


From bugzilla-daemon at mcs.anl.gov  Fri Jul  6 11:43:40 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri,  6 Jul 2007 11:43:40 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070706164340.52EEE164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #17 from iraicu at cs.uchicago.edu  2007-07-06 11:43 -------
(In reply to comment #16)
> The latest Karajan fix seems to work (i.e. Workflow compiles). Falcon
> experiences some problems. Ioan, please post the details of the current
> problems here.
> 
I made some changes in the last few days to fix some known issues I have had
with Falkon, although none of these issues were relevant to the MolDyn runs we
have been making recently.  I ran some small sanity checks after I made the
changes, and everything seemed fine.  Then, yesterday, when we tried the 244
mol run again, within the first 100 jobs, Falkon seemed to be having problems.

It looked like notifications to the workers weren't always going through (which
has never happened before). This would cause some number of CPUs to sit idle
while Falkon recovered from this (its default is to clean up every 60 sec).  I
ran some more synthetic tests from my command line client (independent of
Swift), and the problem was reproducible about 3~4 times in a row that I tried.
Then, I even managed to crash the GT4 container, as it locked up and it would
not do anything.  This was also a first: I have never managed to get the GT4
container into a state where it would not answer any more WS calls, yet the CPU
was idle on the machine.  On the surface, it looked like all hell broke
loose....

I added some more debugging statements and turned on all possible debugging...
and a few hours later (last night), I tried again and everything was working
perfectly!  I ran some 100K jobs through it and it seemed to work perfectly.  I
even disabled all the debugging that I added just to see if that did
anything, and things were still perfect.  It blows my mind what could have
happened, to go from something that was repeatable every time, to something
that I can't reproduce, and this is all in the same environment, configuration,
and hardware.  I'll dig around some more to try to make sense of what happened,
and perhaps we can try the 244 mol run again once I am convinced that I have
not broken anything with my latest changes from earlier this week.

Ioan

PS: I could also try to revert to the earlier version before my changes,
especially as the changes I made were not geared to the MolDyn app but were
more general.

> Nika
> 


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at mcs.anl.gov  Fri Jul  6 11:49:50 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri,  6 Jul 2007 11:49:50 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070706164950.6AC3816502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #18 from hategan at mcs.anl.gov  2007-07-06 11:49 -------
(In reply to comment #17)
> (In reply to comment #16)
> the problem was reproducible about 3~4 times in a row that I tried.
>  [...]
> 
> I blows my mind what could have
> happened, to go from something that was repeatable every time, to something
> that I can't reproduce, and this is all in the same environment, configuration,
> and hardware.

Those are concurrency issues, most likely. The fact that things work fine a
number of times is not a guarantee that they will always do so. That's what
makes this so difficult.
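
To put a rough number on that point: if a race only shows up in some fraction of runs, a handful of clean runs says very little. The sketch below assumes, purely for illustration, that each run hits the bug independently with the same probability; the 1% figure is hypothetical and not measured from Falkon.

```python
def prob_all_clean(p_fail, runs):
    """Probability that `runs` independent runs all miss a bug that
    shows up with probability `p_fail` in any single run."""
    return (1.0 - p_fail) ** runs


# Hypothetical failure rate of 1% per run (an assumption, not a measurement).
for runs in (4, 100, 1000):
    print(runs, prob_all_clean(0.01, runs))
```

Under these assumptions, four clean runs in a row are still very likely even though the bug is present, which is why a few passing runs cannot be taken as proof of a fix.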


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From yongzh at cs.uchicago.edu  Fri Jul  6 11:51:59 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 6 Jul 2007 11:51:59 -0500 (CDT)
Subject: [Swift-devel] recent karajan changes causing trouble
In-Reply-To: <1183739431.18292.0.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707030629310.25505@soju.hawaga.org.uk> 
	<1183429417.16404.0.camel@blabla.mcs.anl.gov>
	<C7F28B25-5F62-4C2F-9237-EBE3743FFC34@mcs.anl.gov>
	<Pine.LNX.4.64.0707051957100.7513@dildano.hawaga.org.uk> 
	<C2E60075-E0F8-4EEF-AA09-63691512CE8F@mcs.anl.gov> 
	<Pine.LNX.4.64.0707052012050.7513@dildano.hawaga.org.uk> 
	<Pine.OSX.4.64.0707062017240.14331@soju.hawaga.org.uk> 
	<B8C3C06B-657F-4156-A50A-E580A26F4BAD@mcs.anl.gov>
	<1183735003.9663.0.camel@blabla.mcs.anl.gov>
	<8DC8874A-B469-44BA-B9AB-B3CBCBD34E60@mcs.anl.gov> 
	<1183735532.10139.0.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707061527190.7513@dildano.hawaga.org.uk> 
	<1183737930.15085.0.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.58.0707061113070.5551@classes.cs.uchicago.edu>
	<1183739431.18292.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0707061151350.10338@classes.cs.uchicago.edu>

Yeah, and I am arguing against that as I believe that is not the case.

Yong.

On Fri, 6 Jul 2007, Mihael Hategan wrote:

> On Fri, 2007-07-06 at 11:14 -0500, Yong Zhao wrote:
> > I don't think the compilation takes that much time. It is the starting
> > time from loading the kml file to dispatching the first job that takes a
> > long time (for 20k jobs).
>
> Apparently Nika just mentioned that starting an already compiled .kml
> takes less than one minute.
>
> >
> > Yong.
> >


From nefedova at mcs.anl.gov  Fri Jul  6 12:38:02 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 12:38:02 -0500
Subject: [Swift-devel] Karajan problem?
Message-ID: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>

Hi, Mihael:

I am now testing my new code (with loops and various string
operations!) but I am getting some Karajan errors. Could you point me
to a possible reason for these errors? I do not see any reference to
my code, so I am not sure where to start looking...

org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file
org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file
Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file
         at org.griphyn.vdl.karajan.lib.GetField.function(GetField.java:33)
         at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:58)
         at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
         at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
         at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:239)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:280)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:392)
         at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:331)
         at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
         at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
         at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
         at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file
         at org.griphyn.vdl.mapping.AbstractDataNode.getFields(AbstractDataNode.java:139)
         at org.griphyn.vdl.mapping.AbstractDataNode.getFields(AbstractDataNode.java:114)
         at org.griphyn.vdl.karajan.lib.GetField.function(GetField.java:25)
         ... 12 more
Execution failed:
         org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for type file


Thanks!

Nika


From hategan at mcs.anl.gov  Fri Jul  6 13:18:46 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 13:18:46 -0500
Subject: [Swift-devel] Re: Karajan problem?
In-Reply-To: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>
References: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>
Message-ID: <1183745926.22318.1.camel@blabla.mcs.anl.gov>

Are you trying to access a file as an array? As in
file x;
x[1]
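
The failure mode being asked about can be pictured with a toy model in Python. This is only an analogy under loud assumptions: `DataNode`, `get_field` and `InvalidPath` are made-up names for illustration and are not the org.griphyn.vdl classes from the trace. The idea is that a variable declared as a single `file` has no indexed children, so asking for element 1 is rejected, while a variable declared as an array accepts the same index.

```python
class InvalidPath(Exception):
    """Toy stand-in for the InvalidPathException in the trace (illustrative only)."""


class DataNode:
    """Toy model of a typed variable; not the real Swift data structure."""

    def __init__(self, type_name, is_array=False):
        self.type_name = type_name
        self.is_array = is_array
        self.items = {}

    def get_field(self, index):
        # A scalar has no indexed children, so any index is an invalid path.
        if not self.is_array:
            raise InvalidPath(f"Invalid path ({index}) for type {self.type_name}")
        # An array creates or returns the scalar element at that index.
        return self.items.setdefault(index, DataNode(self.type_name))


scalar = DataNode("file")                 # like: file x;
array = DataNode("file", is_array=True)   # like an array of files

element = array.get_field(1)              # fine: x[1] on an array
try:
    scalar.get_field(1)                   # x[1] on a plain file
    message = None
except InvalidPath as exc:
    message = str(exc)
print(message)
```

If this picture is right, the thing to check in the script is whether the indexed variable was declared as an array in the first place.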


On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote:
> Hi, Mihael:
> 
> I am testing now my new code (with loops and various string  
> operations!) but I am getting some Karajan errors. I am wondering if  
> you could point me to a possible reason for these errors? I do not  
> see any reference to my code so I am not sure where to start looking...
> 
> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for  
> type file
> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for  
> type file
> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path  
> (1) for type file
>          at org.griphyn.vdl.karajan.lib.GetField.function 
> (GetField.java:33)
>          at org.griphyn.vdl.karajan.lib.VDLFunction.post 
> (VDLFunction.java:58)
>          at org.globus.cog.karajan.workflow.nodes.Sequential.startNext 
> (Sequential.java:51)
>          at  
> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren 
> (Sequential.java:27)
>          at  
> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute 
> (FlowContainer.java:63)
>          at org.globus.cog.karajan.workflow.nodes.FlowNode.restart 
> (FlowNode.java:239)
>          at org.globus.cog.karajan.workflow.nodes.FlowNode.start 
> (FlowNode.java:280)
>          at  
> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent 
> (FlowNode.java:392)
>          at org.globus.cog.karajan.workflow.nodes.FlowNode.event 
> (FlowNode.java:331)
>          at org.globus.cog.karajan.workflow.FlowElementWrapper.event 
> (FlowElementWrapper.java:227)
>          at org.globus.cog.karajan.workflow.events.EventBus.send 
> (EventBus.java:123)
>          at org.globus.cog.karajan.workflow.events.EventBus.sendHooked 
> (EventBus.java:97)
>          at org.globus.cog.karajan.workflow.events.EventWorker.run 
> (EventWorker.java:69)
> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path  
> (1) for type file
>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields 
> (AbstractDataNode.java:139)
>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields 
> (AbstractDataNode.java:114)
>          at org.griphyn.vdl.karajan.lib.GetField.function 
> (GetField.java:25)
>          ... 12 more
> Execution failed:
>          org.griphyn.vdl.mapping.InvalidPathException: Invalid path  
> (1) for type file
> 
> 
> Thanks!
> 
> Nika
> 


From nefedova at mcs.anl.gov  Fri Jul  6 13:20:44 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 13:20:44 -0500
Subject: [Swift-devel] Re: Karajan problem?
In-Reply-To: <1183745926.22318.1.camel@blabla.mcs.anl.gov>
References: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>
	<1183745926.22318.1.camel@blabla.mcs.anl.gov>
Message-ID: <D963B7C0-FFD4-4CF7-B2A8-B15BFEC5C4F6@mcs.anl.gov>

Yes, I am:

file outfiles <fixed_array_mapper;files="solv_chg.out, solv_disp.out,  
solv_repu_0_0.2.out, solv_repu_0.2_0.3.out, solv_repu_0.3_0.4.out,  
solv_repu_0.4_0.5.out, solv_repu_0.5_0.6.out, solv_repu_0.6_0.7.out,  
solv_repu_0.7_0.8.out, solv_repu_0.8_0.9.out, solv_repu_0.9_1.out">;
outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg");
<snip>


On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote:

> Are you trying to access a file as an array? As in
> file x;
> x[1]
>
>
> On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote:
>> Hi, Mihael:
>>
>> I am testing now my new code (with loops and various string
>> operations!) but I am getting some Karajan errors. I am wondering if
>> you could point me to a possible reason for these errors? I do not
>> see any reference to my code so I am not sure where to start  
>> looking...
>>
>> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
>> type file
>> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
>> type file
>> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path
>> (1) for type file
>>          at org.griphyn.vdl.karajan.lib.GetField.function
>> (GetField.java:33)
>>          at org.griphyn.vdl.karajan.lib.VDLFunction.post
>> (VDLFunction.java:58)
>>          at  
>> org.globus.cog.karajan.workflow.nodes.Sequential.startNext
>> (Sequential.java:51)
>>          at
>> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren
>> (Sequential.java:27)
>>          at
>> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute
>> (FlowContainer.java:63)
>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.restart
>> (FlowNode.java:239)
>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.start
>> (FlowNode.java:280)
>>          at
>> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent
>> (FlowNode.java:392)
>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.event
>> (FlowNode.java:331)
>>          at org.globus.cog.karajan.workflow.FlowElementWrapper.event
>> (FlowElementWrapper.java:227)
>>          at org.globus.cog.karajan.workflow.events.EventBus.send
>> (EventBus.java:123)
>>          at  
>> org.globus.cog.karajan.workflow.events.EventBus.sendHooked
>> (EventBus.java:97)
>>          at org.globus.cog.karajan.workflow.events.EventWorker.run
>> (EventWorker.java:69)
>> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path
>> (1) for type file
>>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
>> (AbstractDataNode.java:139)
>>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
>> (AbstractDataNode.java:114)
>>          at org.griphyn.vdl.karajan.lib.GetField.function
>> (GetField.java:25)
>>          ... 12 more
>> Execution failed:
>>          org.griphyn.vdl.mapping.InvalidPathException: Invalid path
>> (1) for type file
>>
>>
>> Thanks!
>>
>> Nika
>>
>


From hategan at mcs.anl.gov  Fri Jul  6 13:29:00 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 13:29:00 -0500
Subject: [Swift-devel] Re: Karajan problem?
In-Reply-To: <D963B7C0-FFD4-4CF7-B2A8-B15BFEC5C4F6@mcs.anl.gov>
References: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>
	<1183745926.22318.1.camel@blabla.mcs.anl.gov>
	<D963B7C0-FFD4-4CF7-B2A8-B15BFEC5C4F6@mcs.anl.gov>
Message-ID: <1183746540.23000.1.camel@blabla.mcs.anl.gov>

I'm assuming you see the problem with that.

On Fri, 2007-07-06 at 13:20 -0500, Veronika Nefedova wrote:
> Yes, I am:
> 
> file outfiles <fixed_array_mapper;files="solv_chg.out, solv_disp.out,  
> solv_repu_0_0.2.out, solv_repu_0.2_0.3.out, solv_repu_0.3_0.4.out,  
> solv_repu_0.4_0.5.out, solv_repu_0.5_0.6.out, solv_repu_0.6_0.7.out,  
> solv_repu_0.7_0.8.out, solv_repu_0.8_0.9.out, solv_repu_0.9_1.out">;
> outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg");
> <snip>
> 
> 
> On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote:
> 
> > Are you trying to access a file as an array? As in
> > file x;
> > x[1]
> >
> >
> > On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote:
> >> Hi, Mihael:
> >>
> >> I am testing now my new code (with loops and various string
> >> operations!) but I am getting some Karajan errors. I am wondering if
> >> you could point me to a possible reason for these errors? I do not
> >> see any reference to my code so I am not sure where to start  
> >> looking...
> >>
> >> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
> >> type file
> >> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
> >> type file
> >> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path
> >> (1) for type file
> >>          at org.griphyn.vdl.karajan.lib.GetField.function
> >> (GetField.java:33)
> >>          at org.griphyn.vdl.karajan.lib.VDLFunction.post
> >> (VDLFunction.java:58)
> >>          at  
> >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext
> >> (Sequential.java:51)
> >>          at
> >> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren
> >> (Sequential.java:27)
> >>          at
> >> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute
> >> (FlowContainer.java:63)
> >>          at org.globus.cog.karajan.workflow.nodes.FlowNode.restart
> >> (FlowNode.java:239)
> >>          at org.globus.cog.karajan.workflow.nodes.FlowNode.start
> >> (FlowNode.java:280)
> >>          at
> >> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent
> >> (FlowNode.java:392)
> >>          at org.globus.cog.karajan.workflow.nodes.FlowNode.event
> >> (FlowNode.java:331)
> >>          at org.globus.cog.karajan.workflow.FlowElementWrapper.event
> >> (FlowElementWrapper.java:227)
> >>          at org.globus.cog.karajan.workflow.events.EventBus.send
> >> (EventBus.java:123)
> >>          at  
> >> org.globus.cog.karajan.workflow.events.EventBus.sendHooked
> >> (EventBus.java:97)
> >>          at org.globus.cog.karajan.workflow.events.EventWorker.run
> >> (EventWorker.java:69)
> >> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid path
> >> (1) for type file
> >>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
> >> (AbstractDataNode.java:139)
> >>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
> >> (AbstractDataNode.java:114)
> >>          at org.griphyn.vdl.karajan.lib.GetField.function
> >> (GetField.java:25)
> >>          ... 12 more
> >> Execution failed:
> >>          org.griphyn.vdl.mapping.InvalidPathException: Invalid path
> >> (1) for type file
> >>
> >>
> >> Thanks!
> >>
> >> Nika
> >>
> >
> 


From nefedova at mcs.anl.gov  Fri Jul  6 13:33:59 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 13:33:59 -0500
Subject: [Swift-devel] Re: Karajan problem?
In-Reply-To: <1183746540.23000.1.camel@blabla.mcs.anl.gov>
References: <AE66F8BB-C1AC-482A-8A86-6A5BAC7389E2@mcs.anl.gov>
	<1183745926.22318.1.camel@blabla.mcs.anl.gov>
	<D963B7C0-FFD4-4CF7-B2A8-B15BFEC5C4F6@mcs.anl.gov>
	<1183746540.23000.1.camel@blabla.mcs.anl.gov>
Message-ID: <C1E042E7-41E3-4393-8486-200C717F58D1@mcs.anl.gov>

Yep! Thanks for the tip (;

On Jul 6, 2007, at 1:29 PM, Mihael Hategan wrote:

> I'm assuming you see the problem with that.
>
> On Fri, 2007-07-06 at 13:20 -0500, Veronika Nefedova wrote:
>> Yes, I am:
>>
>> file outfiles <fixed_array_mapper;files="solv_chg.out, solv_disp.out,
>> solv_repu_0_0.2.out, solv_repu_0.2_0.3.out, solv_repu_0.3_0.4.out,
>> solv_repu_0.4_0.5.out, solv_repu_0.5_0.6.out, solv_repu_0.6_0.7.out,
>> solv_repu_0.7_0.8.out, solv_repu_0.8_0.9.out, solv_repu_0.9_1.out">;
>> outfiles [0] = CHARMM4 (solv_chg, whaminp, s1, "input:solv_chg");
>> <snip>
>>
>>
>> On Jul 6, 2007, at 1:18 PM, Mihael Hategan wrote:
>>
>>> Are you trying to access a file as an array? As in
>>> file x;
>>> x[1]
>>>
>>>
>>> On Fri, 2007-07-06 at 12:38 -0500, Veronika Nefedova wrote:
>>>> Hi, Mihael:
>>>>
>>>> I am testing now my new code (with loops and various string
>>>> operations!) but I am getting some Karajan errors. I am  
>>>> wondering if
>>>> you could point me to a possible reason for these errors? I do not
>>>> see any reference to my code so I am not sure where to start
>>>> looking...
>>>>
>>>> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
>>>> type file
>>>> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (1) for
>>>> type file
>>>> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid  
>>>> path
>>>> (1) for type file
>>>>          at org.griphyn.vdl.karajan.lib.GetField.function
>>>> (GetField.java:33)
>>>>          at org.griphyn.vdl.karajan.lib.VDLFunction.post
>>>> (VDLFunction.java:58)
>>>>          at
>>>> org.globus.cog.karajan.workflow.nodes.Sequential.startNext
>>>> (Sequential.java:51)
>>>>          at
>>>> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren
>>>> (Sequential.java:27)
>>>>          at
>>>> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute
>>>> (FlowContainer.java:63)
>>>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.restart
>>>> (FlowNode.java:239)
>>>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.start
>>>> (FlowNode.java:280)
>>>>          at
>>>> org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent
>>>> (FlowNode.java:392)
>>>>          at org.globus.cog.karajan.workflow.nodes.FlowNode.event
>>>> (FlowNode.java:331)
>>>>          at  
>>>> org.globus.cog.karajan.workflow.FlowElementWrapper.event
>>>> (FlowElementWrapper.java:227)
>>>>          at org.globus.cog.karajan.workflow.events.EventBus.send
>>>> (EventBus.java:123)
>>>>          at
>>>> org.globus.cog.karajan.workflow.events.EventBus.sendHooked
>>>> (EventBus.java:97)
>>>>          at org.globus.cog.karajan.workflow.events.EventWorker.run
>>>> (EventWorker.java:69)
>>>> Caused by: org.griphyn.vdl.mapping.InvalidPathException: Invalid  
>>>> path
>>>> (1) for type file
>>>>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
>>>> (AbstractDataNode.java:139)
>>>>          at org.griphyn.vdl.mapping.AbstractDataNode.getFields
>>>> (AbstractDataNode.java:114)
>>>>          at org.griphyn.vdl.karajan.lib.GetField.function
>>>> (GetField.java:25)
>>>>          ... 12 more
>>>> Execution failed:
>>>>          org.griphyn.vdl.mapping.InvalidPathException: Invalid path
>>>> (1) for type file
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> Nika
>>>>
>>>
>>
>


From nefedova at mcs.anl.gov  Fri Jul  6 16:03:31 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 16:03:31 -0500
Subject: [Swift-devel] wrong file staged in
Message-ID: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>

The wrong file was staged in during the 4th stage of the workflow...

I have this inside my foreach loop:
<snip>
file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;

(whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
"rcut2:1");
<snip>


The first declared file (prt) is an input file for CHARMM3, and the
last three declared files (crd, out and done) are output files.

When I check my remote directory during execution, I see that the  
wrong files were staged in. In particular, the wrong prt file was  
staged in:

solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka  
solv_repu_0DOT9_1_b1_prt)

The solv_repu_0.9_1_b1.prt file is not produced by a previous stage;
it is (or is supposed to be) staged in from the submit host.

The above declaration is the only place where
solv_repu_0DOT9_1_b1_prt is declared in the Swift file (I used grep
to check). The kml file also looks OK.

I am not sure why it has happened -- this piece of code has not been  
changed from the previous version...


This is the work directory for this job (CHARMM3) on TG-UC:

nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi> ls
m001_am1.prm           solv.inp          solv_m001_eq.crd         stderr.txt
m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi>

As you can see, two files have the wrong names (solv_disp_a3 instead
of solv_repu_0.9_1_b1), and execution is screwed up since the wrong
parameter file (prt) was staged in...


I checked whether that file was even staged in to the remote host --  
in fact it was:

nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0> find */ -name solv_repu_0.9_1_b1.prt -print
shared/solv_repu_0.9_1_b1.prt

But it never went to the right working directory...

Any idea what is going on here?

Thanks,

Nika


From hategan at mcs.anl.gov  Fri Jul  6 16:31:18 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 16:31:18 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
Message-ID: <1183757478.29416.1.camel@blabla.mcs.anl.gov>

Wonder if there is another declaration of the same variable mapped to
the wrong file.

On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> The wrong file was staged in during the 4th stage of the workflow...
> 
> I have this inside my foreach loop:
> <snip>
> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
> 
> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,  
> solv_repu_0DO\
> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,  
> prm_file, psf_file,\
> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7,  
> "urandseed:59\
> 64163", sprt, "rcut1:0.9", "rcut2:1");
> <snip>
> 
> 
> The first  file (with DOT) is an input files for CHARMM3 and three  
> last declared files (out, crd and done) are output files.
> 
> When I check my remote directory during execution, I see that the  
> wrong files were staged in. In particular, the wrong prt file was  
> staged in:
> 
> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka  
> solv_repu_0DOT9_1_b1_prt)
> 
> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage,  
> its being/supposed to be/ staged in from the submit host.
> 
> The above declaration is the only place where the file  
> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did grep  
> to check it). kml file also looks ok.
> 
> I am not sure why it has happened -- this piece of code has not been  
> changed from the previous version...
> 
> 
> This is the work directory for this job (CHARMM3) on TG-UC:
> 
> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ 
> chrm_long-p2v28ydi> ls
> m001_am1.prm           solv.inp          solv_m001_eq.crd          
> stderr.txt
> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/ 
> chrm_long-p2v28ydi>
> 
> as you can see 2 files have the wrong names (solv_disp_a3 instead of  
> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong  
> parameter file (prt) was staged in...
> 
> 
> I checked whether that file was even staged in to the remote host --  
> in fact it was:
> 
> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>  
> find */ -name solv_repu_0.9_1_b1.prt -print
> shared/solv_repu_0.9_1_b1.prt
> But it never went to the right working directory...
> 
> Any idea what is going on here?
> 
> Thanks,
> 
> Nika
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 


From nefedova at mcs.anl.gov  Fri Jul  6 16:37:19 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 16:37:19 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <1183757478.29416.1.camel@blabla.mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
Message-ID: <FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>

Nope... I checked with grep:

nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
(whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,  
solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,  
rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,  
ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",  
"rcut2:1");
nefedova at viper:~/alamines>

On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:

> Wonder if there is another declaration of the same variable mapped to
> the wrong file.
>
> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
>> The wrong file was staged in during the 4th stage of the workflow...
>>
>> I have this inside my foreach loop:
>> <snip>
>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
>> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
>> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
>>
>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
>> solv_repu_0DO\
>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
>> prm_file, psf_file,\
>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7,
>> "urandseed:59\
>> 64163", sprt, "rcut1:0.9", "rcut2:1");
>> <snip>
>>
>>
>> The first  file (with DOT) is an input files for CHARMM3 and three
>> last declared files (out, crd and done) are output files.
>>
>> When I check my remote directory during execution, I see that the
>> wrong files were staged in. In particular, the wrong prt file was
>> staged in:
>>
>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
>> solv_repu_0DOT9_1_b1_prt)
>>
>> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage,
>> its being/supposed to be/ staged in from the submit host.
>>
>> The above declaration is the only place where the file
>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did grep
>> to check it). kml file also looks ok.
>>
>> I am not sure why it has happened -- this piece of code has not been
>> changed from the previous version...
>>
>>
>> This is the work directory for this job (CHARMM3) on TG-UC:
>>
>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
>> chrm_long-p2v28ydi> ls
>> m001_am1.prm           solv.inp          solv_m001_eq.crd
>> stderr.txt
>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
>> chrm_long-p2v28ydi>
>>
>> as you can see 2 files have the wrong names (solv_disp_a3 instead of
>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong
>> parameter file (prt) was staged in...
>>
>>
>> I checked whether that file was even staged in to the remote host --
>> in fact it was:
>>
>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>
>> find */ -name solv_repu_0.9_1_b1.prt -print
>> shared/solv_repu_0.9_1_b1.prt
>> But it never went to the right working directory...
>>
>> Any idea what is going on here?
>>
>> Thanks,
>>
>> Nika
>>
>>
>


From hategan at mcs.anl.gov  Fri Jul  6 16:39:19 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 16:39:19 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
	<FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
Message-ID: <1183757959.29798.0.camel@blabla.mcs.anl.gov>

Consistent or intermittent behavior?

Also, can you attach the swift source?

On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
> Nope... I checked with grep:
> 
> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,  
> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,  
> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,  
> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",  
> "rcut2:1");
> nefedova at viper:~/alamines>
> 
> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
> 
> > Wonder if there is another declaration of the same variable mapped to
> > the wrong file.
> >
> > On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> >> The wrong file was staged in during the 4th stage of the workflow...
> >>
> >> I have this inside my foreach loop:
> >> <snip>
> >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
> >> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> >> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
> >>
> >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >> solv_repu_0DO\
> >> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
> >> prm_file, psf_file,\
> >> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7,
> >> "urandseed:59\
> >> 64163", sprt, "rcut1:0.9", "rcut2:1");
> >> <snip>
> >>
> >>
> >> The first  file (with DOT) is an input files for CHARMM3 and three
> >> last declared files (out, crd and done) are output files.
> >>
> >> When I check my remote directory during execution, I see that the
> >> wrong files were staged in. In particular, the wrong prt file was
> >> staged in:
> >>
> >> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
> >> solv_repu_0DOT9_1_b1_prt)
> >>
> >> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage,
> >> its being/supposed to be/ staged in from the submit host.
> >>
> >> The above declaration is the only place where the file
> >> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did grep
> >> to check it). kml file also looks ok.
> >>
> >> I am not sure why it has happened -- this piece of code has not been
> >> changed from the previous version...
> >>
> >>
> >> This is the work directory for this job (CHARMM3) on TG-UC:
> >>
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >> chrm_long-p2v28ydi> ls
> >> m001_am1.prm           solv.inp          solv_m001_eq.crd
> >> stderr.txt
> >> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> >> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> >> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >> chrm_long-p2v28ydi>
> >>
> >> as you can see 2 files have the wrong names (solv_disp_a3 instead of
> >> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong
> >> parameter file (prt) was staged in...
> >>
> >>
> >> I checked whether that file was even staged in to the remote host --
> >> in fact it was:
> >>
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>
> >> find */ -name solv_repu_0.9_1_b1.prt -print
> >> shared/solv_repu_0.9_1_b1.prt
> >> But it never went to the right working directory...
> >>
> >> Any idea what is going on here?
> >>
> >> Thanks,
> >>
> >> Nika
> >>
> >>
> >
> 


From nefedova at mcs.anl.gov  Fri Jul  6 16:44:22 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 16:44:22 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <1183757959.29798.0.camel@blabla.mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
	<FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
	<1183757959.29798.0.camel@blabla.mcs.anl.gov>
Message-ID: <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov>

I put the dtm file on terminable in ~nefedova/MolDyn.dtm

I see a few more directories with wrong files staged in, but I didn't  
check them all (130+ of them). I saw at least one with the correct  
files staged in.
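
A rough way to check all of the job directories at once, rather than
by hand, might look like the sketch below. It rests on an assumption
that is only suggested by the one listing seen so far, not confirmed:
that each job directory holds one *.prt and one *.wham file, and that
in a correct directory the two share the same base name (for example
solv_repu_0.9_1_b1). The function name check_run_dir is made up for
this illustration.

```shell
#!/bin/sh
# Hypothetical helper: flag job directories whose staged-in .prt file
# does not share its base name with the .wham file in that directory.
# Assumption: one *.prt and one *.wham per job directory.
check_run_dir() {
    run_dir=$1
    for dir in "$run_dir"/*/; do
        # Skip the literal pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        prt=$(ls "$dir" | grep '\.prt$' | head -n 1)
        wham=$(ls "$dir" | grep '\.wham$' | head -n 1)
        # Directories missing either file cannot be judged; skip them.
        [ -n "$prt" ] && [ -n "$wham" ] || continue
        if [ "${prt%.prt}" != "${wham%.wham}" ]; then
            echo "MISMATCH $(basename "$dir"): $prt vs $wham"
        fi
    done
}
```

Run as "check_run_dir /disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0",
this would print one MISMATCH line per suspicious directory.
Directories without a .wham file yet (a job that has not got that far,
for instance) are skipped rather than reported.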

Nika

On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote:

> Consistent or intermittent behavior?
>
> Also, can you attach the swift source?
>
> On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
>> Nope... I checked with grep:
>>
>> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
>> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
>> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
>> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
>> "rcut2:1");
>> nefedova at viper:~/alamines>
>>
>> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
>>
>>> Wonder if there is another declaration of the same variable  
>>> mapped to
>>> the wrong file.
>>>
>>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
>>>> The wrong file was staged in during the 4th stage of the  
>>>> workflow...
>>>>
>>>> I have this inside my foreach loop:
>>>> <snip>
>>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
>>>> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
>>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
>>>> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
>>>>
>>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd,  
>>>> solv_repu_0DOT9_1_b1_out,
>>>> solv_repu_0DO\
>>>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
>>>> prm_file, psf_file,\
>>>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7,
>>>> "urandseed:59\
>>>> 64163", sprt, "rcut1:0.9", "rcut2:1");
>>>> <snip>
>>>>
>>>>
>>>> The first  file (with DOT) is an input files for CHARMM3 and three
>>>> last declared files (out, crd and done) are output files.
>>>>
>>>> When I check my remote directory during execution, I see that the
>>>> wrong files were staged in. In particular, the wrong prt file was
>>>> staged in:
>>>>
>>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
>>>> solv_repu_0DOT9_1_b1_prt)
>>>>
>>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous  
>>>> stage,
>>>> its being/supposed to be/ staged in from the submit host.
>>>>
>>>> The above declaration is the only place where the file
>>>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did  
>>>> grep
>>>> to check it). kml file also looks ok.
>>>>
>>>> I am not sure why it has happened -- this piece of code has not  
>>>> been
>>>> changed from the previous version...
>>>>
>>>>
>>>> This is the work directory for this job (CHARMM3) on TG-UC:
>>>>
>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
>>>> chrm_long-p2v28ydi> ls
>>>> m001_am1.prm           solv.inp          solv_m001_eq.crd
>>>> stderr.txt
>>>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
>>>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
>>>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
>>>> chrm_long-p2v28ydi>
>>>>
>>>> as you can see 2 files have the wrong names (solv_disp_a3  
>>>> instead of
>>>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong
>>>> parameter file (prt) was staged in...
>>>>
>>>>
>>>> I checked whether that file was even staged in to the remote  
>>>> host --
>>>> in fact it was:
>>>>
>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>
>>>> find */ -name solv_repu_0.9_1_b1.prt -print
>>>> shared/solv_repu_0.9_1_b1.prt
>>>> But it never went to the right working directory...
>>>>
>>>> Any idea what is going on here?
>>>>
>>>> Thanks,
>>>>
>>>> Nika
>>>>
>>>>
>>>
>>
>


From hategan at mcs.anl.gov  Fri Jul  6 16:49:39 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 Jul 2007 16:49:39 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
	<FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
	<1183757959.29798.0.camel@blabla.mcs.anl.gov>
	<07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov>
Message-ID: <1183758579.30227.1.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote:
> I put the dtm file on terminable in ~nefedova/MolDyn.dtm
> 
> I see a few more directories with wrong files staged in, but I didn't  
> check them all (130+ of them). I saw at least one with the correct  
> files staged in.

Across different runs, that is. Do you get the exact same mess-up, or is
it different?

> 
> Nika
> 
> On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote:
> 
> > Consistent or intermittent behavior?
> >
> > Also, can you attach the swift source?
> >
> > On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
> >> Nope... I checked with grep:
> >>
> >> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
> >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> >> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
> >> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
> >> "rcut2:1");
> >> nefedova at viper:~/alamines>
> >>
> >> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
> >>
> >>> Wonder if there is another declaration of the same variable  
> >>> mapped to
> >>> the wrong file.
> >>>
> >>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> >>>> The wrong file was staged in during the 4th stage of the  
> >>>> workflow...
> >>>>
> >>>> I have this inside my foreach loop:
> >>>> <snip>
> >>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >>>> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
> >>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> >>>> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
> >>>>
> >>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd,  
> >>>> solv_repu_0DOT9_1_b1_out,
> >>>> solv_repu_0DO\
> >>>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
> >>>> prm_file, psf_file,\
> >>>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4, s5, s7,
> >>>> "urandseed:59\
> >>>> 64163", sprt, "rcut1:0.9", "rcut2:1");
> >>>> <snip>
> >>>>
> >>>>
> >>>> The first  file (with DOT) is an input files for CHARMM3 and three
> >>>> last declared files (out, crd and done) are output files.
> >>>>
> >>>> When I check my remote directory during execution, I see that the
> >>>> wrong files were staged in. In particular, the wrong prt file was
> >>>> staged in:
> >>>>
> >>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
> >>>> solv_repu_0DOT9_1_b1_prt)
> >>>>
> >>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous  
> >>>> stage,
> >>>> its being/supposed to be/ staged in from the submit host.
> >>>>
> >>>> The above declaration is the only place where the file
> >>>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did  
> >>>> grep
> >>>> to check it). kml file also looks ok.
> >>>>
> >>>> I am not sure why it has happened -- this piece of code has not  
> >>>> been
> >>>> changed from the previous version...
> >>>>
> >>>>
> >>>> This is the work directory for this job (CHARMM3) on TG-UC:
> >>>>
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >>>> chrm_long-p2v28ydi> ls
> >>>> m001_am1.prm           solv.inp          solv_m001_eq.crd
> >>>> stderr.txt
> >>>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> >>>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> >>>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >>>> chrm_long-p2v28ydi>
> >>>>
> >>>> as you can see 2 files have the wrong names (solv_disp_a3  
> >>>> instead of
> >>>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong
> >>>> parameter file (prt) was staged in...
> >>>>
> >>>>
> >>>> I checked whether that file was even staged in to the remote  
> >>>> host --
> >>>> in fact it was:
> >>>>
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>
> >>>> find */ -name solv_repu_0.9_1_b1.prt -print
> >>>> shared/solv_repu_0.9_1_b1.prt
> >>>> But it never went to the right working directory...
> >>>>
> >>>> Any idea what is going on here?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Nika
> >>>>
> >>>> _______________________________________________
> >>>> Swift-devel mailing list
> >>>> Swift-devel at ci.uchicago.edu
> >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>
> >>>
> >>
> >
> 


From nefedova at mcs.anl.gov  Fri Jul  6 16:53:58 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 6 Jul 2007 16:53:58 -0500
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <1183758579.30227.1.camel@blabla.mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
	<FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
	<1183757959.29798.0.camel@blabla.mcs.anl.gov>
	<07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov>
	<1183758579.30227.1.camel@blabla.mcs.anl.gov>
Message-ID: <5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov>

I didn't try another run. Something was really weird during that run.  
Some jobs just failed because the executable failed:
stderr.txt:
forrtl: No such file or directory
/home/ydeng/c34a2/exec/ia64/charmm: relocation error: /soft/intel-c-9.1.049-f-9.1.045/lib/libunwind.so.6: undefined symbol: ?1__serial_memmove

But the jobs with wrong files staged in were running (the same  
executable)...

I can repeat the run again now.

Nika

On Jul 6, 2007, at 4:49 PM, Mihael Hategan wrote:

> On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote:
>> I put the dtm file on terminable in ~nefedova/MolDyn.dtm
>>
>> I see a few more directories with wrong files staged in, but I
>> didn't
>> check them all (130+ of them). I saw at least one with the correct
>> files staged in.
>
> Across different runs that is. Do you get the exact same mess-up,  
> or is
> it different?
>
>>
>> Nika
>>
>> On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote:
>>
>>> Consistent or intermittent behavior?
>>>
>>> Also, can you attach the swift source?
>>>
>>> On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
>>>> Nope... I checked with grep:
>>>>
>>>> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
>>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
>>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd,  
>>>> solv_repu_0DOT9_1_b1_out,
>>>> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
>>>> rtf_file, prm_file, psf_file, crd_eq_file,  
>>>> solv_repu_0DOT9_1_b1_prt,
>>>> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt,  
>>>> "rcut1:0.9",
>>>> "rcut2:1");
>>>> nefedova at viper:~/alamines>
>>>>
>>>> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
>>>>
>>>>> Wonder if there is another declaration of the same variable
>>>>> mapped to
>>>>> the wrong file.
>>>>>
>>>>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
>>>>>> The wrong file was staged in during the 4th stage of the
>>>>>> workflow...
>>>>>>
>>>>>> I have this inside my foreach loop:
>>>>>> <snip>
>>>>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
>>>>>> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
>>>>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
>>>>>> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
>>>>>>
>>>>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd,
>>>>>> solv_repu_0DOT9_1_b1_out,
>>>>>> solv_repu_0DO\
>>>>>> T9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
>>>>>> prm_file, psf_file,\
>>>>>> crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3, s4,  
>>>>>> s5, s7,
>>>>>> "urandseed:59\
>>>>>> 64163", sprt, "rcut1:0.9", "rcut2:1");
>>>>>> <snip>
>>>>>>
>>>>>>
>>>>>> The first  file (with DOT) is an input files for CHARMM3 and  
>>>>>> three
>>>>>> last declared files (out, crd and done) are output files.
>>>>>>
>>>>>> When I check my remote directory during execution, I see that the
>>>>>> wrong files were staged in. In particular, the wrong prt file was
>>>>>> staged in:
>>>>>>
>>>>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
>>>>>> solv_repu_0DOT9_1_b1_prt)
>>>>>>
>>>>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous
>>>>>> stage,
>>>>>> its being/supposed to be/ staged in from the submit host.
>>>>>>
>>>>>> The above declaration is the only place where the file
>>>>>> solv_repu_0DOT9_1_b1_prt is being declared in swift file (I did
>>>>>> grep
>>>>>> to check it). kml file also looks ok.
>>>>>>
>>>>>> I am not sure why it has happened -- this piece of code has not
>>>>>> been
>>>>>> changed from the previous version...
>>>>>>
>>>>>>
>>>>>> This is the work directory for this job (CHARMM3) on TG-UC:
>>>>>>
>>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- 
>>>>>> zvlc1f9c03pf0/
>>>>>> chrm_long-p2v28ydi> ls
>>>>>> m001_am1.prm           solv.inp          solv_m001_eq.crd
>>>>>> stderr.txt
>>>>>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
>>>>>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
>>>>>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
>>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- 
>>>>>> zvlc1f9c03pf0/
>>>>>> chrm_long-p2v28ydi>
>>>>>>
>>>>>> as you can see 2 files have the wrong names (solv_disp_a3
>>>>>> instead of
>>>>>> solv_repu_0.9_1_b1 ) and execution is screwed up since the wrong
>>>>>> parameter file (prt) was staged in...
>>>>>>
>>>>>>
>>>>>> I checked whether that file was even staged in to the remote
>>>>>> host --
>>>>>> in fact it was:
>>>>>>
>>>>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn- 
>>>>>> zvlc1f9c03pf0>
>>>>>> find */ -name solv_repu_0.9_1_b1.prt -print
>>>>>> shared/solv_repu_0.9_1_b1.prt
>>>>>> But it never went to the right working directory...
>>>>>>
>>>>>> Any idea what is going on here?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Nika
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


From benc at hawaga.org.uk  Fri Jul  6 22:09:43 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 7 Jul 2007 03:09:43 +0000 (GMT)
Subject: [Swift-devel] wrong file staged in
In-Reply-To: <5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov>
References: <AD79852A-8053-4878-AEE3-4DA1EDBC41AC@mcs.anl.gov>
	<1183757478.29416.1.camel@blabla.mcs.anl.gov>
	<FF707172-3591-4179-B549-C49DC155ABA6@mcs.anl.gov>
	<1183757959.29798.0.camel@blabla.mcs.anl.gov>
	<07647021-AA1B-4231-85EB-D922DA485687@mcs.anl.gov>
	<1183758579.30227.1.camel@blabla.mcs.anl.gov>
	<5F6F8624-2C89-431E-A643-7A3ED13E6BD2@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707070308590.10289@dildano.hawaga.org.uk>


On Fri, 6 Jul 2007, Veronika Nefedova wrote:

> I can repeat the run again now.

successfully?

-- 


From benc at hawaga.org.uk  Sat Jul  7 02:38:14 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 7 Jul 2007 13:08:14 +0530 (IST)
Subject: [Swift-devel] Re: mapper syntax
In-Reply-To: <Pine.OSX.4.64.0707032321100.6578@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707032321100.6578@soju.hawaga.org.uk>
Message-ID: <Pine.OSX.4.64.0707071307120.580@soju.hawaga.org.uk>


On Tue, 3 Jul 2007, Ben Clifford wrote:

> The syntax:
> 
>   imagefiles if[] 
> <my_mapper;foo=@strcat(filename,blah),otherparm=true,moreparams=false>;
> 
> is rather noisy all on one line.
> 
> A syntax change could be to express the above as:
> 
>   imagefiles if[] map my_mapper {
>     foo = @strcat(filename,blah);
>     otherparam = true;
>     moreparams = false;
>   };

I realised the present syntax admits enough whitespace for a multi-line 
representation, thusly:

foreach s in array {
  messagefile outfile <
      single_file_mapper;
      file=@strcat("051-foreach.",s,".out")
     >;
  outfile = greeting(s);
}

-- 


From tiberius at ci.uchicago.edu  Sun Jul  8 23:35:39 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Sun, 8 Jul 2007 23:35:39 -0500
Subject: [Swift-devel] dot files by default
In-Reply-To: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
Message-ID: <fec1351f0707082135p9435f4dx557c2f8f7e7016ec@mail.gmail.com>

Add a command line option: --gengraph=false by default.
Probably it makes more sense in terms of the cleanness of the output.

On 7/4/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> does anyone have preference about whether .dot graphviz files are
> generated by default or not?
>
> I find them a bit annoying in as much as they double the number of run
> files in my working directories to no immediate benefit.
>
> --
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From hategan at mcs.anl.gov  Sun Jul  8 23:43:48 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 08 Jul 2007 23:43:48 -0500
Subject: [Swift-devel] dot files by default
In-Reply-To: <fec1351f0707082135p9435f4dx557c2f8f7e7016ec@mail.gmail.com>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
	<fec1351f0707082135p9435f4dx557c2f8f7e7016ec@mail.gmail.com>
Message-ID: <1183956228.4067.2.camel@blabla.mcs.anl.gov>

To quote swift -help:
[-pgraph <true|false|<filename>>]
      Whether to generate a provenance graph or not. If a 'true' is
      used, the file name for the graph will be chosen by swift.

This can also be set in swift.properties.

The default is 'true'. The issue is whether to switch the default to
'false'.

On Sun, 2007-07-08 at 23:35 -0500, Tiberiu Stef-Praun wrote:
> Add a command line option: --gengraph=false by default.
> Probably it makes more sense in terms of the cleanness of the output.
> 
> On 7/4/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> > does anyone have preference about whether .dot graphviz files are
> > generated by default or not?
> >
> > I find them a bit annoying in as much as they double the number of run
> > files in my working directories to no immediate benefit.
> >
> > --
> >
> 
> 


From benc at hawaga.org.uk  Sun Jul  8 00:56:21 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 8 Jul 2007 11:26:21 +0530 (IST)
Subject: [Swift-devel] mapping and primitive types
Message-ID: <Pine.OSX.4.64.0707081114130.9771@soju.hawaga.org.uk>


The runtime has the concept of 'primitive' types - these are types such as 
int, float, string.

If a type is primitive, it is not staged in or out during procedure 
execution. This is (I think) the only difference in behaviour.

However, this isn't implemented particularly nicely.

If I run program A below, with the output mapped like this:

   messagefile outfile <"055-pass-int.out">;

then I get output in a file called 055-pass-int.out.

However, if I run program B below, which is similar but declares its 
output like this:

   int outfile <"056-pass-int.out">;

then the output file is not staged back, but no error is given suggesting 
that it is unwise to map an integer to a file.

I see why that is in the implementation, but it's not pleasing from a user 
perspective.

Should it be possible to map a 'primitive' type?

If yes, then the below two programs should work.

If no, then program B should produce a sensible error message.

I think the answer should be 'yes' - there seems to be a long term desire 
to be able to access mapped data in the language (for example, to run a 
program to determine if an iterative process has converged, outputting a 
boolean, and use that boolean as a condition in a loop).


PROGRAM A
=========

type messagefile {}

(messagefile t) greeting(string m, int i) { 
    app {
        echo i stdout=@filename(t);
    }
}

messagefile outfile <"055-pass-int.out">;

int luftballons;

luftballons = 99;

outfile = greeting("hi", luftballons);


PROGRAM B
=========

(int t) greeting(string m, int i) { 
    app {
        echo i stdout=@filename(t);
    }
}

int outfile <"056-pass-int.out">;

int luftballons;

luftballons = 99;

outfile = greeting("hi", luftballons);
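
If the answer were 'no', the "sensible error message" for program B could come from a declaration check along these lines. This is only a rough Python sketch, not Swift code: the function name check_declaration, the allow_mapped_primitives switch and the wording of the message are all made up for illustration, and the set of primitive types is just the three named above.

```python
# Hypothetical illustration only; none of these names exist in Swift.
PRIMITIVE_TYPES = {"int", "float", "string"}  # the types named in this mail


def check_declaration(type_name, var_name, mapped_file=None,
                      allow_mapped_primitives=False):
    """Return a diagnostic for a mapped primitive, or None if the declaration is fine."""
    is_primitive = type_name in PRIMITIVE_TYPES
    if mapped_file is not None and is_primitive and not allow_mapped_primitives:
        return (f"{var_name}: cannot map primitive type '{type_name}' "
                f"to file '{mapped_file}'")
    return None


# Program A's declaration passes; program B's is flagged instead of being
# silently accepted.
print(check_declaration("messagefile", "outfile", "055-pass-int.out"))
print(check_declaration("int", "outfile", "056-pass-int.out"))
```

With allow_mapped_primitives=True the same check lets program B through, which corresponds to the 'yes' answer.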


-- 


From benc at hawaga.org.uk  Sun Jul  8 09:22:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 8 Jul 2007 19:52:29 +0530 (IST)
Subject: [Swift-devel] arrays-of-arrays
Message-ID: <Pine.OSX.4.64.0707081951330.16322@soju.hawaga.org.uk>

The present language syntax does not admit arrays-of-arrays, with 
expressions such as a[5][3]. However, I don't see anything particularly 
constraining in the implementation to require this. Does anyone have 
preference?
-- 


From bugzilla-daemon at mcs.anl.gov  Mon Jul  9 08:56:06 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon,  9 Jul 2007 08:56:06 -0500 (CDT)
Subject: [Swift-devel] [Bug 80] New: simple_mapper strange prefix behaviour
Message-ID: <bug-80-21@http.bugzilla.mcs.anl.gov/swift/>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80

           Summary: simple_mapper strange prefix behaviour
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk
                CC: swift-devel at ci.uchicago.edu


The following program generates output files z.3.out and z.7.out. This is what
I expected.

Substituting prefix to be "99" instead of "z" produces files: 0099.3.out and
0099.7.out - the array index value is padded to four digits. This is slightly
surprising.

And substituting prefix to be "99-" causes an execution failure like this:
Swift v0.1-dev

RunID: spqficzyd1ey1
Execution failed:
        For input string: "99-"

which is very surprising.

It looks as if the mapper is trying to find structure (unsuccessfully) inside
prefix when perhaps it shouldn't.

This is with swift r900.

Program follows:

type messagefile {}

(messagefile t) greeting() { 
    app {
        echo "hello" stdout=@filename(t);
    }
}

messagefile outfile[] <simple_mapper;prefix="z",suffix=".out">;

outfile[3] = greeting();
outfile[7] = greeting();
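
For comparison, here is a small Python sketch of the naming I expected, with prefix treated as opaque text. It is not the simple_mapper implementation; the function name expected_name and the "." separator are assumptions inferred from the z.3.out and z.7.out names above.

```python
def expected_name(prefix: str, index: int, suffix: str) -> str:
    """Build an output file name, treating prefix as an opaque string."""
    # Assumption: the index is joined to the prefix with "." and is not
    # padded, matching the z.3.out / z.7.out names reported above.
    return f"{prefix}.{index}{suffix}"


# The three prefixes tried in this report, for array indices 3 and 7.
for prefix in ("z", "99", "99-"):
    print([expected_name(prefix, i, ".out") for i in (3, 7)])
```

Under that reading, a prefix of "99" or "99-" is no different from "z": nothing in the prefix is parsed as a number.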


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From nefedova at mcs.anl.gov  Mon Jul  9 09:45:38 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Mon, 9 Jul 2007 09:45:38 -0500
Subject: [Swift-devel] Re: Missing security context
In-Reply-To: <46916C20.5050602@fnal.gov>
References: <46916C20.5050602@fnal.gov>
Message-ID: <0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov>

Hi, Luciano:

My guess would be that there is a mismatch between your sites.xml and  
tc.data files: maybe provider 'pbs' is mentioned in only one of  
those files? Could you please send me these 2 files? I am Cc'ing  
swift-devel - maybe someone there has a more definite answer to your question.

Thanks,

Nika

On Jul 8, 2007, at 5:58 PM, Luciano Piccoli wrote:

>
> Hi Veronika,
>
> I'm building swift in order to test a new mapper, but I'm having  
> some troubles configuring it. From the following error message can  
> you recognize what is missing?
>
> bash-3.00$ swift -tc.file ./tc.data3 example.swift -NUM=3
> Execution failed:
>        No security context can be found or created for service  
> (provider pbs): No 'pbs' provider or alias found. Available  
> providers: [gt2ft, gsiftp, condor, ssh, gt4ft, local, gt4, gsiftp- 
> old, gt2, ftp, webdav]. Aliases: webdav <-> http; local <-> file;  
> gsiftp-old <-> gridftp-old; gsiftp <-> gridftp; gt4 <-> gt3.9.5,  
> gt4.0.2, gt4.0.1, gt4.0.0;
>
> Thanks,
> Luciano
>


From hategan at mcs.anl.gov  Mon Jul  9 09:58:27 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 09 Jul 2007 09:58:27 -0500
Subject: [Swift-devel] Re: Missing security context
In-Reply-To: <0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov>
References: <46916C20.5050602@fnal.gov>
	<0678E6CE-4521-4752-95DE-6BAF80C25F13@mcs.anl.gov>
Message-ID: <1183993107.7428.4.camel@blabla.mcs.anl.gov>

You need to download the pbs provider separately. I think you can find
the latest at
http://wiki.cogkit.org/index.php/V:4.1.5/Java_CoG_Kit_Release_Page#Downloads 

On Mon, 2007-07-09 at 09:45 -0500, Veronika Nefedova wrote:
> Hi, Luciano:
> 
> My guess would be that there is a mismatch between your sites.xml and  
> tc.data  files:  provider 'pbs' is mentioned maybe in only one of  
> those files? Could you please send me these 2 files?  I am Cc to  
> swift-devel - maybe there is a more definite answer to you question.
> 
> Thanks,
> 
> Nika
> 
> On Jul 8, 2007, at 5:58 PM, Luciano Piccoli wrote:
> 
> >
> > Hi Veronika,
> >
> > I'm building swift in order to test a new mapper, but I'm having  
> > some troubles configuring it. From the following error message can  
> > you recognize what is missing?
> >
> > bash-3.00$ swift -tc.file ./tc.data3 example.swift -NUM=3
> > Execution failed:
> >        No security context can be found or created for service  
> > (provider pbs): No 'pbs' provider or alias found. Available  
> > providers: [gt2ft, gsiftp, condor, ssh, gt4ft, local, gt4, gsiftp- 
> > old, gt2, ftp, webdav]. Aliases: webdav <-> http; local <-> file;  
> > gsiftp-old <-> gridftp-old; gsiftp <-> gridftp; gt4 <-> gt3.9.5,  
> > gt4.0.2, gt4.0.1, gt4.0.0;
> >
> > Thanks,
> > Luciano
> >
> 
> 


From tiberius at ci.uchicago.edu  Mon Jul  9 10:28:48 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Mon, 9 Jul 2007 10:28:48 -0500
Subject: [Swift-devel] arrays-of-arrays
In-Reply-To: <Pine.OSX.4.64.0707081951330.16322@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707081951330.16322@soju.hawaga.org.uk>
Message-ID: <fec1351f0707090828y4e4d04e3m7d47c9e3a50c5a37@mail.gmail.com>

Do you have an example of when this would be useful?
In the case of parameter sweeps, I would be tempted to replace
a[m][n] with b[m] and c[n] and loop over b and c.

Tibi

On 7/8/07, Ben Clifford <benc at hawaga.org.uk> wrote:
> The present language syntax does not admit arrays-of-arrays, with
> expressions such as a[5][3]. However, I don't see anything particularly
> constraining in the implementation to require this. Does anyone have
> preference?
> --
>
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Mon Jul  9 12:06:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 9 Jul 2007 17:06:02 +0000 (GMT)
Subject: [Swift-devel] arrays-of-arrays
In-Reply-To: <fec1351f0707090828y4e4d04e3m7d47c9e3a50c5a37@mail.gmail.com>
References: <Pine.OSX.4.64.0707081951330.16322@soju.hawaga.org.uk>
	<fec1351f0707090828y4e4d04e3m7d47c9e3a50c5a37@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0707091704540.7513@dildano.hawaga.org.uk>


On Mon, 9 Jul 2007, Tiberiu Stef-Praun wrote:

> Do you have an example when this would be useful ?

not particularly - I just noticed that, with some of the language 
changes that I'm making, it probably is no longer hard to have this syntax, 
and wanted to know if there was a deeper reason for it not to be around.

-- 


From hategan at mcs.anl.gov  Mon Jul  9 12:11:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 09 Jul 2007 12:11:09 -0500
Subject: [Swift-devel] arrays-of-arrays
In-Reply-To: <Pine.LNX.4.64.0707091704540.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707081951330.16322@soju.hawaga.org.uk>
	<fec1351f0707090828y4e4d04e3m7d47c9e3a50c5a37@mail.gmail.com>
	<Pine.LNX.4.64.0707091704540.7513@dildano.hawaga.org.uk>
Message-ID: <1184001069.12696.0.camel@blabla.mcs.anl.gov>

On Mon, 2007-07-09 at 17:06 +0000, Ben Clifford wrote:
> 
> On Mon, 9 Jul 2007, Tiberiu Stef-Praun wrote:
> 
> > Do you have an example when this would be useful ?
> 
> not particularly - I just noticed that the way that some of the language 
> changes that I'm making, it probably is no longer hard to have this syntax 
> and wanted to know if there was a deeper reason for it to not be around.

Yong had some issues with it. Maybe he can clarify.

> 


From benc at hawaga.org.uk  Mon Jul  9 17:03:50 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 9 Jul 2007 22:03:50 +0000 (GMT)
Subject: [Swift-devel] dot files by default
In-Reply-To: <1183956228.4067.2.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk> 
	<fec1351f0707082135p9435f4dx557c2f8f7e7016ec@mail.gmail.com>
	<1183956228.4067.2.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707092203160.7513@dildano.hawaga.org.uk>


On Sun, 8 Jul 2007, Mihael Hategan wrote:

> The default is 'true'. The issue is whether to switch the default to
> 'false'.

If no one pops up claiming to regularly use the outputted .dot files by 
default then I'll change this to false.

-- 


From hategan at mcs.anl.gov  Mon Jul  9 17:05:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 09 Jul 2007 17:05:09 -0500
Subject: [Swift-devel] dot files by default
In-Reply-To: <Pine.LNX.4.64.0707092203160.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
	<fec1351f0707082135p9435f4dx557c2f8f7e7016ec@mail.gmail.com>
	<1183956228.4067.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707092203160.7513@dildano.hawaga.org.uk>
Message-ID: <1184018709.24076.0.camel@blabla.mcs.anl.gov>

On Mon, 2007-07-09 at 22:03 +0000, Ben Clifford wrote:
> 
> On Sun, 8 Jul 2007, Mihael Hategan wrote:
> 
> > The default is 'true'. The issue is whether to switch the default to
> > 'false'.
> 
> If no one pops up claiming to regularly use the outputted .dot files by 
> default then I'll change this to false.

+1

> 


From benc at hawaga.org.uk  Tue Jul 10 11:53:39 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 10 Jul 2007 17:53:39 +0100 (BST)
Subject: [Swift-devel] status of conversion of bug 30: making the xml
 intermediate form more XML-like
Message-ID: <Pine.OSX.4.64.0707101716230.1586@soju.hawaga.org.uk>


At present, the XML intermediate form (between the user written 
SwiftScript form and the karajan code) is partly XML and partly other 
languages.

This makes parsing of the XML language hard and thus the language somewhat 
buggy.

In practice, this has resulted in wasted time and frustration for various 
people in this group trying to write applications.

So I'm working on converting the XML intermediate language to be more 
XML-like without the various embedded non-XML / quasi-XML languages that 
are there.

I have a basic implementation that is not ready for real use but seems to 
behave mostly ok.

A couple of caveats:

 i) different number types are not supported - there is a bunch of 
implicit type conversion between ints and floats that happens inside the 
present runtime. As part of tightening up the type checking, this messed 
up a bunch of numerical stuff so I temporarily have made my development 
code only accept integers - no floats (I don't know of anyone who uses 
non-integers in programs, though).

I need to think some more about implicit type conversion and how operator 
overloading should work - at the moment in production a lot of semantics 
are inherited from karajan, and those are maybe, but not necessarily, the 
right ones.

 ii) The present production implementation has a dual type model - 
sometimes data flows around as java objects of types such as Integer or 
String; sometimes it flows around as DSHandle objects which contain those 
values. The need to convert between those at many points causes trouble. 

My development code keeps values in DSHandle objects as much as possible.

This is some additional runtime overhead (because the expression 1 + 2 now 
creates three intermediate DSHandle objects, rather than evaluating the 
expression at the Karajan level and wrapping at the end).

However, in practice expressions are not used very much and so this 
overhead is hopefully not excessively onerous. If it is, there is scope 
for optimisation to happen at the xml->kml layer (as I'm doing with path 
handling).
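
To make the overhead concrete, here is a toy Python sketch of the wrap-everything approach. The Handle class is a made-up stand-in for a DSHandle (it is not the real class or its API), and add is a hypothetical helper that unwraps its operands, does the plain arithmetic and rewraps the result.

```python
class Handle:
    """Hypothetical stand-in for a DSHandle: a box around one value."""

    created = 0  # counts every handle made, to show the overhead

    def __init__(self, value):
        self.value = value
        Handle.created += 1


def add(left: Handle, right: Handle) -> Handle:
    # Unwrap both operands, do the plain arithmetic, rewrap the result.
    return Handle(left.value + right.value)


# The expression 1 + 2: one handle per literal plus one for the sum.
result = add(Handle(1), Handle(2))
print(result.value, Handle.created)
```

Evaluating the same expression on bare values and wrapping only the final result would create a single handle, which is the trade-off described above.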

-- 


From hategan at mcs.anl.gov  Tue Jul 10 12:02:35 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 10 Jul 2007 12:02:35 -0500
Subject: [Swift-devel] status of conversion of bug 30: making the xml
	intermediate form more XML-like
In-Reply-To: <Pine.OSX.4.64.0707101716230.1586@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707101716230.1586@soju.hawaga.org.uk>
Message-ID: <1184086955.13408.4.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-10 at 17:53 +0100, Ben Clifford wrote:
> My development code keeps values in DSHandle objects as much as possible.

That's what I would (and might have) argued for.

> 
> This is some additional runtime overhead (because the expression 1 + 2 now 
> creates three intermediate DSHandle objects, rather than evaluating the 
> expression as the Karajan level and wrapping at the end).

I think the best solution is to not use the normal karajan functions for
swift arithmetic.

> 
> However, in practice expressions are not used very much and so this 
> overhead is hopefully not excessively onerous. If it is, there is scope 
> for optimistion to happen at the xml->kml layer (as I'm doing with path 
> handling).
> 


From benc at hawaga.org.uk  Tue Jul 10 12:04:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 10 Jul 2007 17:04:47 +0000 (GMT)
Subject: [Swift-devel] status of conversion of bug 30: making the xml
	intermediate form more XML-like
In-Reply-To: <1184086955.13408.4.camel@blabla.mcs.anl.gov>
References: <Pine.OSX.4.64.0707101716230.1586@soju.hawaga.org.uk>
	<1184086955.13408.4.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707101703320.7513@dildano.hawaga.org.uk>


On Tue, 10 Jul 2007, Mihael Hategan wrote:

> I think the best solution is to not use the normal karajan functions for
> swift arithmetic.

It doesn't in my present impl - I have code that unwraps and rewraps (and 
then uses the underlying karajan functions).  Though, something more fancy 
is necessary there I think, once I figure out what return types should 
look like.

-- 


From hategan at mcs.anl.gov  Tue Jul 10 12:09:50 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 10 Jul 2007 12:09:50 -0500
Subject: [Swift-devel] status of conversion of bug 30: making the xml
	intermediate form more XML-like
In-Reply-To: <Pine.LNX.4.64.0707101703320.7513@dildano.hawaga.org.uk>
References: <Pine.OSX.4.64.0707101716230.1586@soju.hawaga.org.uk>
	<1184086955.13408.4.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707101703320.7513@dildano.hawaga.org.uk>
Message-ID: <1184087390.13873.1.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-10 at 17:04 +0000, Ben Clifford wrote:
> 
> On Tue, 10 Jul 2007, Mihael Hategan wrote:
> 
> > I think the best solution is to not use the normal karajan functions for
> > swift arithmetic.
> 
> It doesn't in my present impl - I have code that unwraps and rewraps (and 
> then uses the underlying karajan functions).  Though, something more fancy 
> is necessary there I think, once I figure out what return types should 
> look like.

Right. Those could be implemented in Java directly for performance
reasons. 

> 


From bugzilla-daemon at mcs.anl.gov  Wed Jul 11 11:20:43 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 11 Jul 2007 11:20:43 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070711162043.9FCF416502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #19 from nefedova at mcs.anl.gov  2007-07-11 11:20 -------
Some issues arose when testing the re-written swift code with loops (in an
attempt to reduce the size and thus to eliminate a possible reason for the
problems with large workflows). When tested with just one loop, it all worked,
but when I introduced an inner loop, it just hangs there.

I have 2 loops in the workflow, one inside the other:
foreach f in files{
  do_something;
  print(a);
  foreach s in sfiles{
    print(b);
    something;
    if (a=="blah"){
      do_stuff;
    }else{
      do_another_stuff;
    }
  } # close foreach s
} # close foreach f

(the full code could be found on terminable in ~nefedova/MolDyn.dtm)

I see the code hanging without *any* errors right when the second foreach is
supposed to start. I.e. I see a is being printed but not b.

Any suggestions on what could be wrong here?




From bugzilla-daemon at mcs.anl.gov  Wed Jul 11 16:40:20 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 11 Jul 2007 16:40:20 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070711214020.D4BDA16502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #20 from hategan at mcs.anl.gov  2007-07-11 16:40 -------
(In reply to comment #19)
> Some issues arised when testing the re-written swift code with loops (in
> attempt to reduce the size and thus to eliminate a possible reason for the
> problems with large workflows). When tested with just one loop - it all worked,
> but when intrioduced an inside loop, it just hangs there.
> [...]

Working on it...




From bugzilla-daemon at mcs.anl.gov  Wed Jul 11 18:12:32 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 11 Jul 2007 18:12:32 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070711231232.6005E16502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #21 from hategan at mcs.anl.gov  2007-07-11 18:12 -------
It freezes because files[] is not used. In a sense. The compiler should tag all
data that is not an lvalue but appears as part of an expression as input data.
Apparently the compiler misses the part where the variable is used by a for
loop.

You can convince swift that files and sfiles are inputs by doing something like
print(files); print(sfiles);. Nonetheless, this should be fixed in the
compiler.

Mihael
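
A rough sketch of the analysis described above, written in Python rather than
taken from the actual Swift compiler. The node classes (Var, Call, Assign,
Foreach) and the scan_foreach_iterable flag are hypothetical names invented
for illustration. The sketch only shows how skipping the collection a foreach
iterates over leaves files and sfiles untagged, and why an extra print(files)
works around it.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    name: str


@dataclass
class Call:                  # e.g. print(files)
    func: str
    args: list


@dataclass
class Assign:                # target = value; target is an lvalue, not an input
    target: str
    value: object


@dataclass
class Foreach:               # foreach var in iterable { body }
    var: str
    iterable: Var
    body: list = field(default_factory=list)


def inputs(node, scan_foreach_iterable=True):
    """Return names that are read in an expression (never as an lvalue)."""
    if isinstance(node, Var):
        return {node.name}
    if isinstance(node, Call):
        found = set()
        for arg in node.args:
            found |= inputs(arg, scan_foreach_iterable)
        return found
    if isinstance(node, Assign):
        return inputs(node.value, scan_foreach_iterable)
    if isinstance(node, Foreach):
        found = set()
        if scan_foreach_iterable:   # the step the compiler apparently misses
            found |= inputs(node.iterable, scan_foreach_iterable)
        for stmt in node.body:
            found |= inputs(stmt, scan_foreach_iterable)
        found.discard(node.var)     # the loop variable is bound by the loop
        return found
    return set()                    # literals read no variables


def program_inputs(stmts, scan_foreach_iterable=True):
    found = set()
    for stmt in stmts:
        found |= inputs(stmt, scan_foreach_iterable)
    return found


nested = [
    Foreach("f", Var("files"), [
        Call("print", [Var("f")]),
        Foreach("s", Var("sfiles"), [Call("print", [Var("s")])]),
    ]),
]
prints = [Call("print", [Var("files")]), Call("print", [Var("sfiles")])]

buggy = program_inputs(nested, scan_foreach_iterable=False)
fixed = program_inputs(nested, scan_foreach_iterable=True)
workaround = program_inputs(prints + nested, scan_foreach_iterable=False)
print(buggy, fixed, workaround)
```

With the iterable skipped, no inputs are found at all; adding the two print
calls recovers the same set that the corrected pass finds.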

(In reply to comment #20)
> (In reply to comment #19)
> > Some issues arose when testing the rewritten swift code with loops (in an
> > attempt to reduce the size and thus eliminate a possible reason for the
> > problems with large workflows). When tested with just one loop, it all worked,
> > but when an inner loop was introduced, it just hangs there.
> > [...]
> 
> Working on it...
> 
> 




From benc at hawaga.org.uk  Thu Jul 12 08:25:53 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 12 Jul 2007 13:25:53 +0000 (GMT)
Subject: [Swift-devel] cog svn update
Message-ID: <Pine.LNX.4.64.0707121325070.7513@dildano.hawaga.org.uk>


cog svn update has been failing for me (and I think tibi) for the past few days:

$ svn update
svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not 
resolve hostname `svn.sourceforge.net': No address associated with 
nodename (https://svn.sourceforge.net)

$ svn info
Path: .
URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
Repository Root: https://svn.sourceforge.net/svnroot/cogkit
Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
Revision: 1658
Node Kind: directory
Schedule: normal
Last Changed Author: hategan
Last Changed Rev: 1658
Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)


-- 


From hategan at mcs.anl.gov  Thu Jul 12 23:45:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 12 Jul 2007 23:45:30 -0500
Subject: [Swift-devel] cog svn update
In-Reply-To: <Pine.LNX.4.64.0707121325070.7513@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707121325070.7513@dildano.hawaga.org.uk>
Message-ID: <1184301930.24582.0.camel@blabla.mcs.anl.gov>

http://sourceforge.net/docs/A04#1184001090

( 2007-07-09 10:43:54 - Project Subversion (SVN) Service )   As
announced, support for the deprecated subversion access method
(svn.sourceforge.net) was removed. Please use the
PROJECT.svn.sourceforge.net access method that is described in our docs.
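
As an illustration of the change in access method, here is a small Python
sketch that rewrites an old-style URL into the PROJECT.svn.sourceforge.net
form. It assumes that the project name is the path component right after
/svnroot/ and that the rest of the path is unchanged; the function name
project_svn_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "svn.sourceforge.net"


def project_svn_url(url):
    """Rewrite an old-style SourceForge SVN URL to the per-project host."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url                      # not an old-style URL, leave it alone
    segments = parts.path.split("/")    # ['', 'svnroot', 'cogkit', ...]
    if len(segments) < 3 or segments[1] != "svnroot" or not segments[2]:
        raise ValueError("cannot find the project name in " + url)
    project = segments[2]               # assumption: project follows /svnroot/
    new_host = project + "." + OLD_HOST
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )


old = "https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog"
new = project_svn_url(old)
print(new)
```

An existing checkout could then presumably be repointed with something like
svn switch --relocate OLD_ROOT NEW_ROOT, where the two roots are the old and
new repository root URLs; check the exact form against your svn version.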

On Thu, 2007-07-12 at 13:25 +0000, Ben Clifford wrote:
> cog svn update is failing for me (and I think tibi) for the past few days:
> 
> $ svn update
> svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
> svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not 
> resolve hostname `svn.sourceforge.net': No address associated with 
> nodename (https://svn.sourceforge.net)
> 
> $ svn info
> Path: .
> URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
> Repository Root: https://svn.sourceforge.net/svnroot/cogkit
> Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
> Revision: 1658
> Node Kind: directory
> Schedule: normal
> Last Changed Author: hategan
> Last Changed Rev: 1658
> Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)
> 
> 


From benc at hawaga.org.uk  Fri Jul 13 02:05:32 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 13 Jul 2007 07:05:32 +0000 (GMT)
Subject: [Swift-devel] cog svn update
In-Reply-To: <1184301930.24582.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707121325070.7513@dildano.hawaga.org.uk>
	<1184301930.24582.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707130704300.7513@dildano.hawaga.org.uk>


OK, I see you fixed that in the nightly builds. I've changed the download page 
too.

On Thu, 12 Jul 2007, Mihael Hategan wrote:

> http://sourceforge.net/docs/A04#1184001090
> 
> ( 2007-07-09 10:43:54 - Project Subversion (SVN) Service )   As
> announced, support for the deprecated subversion access method
> (svn.sourceforge.net) was removed. Please use the
> PROJECT.svn.sourceforge.net access method that is described in our docs.
> 
> On Thu, 2007-07-12 at 13:25 +0000, Ben Clifford wrote:
> > cog svn update is failing for me (and I think tibi) for the past few days:
> > 
> > $ svn update
> > svn: PROPFIND request failed on '/svnroot/cogkit/trunk/current/src/cog'
> > svn: PROPFIND of '/svnroot/cogkit/trunk/current/src/cog': Could not 
> > resolve hostname `svn.sourceforge.net': No address associated with 
> > nodename (https://svn.sourceforge.net)
> > 
> > $ svn info
> > Path: .
> > URL: https://svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog
> > Repository Root: https://svn.sourceforge.net/svnroot/cogkit
> > Repository UUID: 5b74d2a0-fa0e-0410-85ed-ffba77ec0bde
> > Revision: 1658
> > Node Kind: directory
> > Schedule: normal
> > Last Changed Author: hategan
> > Last Changed Rev: 1658
> > Last Changed Date: 2007-07-05 23:30:44 +0200 (Thu, 05 Jul 2007)
> > 
> > 
> 
> 


From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 13:34:34 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 13:34:34 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] New: nested loops hung
Message-ID: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83

           Summary: nested loops hung
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: nefedova at mcs.anl.gov
                CC: swift-devel at ci.uchicago.edu
OtherBugsDependingOnThis: 72


Workflows with nested loops freeze. Specifically, this construct:

foreach f in files{
do_something;
print(a);
foreach s in sfiles{
print(b);
something;
if (a=="blah"){
do_stuff;
}else{
do_another_stuff;
}
} # close foreach s
} # close foreach f

(the full code can be found on terminable in ~nefedova/MolDyn.dtm)


Comments from Mihael:

It freezes because files[] is not used. In a sense. The compiler should tag all
data that is not an lvalue but appears as part of an expression as input data.
Apparently the compiler misses the part where the variable is used by a for
loop.

You can convince swift that files and sfiles are inputs by doing something like
print(files); print(sfiles);. Nonetheless, this should be fixed in the
compiler.




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 13:34:35 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 13:34:35 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713183435.5069416506@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


nefedova at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |83




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 14:24:16 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:24:16 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713192416.87C54164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hategan at mcs.anl.gov         |benc at hawaga.org.uk




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 14:46:59 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:46:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713194659.01B77164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #1 from benc at hawaga.org.uk  2007-07-13 14:46 -------
The supplied program isn't something that can be fed into swift: it's missing
definitions for all of the variables. I tried the program below and it does
not hang in r912. Could you please supply a small test program that is a
complete, valid swift program and hangs?

type file;
file files[] <fixed_array_mapper;files="a b">;
file sfiles[] <fixed_array_mapper;files="a b">;
foreach f in files{
print(f);
foreach s in sfiles{
print(s);
} # close foreach s
} # close foreach f




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 14:54:06 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:54:06 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713195406.75188164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #2 from nefedova at mcs.anl.gov  2007-07-13 14:54 -------
Please copy *.mol2 and *.prt from ~nefedova/alamines to your directory and try
this program:

type file {}
file files[]<filesys_mapper;pattern=".mol2",location=".">;
file sfiles[]<filesys_mapper;pattern=".prt",location=".">;
string a = "a";
string b = "b";
string c = "c";
print(c);
foreach file f in files
{
string aa = "aa";
print(aa);
foreach s in sfiles{
print(b);
if (a=="a"){
print (a);
}else{
print(b);
}
}
}

It hangs after printing "c".




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 14:57:45 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 14:57:45 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713195745.0EC61164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #3 from hategan at mcs.anl.gov  2007-07-13 14:57 -------
(In reply to comment #1)
> The supplied program isn't something that can be fed into swift [...]
--------------
type file {}

file f1[] <filesys_mapper;pattern="*.mol2",location=".">;

//magic switch below
//print(f1);

foreach i1 in f1 {

}

--------------

You need some dummy .mol2 files.




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 15:16:02 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 15:16:02 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713201602.3B855164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #4 from benc at hawaga.org.uk  2007-07-13 15:16 -------
It hangs for me, but if I replace the array definition with:

file f1[] <fixed_array_mapper;files="a.mol2 b.mol2">;

it does not.




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 15:26:26 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 15:26:26 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713202626.4C085164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #5 from hategan at mcs.anl.gov  2007-07-13 15:26 -------
(In reply to comment #4)
> hangs for me. but if I replace the array definition with:
> 
> file f1[] <fixed_array_mapper;files="a.mol2 b.mol2">;.
> 
> it does not.
> 

Regardless, in the hanging scenario f1 is not marked as input, although it
should be.




From bugzilla-daemon at mcs.anl.gov  Fri Jul 13 16:32:16 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 13 Jul 2007 16:32:16 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070713213216.0B349164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #6 from nefedova at mcs.anl.gov  2007-07-13 16:32 -------
    This is a snippet from my code:

    file fls[]<filesys_mapper;pattern="*prt_",location=".">;  
    print(fls);
    foreach file in files {
    command;
    foreach prt_file in fls
    {
    <>
    }
    }


and it still hangs inside the second loop, while the part of the first loop
("command;") that comes before the second loop works.




From foster at mcs.anl.gov  Sat Jul 14 15:44:12 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Sat, 14 Jul 2007 15:44:12 -0500
Subject: [Swift-devel] MolDyn
Message-ID: <4699359C.5080200@mcs.anl.gov>

Hi,

I haven't seen any communications regarding MolDyn recently. Where do 
things stand with the 244 molecule run?

Ian


From iraicu at cs.uchicago.edu  Sun Jul 15 01:05:02 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sun, 15 Jul 2007 01:05:02 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <4699359C.5080200@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov>
Message-ID: <4699B90E.4070708@cs.uchicago.edu>

Hi,
I think Nika has been waiting on me this week, as we are still using the 
AstroPortal allocation at the ANL/UC site.  I have been super busy with 
the camera-ready Falkon paper, re-running experiments, etc., but I just 
finished that!  Assuming Nika is ready (which I think she is), we'll 
give the 244 mol run another try on Monday!

Ioan

Ian Foster wrote:
> Hi,
>
> I haven't seen any communications regarding MolDyn recently. Where do 
> things stand with the 244 molecule run?
>
> Ian
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From foster at mcs.anl.gov  Sun Jul 15 15:58:13 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Sun, 15 Jul 2007 15:58:13 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <4699B90E.4070708@cs.uchicago.edu>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
Message-ID: <469A8A65.5010209@mcs.anl.gov>

This is crazy ... Nika is working on this, not you--she should not be 
waiting for you, or depending on an AstroPortal allocation.

Ioan Raicu wrote:
> Hi,
> I think Nika has been waiting on me this week, as we are still using 
> the AstroPortal allocation at the ANL/UC site.  I have been super busy 
> with the camera ready Falkon paper, re-running experiments, etc... but 
> I just finished that!  Assuming Nika is ready (which I think she is) , 
> we'll give the 244 mol run another try on Monday!
>
> Ioan
>
> Ian Foster wrote:
>> Hi,
>>
>> I haven't seen any communications regarding MolDyn recently. Where do 
>> things stand with the 244 molecule run?
>>
>> Ian
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From iraicu at cs.uchicago.edu  Sun Jul 15 16:06:59 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sun, 15 Jul 2007 16:06:59 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8A65.5010209@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
	<469A8A65.5010209@mcs.anl.gov>
Message-ID: <469A8C73.5000901@cs.uchicago.edu>

To speed up the debugging process, we decided to use the ANL site.  Nika 
did not have an allocation there, so we decided to go ahead and use my 
allocation (AstroPortal's) just for debugging.  Essentially, I was 
running Falkon under my credentials, and Nika was running Swift.  This 
has been the way things have been for the past weeks since we moved over 
to the ANL site with the MolDyn code.  Perhaps it's time to give Nika the 
latest Falkon code, and run Falkon with her credentials.  Then, she 
wouldn't have to wait for me, unless there were problems with Falkon 
that need to be resolved.  Nika, do you finally have credentials for 
ANL, or would we have to move over to Purdue again?  Perhaps we can do 
one more debug run at ANL tomorrow (Monday) under my credentials, as we 
have everything set up and ready to go?

Ioan


Ian Foster wrote:
> This is crazy ... Nika is working on this, not you--she should not be 
> waiting for you, or depending on an AstroPortal allocation.
>
> Ioan Raicu wrote:
>> Hi,
>> I think Nika has been waiting on me this week, as we are still using 
>> the AstroPortal allocation at the ANL/UC site.  I have been super 
>> busy with the camera ready Falkon paper, re-running experiments, 
>> etc... but I just finished that!  Assuming Nika is ready (which I 
>> think she is) , we'll give the 244 mol run another try on Monday!
>>
>> Ioan
>>
>> Ian Foster wrote:
>>> Hi,
>>>
>>> I haven't seen any communications regarding MolDyn recently. Where 
>>> do things stand with the 244 molecule run?
>>>
>>> Ian
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>
>



From bugzilla-daemon at mcs.anl.gov  Sun Jul 15 23:09:53 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 15 Jul 2007 23:09:53 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716040953.8B931164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #7 from tiberius at mcs.anl.gov  2007-07-15 23:09 -------

The code below also hangs.
On the execution node, only a subset of the echo jobs gets executed.

This is not good at all.
I was trying the following pattern:
I have a set of similar inputs (processData) that I need to process through
various procedures (echoA, echoB, echoNone), and I was trying to have a batch
job that processes all the inputs through these procedures. Note that in this
case there are no dependencies between the procedures (echoA, echoB, echoNone).

This has got to be a pretty standard pattern.


type file{};

(file echoAfile) echoA (string sIn){
        app{
                echo sIn stdout=@filename(echoAfile);
        }
}

(file echoBfile) echoB (string sIn){
        app{
                echo sIn stdout=@filename(echoBfile);
        }
}

(file echoCfile) echoNone(){
        app{
                echo "NONE" stdout=@filename(echoCfile);
        }
}


(file aResults[], file bResults[], file noResults) testLoop (string symbols[]){
        noResults=echoNone();
        foreach s,i in symbols {
                aResults[i] = echoA(s); 
                bResults[i] = echoB(s);
        }
}

string processData[]=["data-1", "data-2"];
string echoANames = "data-1.A data-2.B";
string echoBNames = "data-2.A data-2.B";

file echoEmpty<"echo.empty">;
file echoA[]<fixed_array_mapper; files=echoANames>;
file echoB[]<fixed_array_mapper; files=echoBNames>;

(echoA, echoB, echoEmpty) = testLoop (processData);
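
One thing that may be worth checking in the program above (whether it is
related to the hang is not established in this thread): the two
fixed_array_mapper name lists share a file name. The Python sketch below just
compares the two strings, assuming the mapper splits its files parameter on
whitespace; shared_names is a helper name made up for this example.

```python
# The two name lists, copied from the program above.
echo_a_names = "data-1.A data-2.B"
echo_b_names = "data-2.A data-2.B"


def shared_names(*name_lists):
    """Return file names that appear in more than one name list."""
    seen = set()
    shared = set()
    for names in name_lists:
        # Assumption: names within a list are separated by whitespace.
        for name in set(names.split()):
            if name in seen:
                shared.add(name)
            seen.add(name)
    return shared


overlap = shared_names(echo_a_names, echo_b_names)
print(sorted(overlap))
```

If the overlap is unintended, names such as "data-1.A data-2.A" and
"data-1.B data-2.B" would keep the two output arrays apart.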




From foster at mcs.anl.gov  Mon Jul 16 00:45:28 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 16 Jul 2007 00:45:28 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8A65.5010209@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
	<469A8A65.5010209@mcs.anl.gov>
Message-ID: <469B05F8.9010801@mcs.anl.gov>

Mike points out that Nika has been very busy re-rolling loops in MolDyn, 
thanks to the new @strcut operator.

I still feel concerned about the fact that we don't yet seem to have 
allocations sorted out.

Ian.

Ian Foster wrote:
> This is crazy ... Nika is working on this, not you--she should not be 
> waiting for you, or depending on an AstroPortal allocation.
>
> Ioan Raicu wrote:
>> Hi,
>> I think Nika has been waiting on me this week, as we are still using 
>> the AstroPortal allocation at the ANL/UC site.  I have been super 
>> busy with the camera ready Falkon paper, re-running experiments, 
>> etc... but I just finished that!  Assuming Nika is ready (which I 
>> think she is) , we'll give the 244 mol run another try on Monday!
>>
>> Ioan
>>
>> Ian Foster wrote:
>>> Hi,
>>>
>>> I haven't seen any communications regarding MolDyn recently. Where 
>>> do things stand with the 244 molecule run?
>>>
>>> Ian
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>
>



From iraicu at cs.uchicago.edu  Mon Jul 16 00:50:00 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 16 Jul 2007 00:50:00 -0500
Subject: [Swift-devel] MolDyn
In-Reply-To: <469B05F8.9010801@mcs.anl.gov>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
	<469A8A65.5010209@mcs.anl.gov> <469B05F8.9010801@mcs.anl.gov>
Message-ID: <469B0708.1010908@cs.uchicago.edu>

Right, I also had the impression that Nika was busy rewriting the 
workflow, so she wasn't really just waiting on me, sitting idle....
We'll do another attempt for the 244 mol run tomorrow from the 
AstroPortal allocation.  There were some minor changes I made in Falkon, 
and Mihael fixed some issues that caused the stack overflow, so let's see 
how it all holds up!
Ioan

Ian Foster wrote:
> Mike points out that Nika has been very busy re-rolling loops in 
> MolDyn, thanks to the new @strcut operator.
>
> I still feel concerned about the fact that we don't yet seem to have 
> allocations sorted out.
>
> Ian.
>
> Ian Foster wrote:
>> This is crazy ... Nika is working on this, not you--she should not be 
>> waiting for you, or depending on an AstroPortal allocation.
>>
>> Ioan Raicu wrote:
>>> Hi,
>>> I think Nika has been waiting on me this week, as we are still using 
>>> the AstroPortal allocation at the ANL/UC site.  I have been super 
>>> busy with the camera ready Falkon paper, re-running experiments, 
>>> etc... but I just finished that!  Assuming Nika is ready (which I 
>>> think she is) , we'll give the 244 mol run another try on Monday!
>>>
>>> Ioan
>>>
>>> Ian Foster wrote:
>>>> Hi,
>>>>
>>>> I haven't seen any communications regarding MolDyn recently. Where 
>>>> do things stand with the 244 molecule run?
>>>>
>>>> Ian
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>
>>
>



From benc at hawaga.org.uk  Mon Jul 16 01:16:24 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 06:16:24 +0000 (GMT)
Subject: [Swift-devel] MolDyn
In-Reply-To: <469A8C73.5000901@cs.uchicago.edu>
References: <4699359C.5080200@mcs.anl.gov> <4699B90E.4070708@cs.uchicago.edu>
	<469A8A65.5010209@mcs.anl.gov> <469A8C73.5000901@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0707160615290.11237@dildano.hawaga.org.uk>


On Sun, 15 Jul 2007, Ioan Raicu wrote:

> code.  Perhaps it's time to give Nika the latest Falkon code, and run Falkon
> with her credentials.  Then, she wouldn't have to wait for me, unless there

Perhaps it's time to put Falkon somewhere where people can download the 
latest code wherever and whenever they want.

-- 


From benc at hawaga.org.uk  Mon Jul 16 03:14:05 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 08:14:05 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
Message-ID: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>


Swift has floating point and integer types.

However, now that I look at implementing those, it makes me wonder if we 
should have a single numeric type. It's not clear that we need float/double 
in the language as distinct types.

-- 


From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 06:52:08 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 06:52:08 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #8 from benc at hawaga.org.uk  2007-07-16 06:52 -------
I don't think that comment #7 is this bug. Please open a new one.




From benc at hawaga.org.uk  Mon Jul 16 06:58:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 11:58:25 +0000 (GMT)
Subject: [Swift-devel] bug 82: request for centralised installed
	applications catalog
Message-ID: <Pine.LNX.4.64.0707161154420.7513@dildano.hawaga.org.uk>


Tibi put the following bug in:

> I'm thinking a tc.data database on the web, where everyone who has a 
> swift workflow can publish the applications that they have installed on 
> the Grid, and let others benefit from using them

> Ex:http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftGridResources

> This will be an extra incentive for people to use swift: they can use 
> already existing (and verified) applications from the grid.

> If we had a web interface for this, we could add it to the Swift 
> webpage, and let visitors see that Swift has an active and diverse 
> set of users.

 i) who will own the list? that person would need to be responsible for 
ongoing verification (and documenting what they mean by verification) of 
that list, including regularly removing entries that have ceased to 
verify.

 ii) anything in SVN already has a URL to link to - anything in SVN is 
already 'on the web'. A better place for this might be as a tc.data.big 
file in the SVN, given that everyone really is using HEAD not releases at 
the moment.

-- 


From tiberius at ci.uchicago.edu  Mon Jul 16 08:01:53 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Mon, 16 Jul 2007 08:01:53 -0500
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
References: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
	<20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
Message-ID: <fec1351f0707160601ne0577ecw9ccefd265ff6c9a0@mail.gmail.com>

Well, it's still about loops that hang.
I did not want to pollute the bugzilla with another bug that is very
similar to the nested loops bug. Maybe comment #7 is a different
realization of the same bug.
Hopefully a bit of progress in addressing the loops bug will clear up
whether this should be a different bug or not.


On 7/16/07, bugzilla-daemon at mcs.anl.gov <bugzilla-daemon at mcs.anl.gov> wrote:
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83
>
>
>
>
>
> ------- Comment #8 from benc at hawaga.org.uk  2007-07-16 06:52 -------
> I don't think that comment #7 is this bug. Please open a new one.
>
>
> --
> Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From benc at hawaga.org.uk  Mon Jul 16 08:37:22 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 16 Jul 2007 13:37:22 +0000 (GMT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <fec1351f0707160601ne0577ecw9ccefd265ff6c9a0@mail.gmail.com>
References: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
	<20070716115208.CB34E164EC@foxtrot.mcs.anl.gov>
	<fec1351f0707160601ne0577ecw9ccefd265ff6c9a0@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0707161336540.11237@dildano.hawaga.org.uk>


open a new bug. describe the subset of echos that actually run. see if you 
can recreate it with a smaller program. if it turns out to be the same, 
it's easy to mark as duplicate.

On Mon, 16 Jul 2007, Tiberiu Stef-Praun wrote:

> Well, it's still about loops that hang.
> I did not want to pollute the bugzilla with another bug that is very
> similar to the nested loops bug. Maybe comment #7 is a  different
> realization of the same bug.
> Hopefully a bit of progress in addressing the loops bug will clear up
> whether this should be a different bug or not.
> 
> 
> On 7/16/07, bugzilla-daemon at mcs.anl.gov <bugzilla-daemon at mcs.anl.gov> wrote:
> > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83
> > 
> > 
> > 
> > 
> > 
> > ------- Comment #8 from benc at hawaga.org.uk  2007-07-16 06:52 -------
> > I don't think that comment #7 is this bug. Please open a new one.
> > 
> > 
> 
> 
> 


From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 15:09:08 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 15:09:08 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716200908.0112316502@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #22 from nefedova at mcs.anl.gov  2007-07-16 15:09 -------
a new 244-molecule experiment has started. You can watch it live here:
http://viper.uchicago.edu:55000/index.htm

Please notice that the link is valid only while the job is running. 


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 15:14:38 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 15:14:38 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #23 from iraicu at cs.uchicago.edu  2007-07-16 15:14 -------
(In reply to comment #22)
> a new 244-molecule experiment has started. You can watch it live here:
> http://viper.uchicago.edu:55000/index.htm
> 
> Please notice that the link is valid only while the job is running. 
> 

Actually, the graphs will be generated every 60 sec until the script is shut
down... and the web server and graph generation scripts are set to shut down
when Falkon is shut down, and not when Swift finishes the run.  Once the run is
over, I'll shut everything down and post the graphs on a static web page that
is persistent for later viewing (I'll send out the new URL).

Ioan  




From foster at mcs.anl.gov  Mon Jul 16 16:27:07 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 16 Jul 2007 16:27:07 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov>
References: <20070716201438.826B9164DD@foxtrot.mcs.anl.gov>
Message-ID: <469BE2AB.1090609@mcs.anl.gov>

hey this is neat!

bugzilla-daemon at mcs.anl.gov wrote:
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>
>
>
>
>
> ------- Comment #23 from iraicu at cs.uchicago.edu  2007-07-16 15:14 -------
> (In reply to comment #22)
>   
>> a new 244-molecule experiment has started. You can watch it live here:
>> http://viper.uchicago.edu:55000/index.htm
>>
>> Please notice that the link is valid only while the job is running. 
>>
>>     
>
> Actually, the graphs will be generated every 60 sec until the script is shut
> down... and the web server and graph generation scripts are set to shut down
> when Falkon is shut down, and not when Swift finishes the run.  Once the run is
> over, I'll shut everything down and post the graphs on a static web page that
> is persistent for later viewing (I'll send out the new URL).
>
> Ioan  
>
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 17:06:19 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 17:06:19 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716220619.99D75164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #9 from tiberius at mcs.anl.gov  2007-07-16 17:06 -------
Comment #7 was caused by a type on the workflow.
Never mind, and sorry for the confusion.




From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 17:08:09 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 17:08:09 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716220809.06341164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #10 from tiberius at mcs.anl.gov  2007-07-16 17:08 -------
(In reply to comment #9)
> Comment #7 was caused by a type on the workflow.
> Never mind, and sorry for the confusion.
> 

I meant typo.




From bugzilla-daemon at mcs.anl.gov  Mon Jul 16 17:18:52 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 16 Jul 2007 17:18:52 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070716221852.92C19164DD@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #11 from hategan at mcs.anl.gov  2007-07-16 17:18 -------
I'd file this as a separate bug report. This is nasty and costly behavior.
Mappers can probably keep a list of output files mapped and complain when two
output things map to the same file.
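
The check suggested above could look roughly like the following. This is only a Python sketch of the idea, not Swift's mapper code: the function name, the (variable, path) pair format and the example file names are all hypothetical.

```python
from collections import defaultdict


def find_output_collisions(mappings):
    """Return {path: [variables...]} for every path claimed by 2+ outputs.

    `mappings` is an iterable of (variable_name, output_path) pairs.
    Both the function and the pair format are hypothetical illustrations.
    """
    owners = defaultdict(list)
    for variable, path in mappings:
        owners[path].append(variable)
    return {path: names for path, names in owners.items() if len(names) > 1}


# Hypothetical mapping in which a typo sends two outputs to one file.
example = [
    ("out[0]", "result_0.txt"),
    ("out[1]", "result_1.txt"),
    ("out[2]", "result_1.txt"),  # typo: was meant to be result_2.txt
]

collisions = find_output_collisions(example)
for path, names in sorted(collisions.items()):
    print(f"{path} is mapped by: {', '.join(names)}")
```

Doing this once, when the mappings are first resolved, would let the run fail before any jobs are submitted instead of after the outputs have already overwritten each other.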

(In reply to comment #9)
> Comment #7 was caused by a type on the workflow.
> Never mind, and sorry for the confusion.
> 




From benc at hawaga.org.uk  Tue Jul 17 07:13:53 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 17 Jul 2007 12:13:53 +0000 (GMT)
Subject: [Swift-devel] swift tutorial at ISSGC07
Message-ID: <Pine.LNX.4.64.0707171210260.7513@dildano.hawaga.org.uk>


I just did a 1h30m swift tutorial at the two-week-long International 
Summer School on Grid Computing 2007 in Sweden.

The tutorial was pretty much the same as what we did at TG07.

It went pretty well - no problems with running out of entropy like last 
time. People reached the end approximately on time.

There are still some inelegant bits with mappers in this tutorial - 
there's at least one bug open for that and eventually it will get fixed.

-- 


From wilde at mcs.anl.gov  Tue Jul 17 07:42:07 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Tue, 17 Jul 2007 07:42:07 -0500
Subject: [Swift-devel] swift tutorial at ISSGC07
In-Reply-To: <Pine.LNX.4.64.0707171210260.7513@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707171210260.7513@dildano.hawaga.org.uk>
Message-ID: <469CB91F.1090903@mcs.anl.gov>

Sounds great, Ben! Any comments from the students?

(All - this is around 60-70 students)

- Mike

Ben Clifford wrote, On 7/17/2007 7:13 AM:
> I just did a 1h30m swift tutorial at the two-week-long International 
> Summer School on Grid Computing 2007 in Sweden.
> 
> The tutorial was pretty much the same as what we did at TG07.
> 
> It went pretty well - no problems with running out of entropy like last 
> time. People reached the end approximately on time.
> 
> There are still some inelegant bits with mappers in this tutorial - 
> there's at least one bug open for that and eventually it will get fixed.
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From benc at hawaga.org.uk  Tue Jul 17 08:53:14 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 17 Jul 2007 13:53:14 +0000 (GMT)
Subject: [Swift-devel] 0.2 release (again)
Message-ID: <Pine.LNX.4.64.0707171351500.7513@dildano.hawaga.org.uk>


I'm building a release candidate for a low-effort 0.2 release from swift 
r915 and cog r1658. Will post here with it later on.

-- 


From benc at hawaga.org.uk  Tue Jul 17 10:47:36 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 17 Jul 2007 15:47:36 +0000 (GMT)
Subject: [Swift-devel] 0.2 release (again)
In-Reply-To: <Pine.LNX.4.64.0707171351500.7513@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707171351500.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707171546560.11237@dildano.hawaga.org.uk>


On Tue, 17 Jul 2007, Ben Clifford wrote:

> I'm building a release candidate for a low-effort 0.2 release from swift 
> r915 and cog r1658. Will post here with it later on.

http://www.ci.uchicago.edu/~benc/vdsk-0.2.tar.gz

$ md5sum vdsk-0.2.tar.gz 
25130bbe97f2f10653b48968953c6d84  vdsk-0.2.tar.gz

It runs hello world for me. I haven't done any other testing.

-- 


From benc at hawaga.org.uk  Tue Jul 17 10:50:09 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 17 Jul 2007 15:50:09 +0000 (GMT)
Subject: [Swift-devel] Re: dot files by default
In-Reply-To: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
References: <Pine.OSX.4.64.0707041106210.1364@soju.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707171549560.7513@dildano.hawaga.org.uk>


On Wed, 4 Jul 2007, Ben Clifford wrote:

> does anyone have preference about whether .dot graphviz files are 
> generated by default or not?
> 
> I find them a bit annoying in as much as they double the number of run 
> files in my working directories to no immediate benefit.

r907 turns this off by default.

-- 


From bugzilla-daemon at mcs.anl.gov  Tue Jul 17 16:08:59 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 17 Jul 2007 16:08:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <bug-72-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72


------- Comment #24 from iraicu at cs.uchicago.edu  2007-07-17 16:08 -------
So the latest MolDyn 244-molecule run also failed... but I think it made it all
the way to the final few jobs...

The place where I put all the information about the run is at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/

Here are the graphs:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg

The Swift log can be found at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log

The Falkon logs are at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/

The 244 mol run was supposed to have 20497 tasks, broken down as follows:
1       1       1
1       244     244
1       244     244
68      244     16592
1       244     244
11      244     2684
1       244     244
1       244     244
======================
                20497
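
As a quick sanity check on the breakdown above: the columns are unlabeled, so reading the first two as factors and the third as their product is an assumption, but that reading reproduces the stated total. A small Python sketch:

```python
# (column 1, column 2) pairs copied from the table above. Treating the
# third column as their product is an interpretation, not something the
# table states.
stages = [
    (1, 1),
    (1, 244),
    (1, 244),
    (68, 244),
    (1, 244),
    (11, 244),
    (1, 244),
    (1, 244),
]

per_stage = [a * b for a, b in stages]
total = sum(per_stage)

# Number of tasks reported as exiting with code 0 in this run.
succeeded = 20495

print(per_stage)
print(total)
print(total - succeeded)
```

The difference between the expected total and the successful exits is 2, which lines up with the two missing 'fe_solv_*' files mentioned below.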

We had 20495 tasks that exited with an exit code of 0, and 6 tasks that exited
with an exit code of -3.  The worker logs don't show anything on the stdout or
stderr of the failed jobs.  I looked online for what an exit code of -3 could
mean, but didn't find anything.  

Here are the failed 6 tasks:
Executing task urn:0-9408-1184616132483... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei fe_stdout_m112
stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
--fe_file fe_solv_m112 
Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms

Executing task urn:0-9408-1184616133199... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei fe_stdout_m112
stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
--fe_file fe_solv_m112 
Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms

Executing task urn:0-15036-1184616133342... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei fe_stdout_m179
stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
--fe_file fe_solv_m179 
Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms

Executing task urn:0-15036-1184616133628... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei fe_stdout_m179
stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
--fe_file fe_solv_m179 
Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms

Executing task urn:0-15036-1184616133528... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei fe_stdout_m179
stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
--fe_file fe_solv_m179 
Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms

Executing task urn:0-9408-1184616130688... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei fe_stdout_m112
stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
--resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
--fe_file fe_solv_m112 
Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms


Both the Falkon logs and the Swift logs agree on the number of submitted tasks,
number of successful tasks, and number of failed tasks.  There were no
outstanding tasks at the time when the workflow failed.  BTW, I checked the
disk space usage about an hour after the whole experiment finished, and
there was plenty of disk space left.

Yong mentioned that he looked through the output of MolDyn, and there were only
242 'fe_solv_*' files, so 2 molecule files were missing...  one question for
Nika, are the 6 failed tasks the same job, resubmitted?  
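
One way to approach that question from the worker output is to group the non-zero exit codes by molecule. Below is a rough Python sketch. It assumes the log keeps the "Executing task ... wrapper.sh <jobid> fe_stdout_<molecule>" and "Task ... completed with exit code N" shapes seen in the excerpt above. The regexes and variable names are illustrative and not part of Falkon or Swift, and the argument lists are shortened to the fields the regexes need.

```python
import re
from collections import defaultdict

# The six failed entries quoted above, with the long argument lists cut
# after the first wrapper argument that names the molecule.
log = """\
Executing task urn:0-9408-1184616132483... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei fe_stdout_m112
Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms
Executing task urn:0-9408-1184616133199... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei fe_stdout_m112
Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms
Executing task urn:0-15036-1184616133342... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei fe_stdout_m179
Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms
Executing task urn:0-15036-1184616133628... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei fe_stdout_m179
Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms
Executing task urn:0-15036-1184616133528... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei fe_stdout_m179
Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms
Executing task urn:0-9408-1184616130688... Building executable
command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei fe_stdout_m112
Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms
"""

# Task start: capture the urn, the wrapper job id and the molecule suffix.
START = re.compile(
    r"Executing task (urn:\S+?)\.\.\..*?wrapper\.sh\s+(\S+)\s+\S*?_(m\d+)",
    re.S,
)
# Task end: capture the urn and the exit code.
DONE = re.compile(r"Task (urn:\S+) completed with exit code (-?\d+)")

molecule_of = {urn: mol for urn, _job, mol in START.findall(log)}

failures = defaultdict(list)
for urn, code in DONE.findall(log):
    if int(code) != 0:
        failures[molecule_of.get(urn, "unknown")].append(urn)

for mol, urns in sorted(failures.items()):
    print(f"{mol}: {len(urns)} failed attempt(s)")
```

On the six entries quoted above this reports three failed attempts each for m112 and m179. That would be consistent with two jobs each attempted three times, though the worker log alone does not prove they were resubmissions of the same job.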

Nika, can you add anything more to this?  Is there anything else to be learned
from the Swift log, as to why those last few jobs failed?  After we have tried
to figure out what happened, can we resume the workflow, and hopefully finish
the last few jobs in another run?

Ioan




From foster at mcs.anl.gov  Tue Jul 17 21:39:20 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 17 Jul 2007 21:39:20 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
Message-ID: <469D7D58.8000908@mcs.anl.gov>

Ioan:

a) I think this information should be in the bugzilla summary, according 
to our processes?

b) Why did it take so long to get all of the workers working?

c) Can we debug using less than O(800) node hours?

Ian.

bugzilla-daemon at mcs.anl.gov wrote:
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>
>
>
>
>
> ------- Comment #24 from iraicu at cs.uchicago.edu  2007-07-17 16:08 -------
> So the latest MolDyn's 244 mol run also failed... but I think it made it all
> the way to the final few jobs...
>
> The place where I put all the information about the run is at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/
>
> Here are the graphs:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg
>
> The Swift log can be found at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log
>
> The Falkon logs are at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/
>
> The 244 mol run was supposed to have 20497 tasks, broken down as follows:
> 1       1       1
> 1       244     244
> 1       244     244
> 68      244     16592
> 1       244     244
> 11      244     2684
> 1       244     244
> 1       244     244
> ======================
>                 20497
>
> We had 20495 tasks that exited with an exit code of 0, and 6 tasks that exited
> with an exit code of -3.  The worker logs don't show anything on the stdout or
> stderr of the failed jobs.  I looked online what an exit code of -3 could mean,
> but didn't find anything.  
>
> Here are the failed 6 tasks:
> Executing task urn:0-9408-1184616132483... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei fe_stdout_m112
> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
> --fe_file fe_solv_m112 
> Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms
>
> Executing task urn:0-9408-1184616133199... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei fe_stdout_m112
> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
> --fe_file fe_solv_m112 
> Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms
>
> Executing task urn:0-15036-1184616133342... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei fe_stdout_m179
> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
> --fe_file fe_solv_m179 
> Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms
>
> Executing task urn:0-15036-1184616133628... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei fe_stdout_m179
> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
> --fe_file fe_solv_m179 
> Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms
>
> Executing task urn:0-15036-1184616133528... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei fe_stdout_m179
> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file solv_chg_a10_m179_done
> --fe_file fe_solv_m179 
> Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms
>
> Executing task urn:0-9408-1184616130688... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei fe_stdout_m112
> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file solv_chg_a10_m112_done
> --fe_file fe_solv_m112 
> Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms
>
>
> Both the Falkon logs and the Swift logs agree on the number of submitted tasks,
> number of successful tasks, and number of failed tasks.  There were no
> outstanding tasks at the time when the workflow failed.  BTW, I checked the
> disk space usage after about an hour that the whole experiment finished, and
> there was plenty of disk space left.
>
> Yong mentioned that he looked through the output of MolDyn, and there were only
> 242 'fe_solv_*' files, so 2 molecule files were missing...  one question for
> Nika, are the 6 failed tasks the same job, resubmitted?  
>
> Nika, can you add anything more to this?  Is there anything else to be learned
> from the Swift log, as to why those last few jobs failed?  After we have tried
> to figure out what happened, can we resume the workflow, and hopefully finish
> the last few jobs in another run?
>
> Ioan
>
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From foster at mcs.anl.gov  Tue Jul 17 21:43:52 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 17 Jul 2007 21:43:52 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D7D58.8000908@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov>
Message-ID: <469D7E68.9050202@mcs.anl.gov>

Another (perhaps dumb?) question--it would seem desirable that we be 
able to quickly determine what tasks failed and then (attempt to) rerun 
them in such circumstances.

Here it seems that a lot of effort is required just to determine what 
tasks failed, and I am not sure that the information extracted is enough 
to rerun them.

It also seems that we can't easily determine which output files are missing.
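
For the missing-output question, a check along these lines might be enough. It is a Python sketch under stated assumptions: the 'fe_solv_m<N>' naming is taken from the fe.pl command lines in comment #24, the m1..m244 numbering is hypothetical (the thread does not give the actual molecule ids), and missing_outputs is an illustrative helper, not an existing Swift tool.

```python
import re


def missing_outputs(expected_ids, present_names, prefix="fe_solv_"):
    """Return the sorted molecule ids that have no `<prefix><id>` file."""
    pattern = re.compile(re.escape(prefix) + r"(m\d+)$")
    present = set()
    for name in present_names:
        match = pattern.match(name)
        if match:
            present.add(match.group(1))
    # Sort numerically on the digits after the leading "m".
    return sorted(set(expected_ids) - present, key=lambda s: int(s[1:]))


# Hypothetical numbering m1..m244; the real ids are not stated in the thread.
expected = [f"m{i}" for i in range(1, 245)]

# Stand-in for os.listdir(output_dir): every file except two, chosen here
# only for illustration.
present = [f"fe_solv_{m}" for m in expected if m not in ("m112", "m179")]

print(missing_outputs(expected, present))
```

In practice the `present` list would come from a listing of the run's output directory, and the resulting ids could then be matched against the failed tasks in the log.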

Ian.

Ian Foster wrote:
> Ioan:
>
> a) I think this information should be in the bugzilla summary, 
> according to our processes?
>
> b) Why did it take so long to get all of the workers working?
>
> c) Can we debug using less than O(800) node hours?
>
> Ian.
>
> bugzilla-daemon at mcs.anl.gov wrote:
>> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>>
>>
>>
>>
>>
>> ------- Comment #24 from iraicu at cs.uchicago.edu  2007-07-17 16:08 
>> -------
>> So the latest MolDyn's 244 mol run also failed... but I think it made 
>> it all
>> the way to the final few jobs...
>>
>> The place where I put all the information about the run is at:
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ 
>>
>>
>> Here are the graphs:
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg 
>>
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg 
>>
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg 
>>
>>
>> The Swift log can be found at:
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log 
>>
>>
>> The Falkon logs are at:
>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/ 
>>
>>
>> The 244 mol run was supposed to have 20497 tasks, broken down as 
>> follows:
>> 1       1       1
>> 1       244     244
>> 1       244     244
>> 68      244     16592
>> 1       244     244
>> 11      244     2684
>> 1       244     244
>> 1       244     244
>> ======================
>>                 20497
>>
>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks 
>> that exited
>> with an exit code of -3.  The worker logs don't show anything on the 
>> stdout or
>> stderr of the failed jobs.  I looked online what an exit code of -3 
>> could mean,
>> but didn't find anything. 
>> Here are the failed 6 tasks:
>> Executing task urn:0-9408-1184616132483... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei 
>> fe_stdout_m112
>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>> solv_chg_a10_m112_done
>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with 
>> exit code -3 in 238 ms
>>
>> Executing task urn:0-9408-1184616133199... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei 
>> fe_stdout_m112
>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>> solv_chg_a10_m112_done
>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with 
>> exit code -3 in 201 ms
>>
>> Executing task urn:0-15036-1184616133342... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei 
>> fe_stdout_m179
>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>> solv_chg_a10_m179_done
>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with 
>> exit code -3 in 267 ms
>>
>> Executing task urn:0-15036-1184616133628... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei 
>> fe_stdout_m179
>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>> solv_chg_a10_m179_done
>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with 
>> exit code -3 in 2368 ms
>>
>> Executing task urn:0-15036-1184616133528... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei 
>> fe_stdout_m179
>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>> solv_chg_a10_m179_done
>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with 
>> exit code -3 in 311 ms
>>
>> Executing task urn:0-9408-1184616130688... Building executable
>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei 
>> fe_stdout_m112
>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>> solv_chg_a10_m112_done
>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with 
>> exit code -3 in 464 ms
>>
>>
>> Both the Falkon logs and the Swift logs agree on the number of 
>> submitted tasks,
>> number of successful tasks, and number of failed tasks.  There were no
>> outstanding tasks at the time when the workflow failed.  BTW, I 
>> checked the
>> disk space usage after about an hour that the whole experiment 
>> finished, and
>> there was plenty of disk space left.
>>
>> Yong mentioned that he looked through the output of MolDyn, and there 
>> were only
>> 242 'fe_solv_*' files, so 2 molecule files were missing...  one 
>> question for
>> Nika, are the 6 failed tasks the same job, resubmitted? 
>> Nika, can you add anything more to this?  Is there anything else to 
>> be learned
>> from the Swift log, as to why those last few jobs failed?  After we 
>> have tried
>> to figure out what happened, can we resume the workflow, and 
>> hopefully finish
>> the last few jobs in another run?
>>
>> Ioan
>>
>>
>>   
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From yongzh at cs.uchicago.edu  Tue Jul 17 21:50:12 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Tue, 17 Jul 2007 21:50:12 -0500 (CDT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D7E68.9050202@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov>
Message-ID: <Pine.LNX.4.58.0707172148260.8542@classes.cs.uchicago.edu>

We already have a retry mechanism there. I suspect the failed jobs were
retried but failed again. The server-side logs should have something about
which files were missing.
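
One way to see which outputs are missing without digging through the logs
is to compare the expected molecule ids against the 'fe_solv_*' files that
actually exist. This is only a sketch: the function name and the sample ids
are hypothetical, and it assumes one 'fe_solv_<id>' file per molecule, as
the task command lines in the log suggest.

```python
def missing_outputs(expected_ids, present_files, prefix="fe_solv_"):
    """Return the molecule ids that have no '<prefix><id>' output file.

    expected_ids  -- iterable of ids such as "m112" (assumed naming)
    present_files -- iterable of file names found in the output directory
    """
    # Keep only the files that carry the prefix and strip it to get the id.
    present = {
        name[len(prefix):] for name in present_files if name.startswith(prefix)
    }
    return sorted(set(expected_ids) - present)


# Tiny made-up example: three molecules expected, one output missing.
example_missing = missing_outputs(
    ["m1", "m2", "m3"],
    ["fe_solv_m1", "fe_solv_m3", "fe_stdout_m2"],
)
```

In a real run, present_files would come from something like os.listdir()
on the output directory, and expected_ids from the workflow's input list.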

Yong.

On Tue, 17 Jul 2007, Ian Foster wrote:

> Another (perhaps dumb?) question--it would seem desirable that we be
> able to quickly determine what tasks failed and then (attempt to) rerun
> them in such circumstances.
>
> Here it seems that a lot of effort is required just to determine what
> tasks failed, and I am not sure that the information extracted is enough
> to rerun them.
>
> It also seems that we can't easily determine which output files are missing.
>
> Ian.
>
> Ian Foster wrote:
> > Ioan:
> >
> > a) I think this information should be in the bugzilla summary,
> > according to our processes?
> >
> > b) Why did it take so long to get all of the workers working?
> >
> > c) Can we debug using less than O(800) node hours?
> >
> > Ian.
> >
> > bugzilla-daemon at mcs.anl.gov wrote:
> >> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
> >>
> >>
> >>
> >>
> >>
> >> ------- Comment #24 from iraicu at cs.uchicago.edu  2007-07-17 16:08
> >> -------
> >> So the latest MolDyn's 244 mol run also failed... but I think it made
> >> it all
> >> the way to the final few jobs...
> >>
> >> The place where I put all the information about the run is at:
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/
> >>
> >>
> >> Here are the graphs:
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
> >>
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
> >>
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg
> >>
> >>
> >> The Swift log can be found at:
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log
> >>
> >>
> >> The Falkon logs are at:
> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/
> >>
> >>
> >> The 244 mol run was supposed to have 20497 tasks, broken down as
> >> follows:
> >> 1       1       1
> >> 1       244     244
> >> 1       244     244
> >> 68      244     16592
> >> 1       244     244
> >> 11      244     2684
> >> 1       244     244
> >> 1       244     244
> >> ======================
> >>                 20497
> >>
> >> We had 20495 tasks that exited with an exit code of 0, and 6 tasks
> >> that exited
> >> with an exit code of -3.  The worker logs don't show anything on the
> >> stdout or
> >> stderr of the failed jobs.  I looked online what an exit code of -3
> >> could mean,
> >> but didn't find anything.
> >> Here are the failed 6 tasks:
> >> Executing task urn:0-9408-1184616132483... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei
> >> fe_stdout_m112
> >> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> >> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> >> solv_chg_a10_m112_done
> >> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with
> >> exit code -3 in 238 ms
> >>
> >> Executing task urn:0-9408-1184616133199... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei
> >> fe_stdout_m112
> >> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> >> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> >> solv_chg_a10_m112_done
> >> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with
> >> exit code -3 in 201 ms
> >>
> >> Executing task urn:0-15036-1184616133342... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei
> >> fe_stdout_m179
> >> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> >> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> >> solv_chg_a10_m179_done
> >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with
> >> exit code -3 in 267 ms
> >>
> >> Executing task urn:0-15036-1184616133628... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei
> >> fe_stdout_m179
> >> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> >> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> >> solv_chg_a10_m179_done
> >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with
> >> exit code -3 in 2368 ms
> >>
> >> Executing task urn:0-15036-1184616133528... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei
> >> fe_stdout_m179
> >> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> >> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> >> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> >> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> >> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> >> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> >> solv_chg_a10_m179_done
> >> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with
> >> exit code -3 in 311 ms
> >>
> >> Executing task urn:0-9408-1184616130688... Building executable
> >> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei
> >> fe_stdout_m112
> >> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> >> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> >> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> >> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> >> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> >> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
> >> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> >> solv_chg_a10_m112_done
> >> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with
> >> exit code -3 in 464 ms
> >>
> >>
> >> Both the Falkon logs and the Swift logs agree on the number of
> >> submitted tasks,
> >> number of successful tasks, and number of failed tasks.  There were no
> >> outstanding tasks at the time when the workflow failed.  BTW, I
> >> checked the
> >> disk space usage after about an hour that the whole experiment
> >> finished, and
> >> there was plenty of disk space left.
> >>
> >> Yong mentioned that he looked through the output of MolDyn, and there
> >> were only
> >> 242 'fe_solv_*' files, so 2 molecule files were missing...  one
> >> question for
> >> Nika, are the 6 failed tasks the same job, resubmitted?
> >> Nika, can you add anything more to this?  Is there anything else to
> >> be learned
> >> from the Swift log, as to why those last few jobs failed?  After we
> >> have tried
> >> to figure out what happened, can we resume the workflow, and
> >> hopefully finish
> >> the last few jobs in another run?
> >>
> >> Ioan
> >>
> >>
> >>
> >
>
> --
>
>    Ian Foster, Director, Computation Institute
> Argonne National Laboratory & University of Chicago
> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
> Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
>       Globus Alliance: www.globus.org.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From hategan at mcs.anl.gov  Tue Jul 17 22:11:23 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 17 Jul 2007 22:11:23 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D7E68.9050202@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov>  <469D7E68.9050202@mcs.anl.gov>
Message-ID: <1184728284.2004.12.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote:
> Another (perhaps dumb?) question--it would seem desirable that we be 
> able to quickly determine what tasks failed and then (attempt to) rerun 
> them in such circumstances.
> 
> Here it seems that a lot of effort is required just to determine what 
> tasks failed, and I am not sure that the information extracted is enough 
> to rerun them.

Normally, a summary of what failed with the reasons is printed on
stderr, together with the stdout and stderr of the jobs. Perhaps it
should also go to the log file.

In this case, 2 jobs failed. The 6 failures are due to restarts, which
is in agreement with the 2 missing molecules.
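
A rough way to check that mapping is to group the failed task records by
the --fe_file they were supposed to produce. The sketch below is not part
of Swift or Falkon; the function name is made up, and the regular
expression assumes the "completed with exit code" wording seen in the
worker log excerpts quoted in this thread.

```python
import re
from collections import Counter

# Matches "--fe_file <file> Task <urn> completed with exit code <n>",
# tolerating the line wrapping seen in the mailed log excerpts.
FAILED_RE = re.compile(
    r"--fe_file\s+(\S+)\s+Task\s+(urn:\S+)\s+completed\s+with\s+exit\s+code\s+(-?\d+)"
)


def failures_by_output(log_text):
    """Count non-zero exit codes per --fe_file target."""
    counts = Counter()
    for fe_file, _urn, code in FAILED_RE.findall(log_text):
        if int(code) != 0:
            counts[fe_file] += 1
    return dict(counts)


# The six failure records from comment #24, reduced to their last two lines.
SAMPLE_LOG = """
--fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with
exit code -3 in 238 ms
--fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with
exit code -3 in 201 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with
exit code -3 in 267 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with
exit code -3 in 2368 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with
exit code -3 in 311 ms
--fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with
exit code -3 in 464 ms
"""

failed = failures_by_output(SAMPLE_LOG)
```

On these six records the failures fall on only two outputs, fe_solv_m112
and fe_solv_m179, three attempts each.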

When jobs fail, Swift should not clean up the job directories so that
one can do post-mortem debugging. I suggest invoking the application
manually to see if it's a matter of a bad node or bad data.

> 
> It also seems that we can't easily determine which output files are missing.

In the general case we wouldn't be able to, because the exact outputs
may only be known at run-time. Granted, that kind of dynamics would
depend on our ability to have nondeterministic files being returned,
which we haven't gotten around to implementing. But there is a question
of whether we should try to implement a short term solution that would
be invalidated by our own plans.

> 
> Ian.
> 
> Ian Foster wrote:
> > Ioan:
> >
> > a) I think this information should be in the bugzilla summary, 
> > according to our processes?
> >
> > b) Why did it take so long to get all of the workers working?
> >
> > c) Can we debug using less than O(800) node hours?
> >
> > Ian.
> >


From iraicu at cs.uchicago.edu  Tue Jul 17 22:30:34 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 17 Jul 2007 22:30:34 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D7D58.8000908@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov>
Message-ID: <469D895A.5090706@cs.uchicago.edu>

Hi,
See below:

Ian Foster wrote:
> Ioan:
>
> a) I think this information should be in the bugzilla summary, 
> according to our processes?
>
I posted all this to bugzilla, didn't I?
> b) Why did it take so long to get all of the workers working?
I finally had enough confidence in the dynamic resource provisioning 
that we won't lose any jobs across resource allocation boundaries (ran 
lots of tests and they were all positive), so I enabled it for this 
run.  I set the max to be the entire ANL site (274 processors)... and we 
got 146 at the beginning, and with time, the # of processors kept 
increasing up to the peak of 208 or so... the rest up to 274 were queued 
up in the PBS wait queue.  The difference between the beginning with 146 
and the end with 208 was that others who were in the system at the 
beginning finished their work and released some nodes, and idle 
processors went from the wait queue into the run queue.  I would 
actually be curious to try out the latest DRP stuff on a busy site, such 
as Purdue or NCSA, and to see if we can maintain a nice pool size over a 
period of time, despite the sites being busy...

BTW, in the previous runs for MolDyn, we normally set the min and max to 
say 100 processors, or 200 processors, and we would wait until we had 
all of them before we started... sometimes, this meant waiting 12~24 
hours for enough nodes to become free so the large job could start.  
With DRP, you can start off with whatever the site has available, and 
you get more with time as your jobs make it through the wait queue and 
other jobs that are running complete...
>
> c) Can we debug using less than O(800) node hours?
The real MolDyn run for 244 molecules takes on the order of O(20K) node 
hours, so O(0.8K) is still an improvement.  Remember that we can run the 
smaller workflows fine, but it's the bigger ones that are giving us a 
hard time.  Nika, if you have any other suggestion on how we can further 
reduce the run time of each job just to simulate the # of jobs and the 
input/output # of files, let us know.

Ioan
>
> Ian.
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From iraicu at cs.uchicago.edu  Tue Jul 17 22:33:02 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 17 Jul 2007 22:33:02 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D7E68.9050202@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>	<469D7D58.8000908@mcs.anl.gov>
	<469D7E68.9050202@mcs.anl.gov>
Message-ID: <469D89EE.5090202@cs.uchicago.edu>


Ian Foster wrote:
> Another (perhaps dumb?) question--it would seem desirable that we be 
> able to quickly determine what tasks failed and then (attempt to) 
> rerun them in such circumstances.
I think Swift already does this up to a fixed # of times (I think it is 
3 or 5).
>
> Here it seems that a lot of effort is required just to determine what 
> tasks failed, and I am not sure that the information extracted is 
> enough to rerun them.
The failed tasks are pretty easy to find in the logs based on the exit 
code.  If we were to do a resume from Swift, I think it would 
automatically resubmit just the failed tasks... but unless we figure out 
why they failed and fix the problem, they will likely fail again.
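
For what it's worth, pulling the failures out of the worker logs can be scripted. Below is a minimal Python sketch, assuming the completion message has the form quoted in comment #24 ("Task urn:... completed with exit code N in T ms"). The function name failed_tasks and the third sample line (a made-up successful task) are illustrative assumptions, not part of Falkon or Swift.

```python
import re

# Completion message as it appears in the quoted worker output.
# \s+ tolerates the line wrapping introduced by the mail client.
TASK_DONE = re.compile(
    r"Task\s+(urn:\S+)\s+completed\s+with\s+exit\s+code\s+(-?\d+)\s+in\s+(\d+)\s+ms"
)

def failed_tasks(log_text):
    """Return (task_id, exit_code, millis) for every task with a nonzero exit code."""
    failures = []
    for task_id, code, millis in TASK_DONE.findall(log_text):
        if int(code) != 0:
            failures.append((task_id, int(code), int(millis)))
    return failures

# Two lines taken from the quoted log, plus one hypothetical success for contrast.
sample = """
--fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with
exit code -3 in 2368 ms
--fe_file fe_solv_m001 Task urn:0-1-1 completed with exit code 0 in 100 ms
"""

for task in failed_tasks(sample):
    print(task)
```

This only finds the failed attempts. It says nothing about why they failed, which is still the open question.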
>
> It also seems that we can't easily determine which output files are 
> missing.
I don't know about this one. Maybe Nika can comment on this.

Ioan
>
> Ian.
>
> Ian Foster wrote:
>> Ioan:
>>
>> a) I think this information should be in the bugzilla summary, 
>> according to our processes?
>>
>> b) Why did it take so long to get all of the workers working?
>>
>> c) Can we debug using less than O(800) node hours?
>>
>> Ian.
>>
>> bugzilla-daemon at mcs.anl.gov wrote:
>>> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>>>
>>>
>>>
>>>
>>>
>>> ------- Comment #24 from iraicu at cs.uchicago.edu  2007-07-17 16:08 
>>> -------
>>> So the latest MolDyn's 244 mol run also failed... but I think it 
>>> made it all
>>> the way to the final few jobs...
>>>
>>> The place where I put all the information about the run is at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ 
>>>
>>>
>>> Here are the graphs:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg 
>>>
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg 
>>>
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg 
>>>
>>>
>>> The Swift log can be found at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log 
>>>
>>>
>>> The Falkon logs are at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/ 
>>>
>>>
>>> The 244 mol run was supposed to have 20497 tasks, broken down as 
>>> follows:
>>> 1       1       1
>>> 1       244     244
>>> 1       244     244
>>> 68      244     16592
>>> 1       244     244
>>> 11      244     2684
>>> 1       244     244
>>> 1       244     244
>>> ======================
>>>                 20497
>>>
>>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks 
>>> that exited
>>> with an exit code of -3.  The worker logs don't show anything on the 
>>> stdout or
>>> stderr of the failed jobs.  I looked online what an exit code of -3 
>>> could mean,
>>> but didn't find anything. Here are the failed 6 tasks:
>>> Executing task urn:0-9408-1184616132483... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei 
>>> fe_stdout_m112
>>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with 
>>> exit code -3 in 238 ms
>>>
>>> Executing task urn:0-9408-1184616133199... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei 
>>> fe_stdout_m112
>>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with 
>>> exit code -3 in 201 ms
>>>
>>> Executing task urn:0-15036-1184616133342... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei 
>>> fe_stdout_m179
>>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with 
>>> exit code -3 in 267 ms
>>>
>>> Executing task urn:0-15036-1184616133628... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei 
>>> fe_stdout_m179
>>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with 
>>> exit code -3 in 2368 ms
>>>
>>> Executing task urn:0-15036-1184616133528... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei 
>>> fe_stdout_m179
>>> stderr.txt   wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file 
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with 
>>> exit code -3 in 311 ms
>>>
>>> Executing task urn:0-9408-1184616130688... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei 
>>> fe_stdout_m112
>>> stderr.txt   wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112  /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file 
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with 
>>> exit code -3 in 464 ms
>>>
>>>
>>> Both the Falkon logs and the Swift logs agree on the number of 
>>> submitted tasks,
>>> number of successful tasks, and number of failed tasks.  There were no
>>> outstanding tasks at the time when the workflow failed.  BTW, I 
>>> checked the
>>> disk space usage after about an hour that the whole experiment 
>>> finished, and
>>> there was plenty of disk space left.
>>>
>>> Yong mentioned that he looked through the output of MolDyn, and 
>>> there were only
>>> 242 'fe_solv_*' files, so 2 molecule files were missing...  one 
>>> question for
>>> Nika, are the 6 failed tasks the same job, resubmitted? Nika, can 
>>> you add anything more to this?  Is there anything else to be learned
>>> from the Swift log, as to why those last few jobs failed?  After we 
>>> have tried
>>> to figure out what happened, can we resume the workflow, and 
>>> hopefully finish
>>> the last few jobs in another run?
>>>
>>> Ioan
>>>
>>>
>>>   
>>
>



From foster at mcs.anl.gov  Tue Jul 17 22:35:24 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 17 Jul 2007 22:35:24 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D895A.5090706@cs.uchicago.edu>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D895A.5090706@cs.uchicago.edu>
Message-ID: <469D8A7C.9030604@mcs.anl.gov>

Great! What resource acquisition policy are you using?
>> b) Why did it take so long to get all of the workers working?
> I finally had enough confidence in the dynamic resource provisioning 
> that we won't lose any jobs across resource allocation boundaries 
> (ran lots of tests and they were all positive), so I enabled it for 
> this run.  I set the max to be the entire ANL site (274 processors)... 
> and we got 146 at the beginning, and with time, the # of processors 
> kept increasing up to the peak of 208 or so... the rest up to 274 were 
> queued up in the PBS wait queue.  The difference between the beginning 
> with 146 and the end with 208 was that others who were in the system 
> at the beginning finished their work and released some nodes, and idle 
> processors went from the wait queue into the run queue.  I would 
> actually be curious to try out the latest DRP stuff on a busy site, 
> such as Purdue or NCSA, and to see if we can maintain a nice pool size 
> over a period of time, despite the sites being busy...
>
> BTW, in the previous runs for MolDyn, we normally set the min and max 
> to say 100 processors, or 200 processors, and we would wait until we 
> had all of them before we started... sometimes, this meant waiting 
> 12~24 hours for enough nodes to become free so the large job could 
> start.  With DRP, you can start off with whatever the site has 
> available, and you get more with time as your jobs make it through the 
> wait queue and other jobs that are running complete...


From iraicu at cs.uchicago.edu  Tue Jul 17 22:36:36 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 17 Jul 2007 22:36:36 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1184728284.2004.12.camel@blabla.mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>	<469D7D58.8000908@mcs.anl.gov>
	<469D7E68.9050202@mcs.anl.gov>
	<1184728284.2004.12.camel@blabla.mcs.anl.gov>
Message-ID: <469D8AC4.4010400@cs.uchicago.edu>


Mihael Hategan wrote:
> On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote:
>   
>> Another (perhaps dumb?) question--it would seem desirable that we be 
>> able to quickly determine what tasks failed and then (attempt to) rerun 
>> them in such circumstances.
>>
>> Here it seems that a lot of effort is required just to determine what 
>> tasks failed, and I am not sure that the information extracted is enough 
>> to rerun them.
>>     
>
> Normally, a summary of what failed with the reasons is printed on
> stderr, together with the stdout and stderr of the jobs. Perhaps it
> should also go to the log file.
>
> In this case, 2 jobs failed. The 6 failures are due to restarts. Which
> is in agreement with the 2 missing molecules.
>
> When jobs fail, swift should not clean up the job directories so that
> one can do post-mortem debugging. I suggest invoking the application
> manually to see if it's a matter of a bad node or bad data.
>   
The errors happened on 3 different nodes, so I suspect that it's not bad 
nodes (as we had previously experienced with the stale NFS handle). 

Nika, I sent out the actual commands that failed... can you try to run 
them manually to see what happens, and possibly determine why they 
failed?  Can you also find out what an exit code of -3 means within the 
application that failed (you might have to look at the app source code, 
or contact the original source code writer)?
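
The numbers in comment #24 are consistent with 2 jobs each failing on every attempt. Here is a small Python check of that accounting. It is a sketch only: I am reading the breakdown columns as (tasks per molecule, molecules, total), which is an assumption since the table has no header, and the molecule labels are read off the fe_solv_* argument of each failed command.

```python
# (tasks per molecule, number of molecules), copied from the breakdown in comment #24
stages = [(1, 1), (1, 244), (1, 244), (68, 244), (1, 244), (11, 244), (1, 244), (1, 244)]
planned = sum(per_molecule * molecules for per_molecule, molecules in stages)

succeeded = 20495        # tasks that exited with code 0
failed_attempts = 6      # tasks that exited with code -3
jobs_never_finished = planned - succeeded

# Molecule of each failed attempt, in the order the tasks are listed in the comment
failed_molecules = ["m112", "m112", "m179", "m179", "m179", "m112"]
attempts_by_molecule = {
    m: failed_molecules.count(m) for m in sorted(set(failed_molecules))
}

print(planned)               # 20497
print(jobs_never_finished)   # 2
print(attempts_by_molecule)  # {'m112': 3, 'm179': 3}
```

So 20495 successes plus 2 unfinished jobs gives the planned 20497, and the 6 failed attempts split as 3 per molecule, which matches the 2 missing fe_solv_* files.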

Ioan
>   
>> It also seems that we can't easily determine which output files are missing.
>>     
>
> In the general case we wouldn't be able to, because the exact outputs
> may only be known at run-time. Granted, that kind of dynamics would
> depend on our ability to have nondeterministic files being returned,
> which we haven't gotten around to implementing. But there is a question
> of whether we should try to implement a short term solution that would
> be invalidated by our own plans.
>
>   
>> Ian.
>>
>> Ian Foster wrote:
>>     
>>> Ioan:
>>>
>>> a) I think this information should be in the bugzilla summary, 
>>> according to our processes?
>>>
>>> b) Why did it take so long to get all of the workers working?
>>>
>>> c) Can we debug using less than O(800) node hours?
>>>
>>> Ian.
>>>
>> -- 
>>
>>    Ian Foster, Director, Computation Institute
>> Argonne National Laboratory & University of Chicago
>> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
>> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
>> Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
>>       Globus Alliance: www.globus.org.
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>     
>



From foster at mcs.anl.gov  Tue Jul 17 22:37:29 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 17 Jul 2007 22:37:29 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D89EE.5090202@cs.uchicago.edu>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>	<469D7D58.8000908@mcs.anl.gov>
	<469D7E68.9050202@mcs.anl.gov> <469D89EE.5090202@cs.uchicago.edu>
Message-ID: <469D8AF9.7070401@mcs.anl.gov>

Sorry, I was unclear. What I meant was: in the event that Swift decides 
that things have "failed" (definitively), it would be good to have 
something like a DAGman "rescue dag" that would show exactly what needed 
to be done to resubmit a task manually.

Your comment that "If we were to do a resume from Swift, I think it 
would automatically resubmit just the failed tasks" suggests that (in 
effect) we already have this.
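
To make the idea concrete, here is a rough Python sketch of what a "rescue list" could look like if built from the worker log alone. It is not Swift's resume mechanism and not DAGMan's rescue DAG. It assumes the log format quoted in comment #24, and it assumes that the third token of each command (e.g. fepl-zqtloeei) is a per-attempt id, since that is the only token that differs between attempts for the same molecule in the quoted log. The function name rescue_list and the sample data are hypothetical.

```python
import re

# One match per attempt: the command after "Executing:" up to the completion message.
ATTEMPT = re.compile(
    r"Executing: (?P<cmd>.+?)\s+Task\s+(?P<urn>urn:\S+)\s+completed\s+with\s+"
    r"exit\s+code\s+(?P<code>-?\d+)",
    re.DOTALL,
)

def rescue_list(log_text):
    """Return one command per job whose attempts all failed."""
    failed, succeeded = {}, set()
    for match in ATTEMPT.finditer(log_text):
        tokens = match.group("cmd").split()    # also undoes line wrapping
        key = tuple(tokens[:2] + tokens[3:])   # drop the assumed per-attempt id
        if int(match.group("code")) == 0:
            succeeded.add(key)
        else:
            failed[key] = " ".join(tokens)     # keep the latest failing command
    return [cmd for key, cmd in failed.items() if key not in succeeded]

# Hypothetical log: job m1 fails twice, job m2 fails once and then succeeds.
sample = (
    "Executing task urn:a... Building executable command...Executing: /bin/sh "
    "shared/wrapper.sh id1 out_m1 --fe_file fe_m1 Task urn:a completed with exit code -3 in 10 ms\n"
    "Executing task urn:b... Building executable command...Executing: /bin/sh "
    "shared/wrapper.sh id2 out_m1 --fe_file fe_m1 Task urn:b completed with exit code -3 in 12 ms\n"
    "Executing task urn:c... Building executable command...Executing: /bin/sh "
    "shared/wrapper.sh id3 out_m2 --fe_file fe_m2 Task urn:c completed with exit code -3 in 9 ms\n"
    "Executing task urn:d... Building executable command...Executing: /bin/sh "
    "shared/wrapper.sh id4 out_m2 --fe_file fe_m2 Task urn:d completed with exit code 0 in 9 ms\n"
)

for command in rescue_list(sample):
    print(command)
```

The commands it prints are only a starting point for a manual rerun, since the wrapper expects the job directory and input files to still be in place.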

Ian.

Ioan Raicu wrote:
>
>
> Ian Foster wrote:
>> Another (perhaps dumb?) question--it would seem desirable that we be 
>> able to quickly determine what tasks failed and then (attempt to) 
>> rerun them in such circumstances.
> I think Swift already does this up to a fixed # of times (I think it 
> is 3 or 5).
>>
>> Here it seems that a lot of effort is required just to determine what 
>> tasks failed, and I am not sure that the information extracted is 
>> enough to rerun them.
> The failed tasks are pretty easy to find in the logs based on the exit 
> code.  If we were to do a resume from Swift, I think it would 
> automatically resubmit just the failed tasks... but unless we figure 
> out why they failed and fix the problem, they will likely fail again.
>>
>> It also seems that we can't easily determine which output files are 
>> missing.
> I don't know about this one. Maybe Nika can comment on this.
>
> Ioan
>>
>> Ian.
>>
>> Ian Foster wrote:
>>> Ioan:
>>>
>>> a) I think this information should be in the bugzilla summary, 
>>> according to our processes?
>>>
>>> b) Why did it take so long to get all of the workers working?
>>>
>>> c) Can we debug using less than O(800) node hours?
>>>
>>> Ian.
>>>
>>>
>>
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From iraicu at cs.uchicago.edu  Tue Jul 17 22:37:56 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 17 Jul 2007 22:37:56 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D8A7C.9030604@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D895A.5090706@cs.uchicago.edu>
	<469D8A7C.9030604@mcs.anl.gov>
Message-ID: <469D8B14.9090509@cs.uchicago.edu>

Linear, 1, 2, 3, 4, ...
For the ANL/UC site, it's generating a small enough number of jobs...
Ioan

Ian Foster wrote:
> Great! What resource acquisition policy are you using?
>>> b) Why did it take so long to get all of the workers working?
>> I finally had enough confidence in the dynamic resource provisioning 
>> that we won't lose any jobs across resource allocation boundaries 
>> (ran lots of tests and they were all positive), so I enabled it for 
>> this run.  I set the max to be the entire ANL site (274 
>> processors)... and we got 146 at the beginning, and with time, the # 
>> of processors kept increasing up to the peak of 208 or so... the rest 
>> up to 274 were queued up in the PBS wait queue.  The difference 
>> between the beginning with 146 and the end with 208 was that others 
>> who were in the system at the beginning finished their work and 
>> released some nodes, and idle processors went from the wait queue 
>> into the run queue.  I would actually be curious to try out the 
>> latest DRP stuff on a busy site, such as Purdue or NCSA, and to see 
>> if we can maintain a nice pool size over a period of time, despite 
>> the sites being busy...
>>
>> BTW, in the previous runs for MolDyn, we normally set the min and max 
>> to say 100 processors, or 200 processors, and we would wait until we 
>> had all of them before we started... sometimes, this meant waiting 
>> 12~24 hours for enough nodes to become free so the large job could 
>> start.  With DRP, you can start off with whatever the site has 
>> available, and you get more with time as your jobs make it through 
>> the wait queue and other jobs that are running complete...
>
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From tiberius at ci.uchicago.edu  Tue Jul 17 23:18:49 2007
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 17 Jul 2007 23:18:49 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D8AC4.4010400@cs.uchicago.edu>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov>
	<1184728284.2004.12.camel@blabla.mcs.anl.gov>
	<469D8AC4.4010400@cs.uchicago.edu>
Message-ID: <fec1351f0707172118x2dd1ac60h7d01132a1c25150e@mail.gmail.com>

I also had jobs failing at the Argonne site today.
It seems that the ia_32 nodes were randomly failing on some of my
jobs, so I had to switch my apps to the ia_64 nodes to get a full,
successful execution.

Tibi

On 7/17/07, Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
>
>
>
>  Mihael Hategan wrote:
>  On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote:
>
>
>  Another (perhaps dumb?) question--it would seem desirable that we be
> able to quickly determine what tasks failed and then (attempt to) rerun
> them in such circumstances.
>
> Here it seems that a lot of effort is required just to determine what
> tasks failed, and I am not sure that the information extracted is enough
> to rerun them.
>
>  Normally, a summary of what failed with the reasons is printed on
> stderr, together with the stdout and stderr of the jobs. Perhaps it
> should also go to the log file.
>
> In this case, 2 jobs failed. The 6 failures are due to restarts. Which
> is in agreement with the 2 missing molecules.
>
> When jobs fail, swift should not clean up the job directories so that
> one can do post-mortem debugging. I suggest invoking the application
> manually to see if it's a matter of a bad node or bad data.
>
>  The errors happened on 3 different nodes, so I suspect that it's not bad
> nodes (as we had previously experienced with the stale NFS handle).
>
>  Nika, I sent out the actual commands that failed... can you try to run them
> manually to see what happens, and possibly determine why they failed?  Can
> you also find out what an exit code of -3 means within the application that
> failed (you might have to look at the app source code, or contact the
> original source code writer).
>
>  Ioan
>
>
>
>
>  It also seems that we can't easily determine which output files are
> missing.
>
>  In the general case we wouldn't be able to, because the exact outputs
> may only be known at run-time. Granted, that kind of dynamics would
> depend on our ability to have nondeterministic files being returned,
> which we haven't gotten around to implementing. But there is a question
> of whether we should try to implement a short term solution that would
> be invalidated by our own plans.
>
>
>
>  Ian.
>
> Ian Foster wrote:
>
>
>  Ioan:
>
> a) I think this information should be in the bugzilla summary,
> according to our processes?
>
> b) Why did it take so long to get all of the workers working?
>
> c) Can we debug using less than O(800) node hours?
>
> Ian.
>
> bugzilla-daemon at mcs.anl.gov wrote:
>
>
>  http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>
>
>
>
>
> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08
> -------
> So the latest MolDyn's 244 mol run also failed... but I think it made
> it all
> the way to the final few jobs...
>
> The place where I put all the information about the run is at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/
>
>
> Here are the graphs:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
>
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
>
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg
>
>
> The Swift log can be found at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log
>
>
> The Falkon logs are at:
> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/
>
>
> The 244 mol run was supposed to have 20497 tasks, broken down as
> follows:
> 1 1 1
> 1 244 244
> 1 244 244
> 68 244 16592
> 1 244 244
> 11 244 2684
> 1 244 244
> 1 244 244
> ======================
>  20497
>
> We had 20495 tasks that exited with an exit code of 0, and 6 tasks
> that exited
> with an exit code of -3. The worker logs don't show anything on the
> stdout or
> stderr of the failed jobs. I looked online what an exit code of -3
> could mean,
> but didn't find anything.
> Here are the failed 6 tasks:
> Executing task urn:0-9408-1184616132483... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei
> fe_stdout_m112
> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> solv_chg_a10_m112_done
> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with
> exit code -3 in 238 ms
>
> Executing task urn:0-9408-1184616133199... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei
> fe_stdout_m112
> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> solv_chg_a10_m112_done
> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with
> exit code -3 in 201 ms
>
> Executing task urn:0-15036-1184616133342... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei
> fe_stdout_m179
> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> solv_chg_a10_m179_done
> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with
> exit code -3 in 267 ms
>
> Executing task urn:0-15036-1184616133628... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei
> fe_stdout_m179
> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> solv_chg_a10_m179_done
> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with
> exit code -3 in 2368 ms
>
> Executing task urn:0-15036-1184616133528... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei
> fe_stdout_m179
> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
> solv_chg_m179.out solv_repu_0.6_0.7_m179.out solv_repu_0.5_0.6_m179.out
> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m179 --solv_lrc_file
> solv_chg_a10_m179_done
> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with
> exit code -3 in 311 ms
>
> Executing task urn:0-9408-1184616130688... Building executable
> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei
> fe_stdout_m112
> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
> solv_chg_m112.out solv_repu_0.6_0.7_m112.out solv_repu_0.5_0.6_m112.out
> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
> --nosite
> --resultonly --wham_outputs wf_m112 --solv_lrc_file
> solv_chg_a10_m112_done
> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with
> exit code -3 in 464 ms
>
>
> Both the Falkon logs and the Swift logs agree on the number of
> submitted tasks, number of successful tasks, and number of failed
> tasks. There were no outstanding tasks at the time when the workflow
> failed. BTW, I checked the disk space usage about an hour after the
> whole experiment finished, and there was plenty of disk space left.
>
> Yong mentioned that he looked through the output of MolDyn, and there
> were only
> 242 'fe_solv_*' files, so 2 molecule files were missing... one
> question for
> Nika, are the 6 failed tasks the same job, resubmitted?
> Nika, can you add anything more to this? Is there anything else to
> be learned
> from the Swift log, as to why those last few jobs failed? After we
> have tried
> to figure out what happened, can we resume the workflow, and
> hopefully finish
> the last few jobs in another run?
>
> Ioan
>
>
>
>
>  --
>
>  Ian Foster, Director, Computation Institute
> Argonne National Laboratory & University of Chicago
> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
> Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
>  Globus Alliance: www.globus.org.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
>
>
>
>  --
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web: http://www.cs.uchicago.edu/~iraicu
>  http://dsl.cs.uchicago.edu/
> ============================================
> ============================================
>
>
>
>


-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/


From hategan at mcs.anl.gov  Tue Jul 17 23:32:44 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 17 Jul 2007 23:32:44 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <fec1351f0707172118x2dd1ac60h7d01132a1c25150e@mail.gmail.com>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov>
	<1184728284.2004.12.camel@blabla.mcs.anl.gov>
	<469D8AC4.4010400@cs.uchicago.edu>
	<fec1351f0707172118x2dd1ac60h7d01132a1c25150e@mail.gmail.com>
Message-ID: <1184733164.14719.5.camel@blabla.mcs.anl.gov>

I don't think these are random failures. In the whole workflow there
were exactly 6 failed tasks: 3 belonging to one job and 3 to the other.
Statistically, and if Ioan's assertion that they were not sent to the
exact same worker is correct, I'd be pretty confident saying that it was
due to specific executables failing on specific data (and by that I
would include the possibility of missing data).
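
The "3 and 3" split can be checked mechanically. What follows is a
minimal sketch, not part of the Swift or Falkon tooling: it assumes the
six failure records quoted further down in this thread have been joined
onto single lines and abbreviated to their final fields. The names
FAILED, PATTERN and failures_by_molecule are made up for illustration;
the task IDs, molecule names, exit codes and timings are the ones
reported in comment #24.

```python
import re
from collections import Counter

# Abbreviated copies of the six failure records quoted in this thread.
# Assumption: the wrapped log lines were joined and the long argument
# lists dropped; only the trailing fields are kept.
FAILED = """\
--fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms
--fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms
--fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms
--fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms
"""

# \s+ between tokens so the pattern also tolerates re-wrapped records.
PATTERN = re.compile(
    r"--fe_file\s+fe_solv_(m\d+)\s+Task\s+(\S+)\s+completed\s+with\s+exit\s+code\s+(-?\d+)"
)


def failures_by_molecule(text):
    """Count non-zero exit codes per molecule name (e.g. 'm112')."""
    counts = Counter()
    for molecule, _task_id, exit_code in PATTERN.findall(text):
        if int(exit_code) != 0:
            counts[molecule] += 1
    return counts


counts = failures_by_molecule(FAILED)
print(dict(counts))
```

If the failures were spread over many molecules, bad nodes would be the
more likely suspect; two molecules with three attempts each points at
the data or the application.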

Mihael

On Tue, 2007-07-17 at 23:18 -0500, Tiberiu Stef-Praun wrote:
> I also had jobs failing at the Argonne site today.
> It seems that the ia_32 nodes were randomly failing on some of my
> jobs, so I had to switch my apps to the ia_64 nodes to get a full,
> successful execution.
> 
> Tibi
> 
> 
> 


From benc at hawaga.org.uk  Wed Jul 18 02:45:12 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 18 Jul 2007 07:45:12 +0000 (GMT)
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <469D8AF9.7070401@mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov>
	<469D7E68.9050202@mcs.anl.gov> <469D89EE.5090202@cs.uchicago.edu>
	<469D8AF9.7070401@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707180742500.12710@dildano.hawaga.org.uk>


On Tue, 17 Jul 2007, Ian Foster wrote:

> Sorry, I was unclear. What I meant was: in the event that Swift decides that
> things have "failed" (definitively), it would be good to have something like a
> DAGman "rescue dag" that would show exactly what needed to be done to resubmit
> a task manually.

> Your comment that "If we were to do a resume from Swift, I think it would
> automatically resubmit just the failed tasks" suggests that (in effect) we
> already have this.

Swift has resume (though it lists what has been done, not what needs to 
be done). I think there's something funny with it in the context of this 
application because of some hacks to work round swift deficiencies.

-- 


From bugzilla-daemon at mcs.anl.gov  Wed Jul 18 08:18:21 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 18 Jul 2007 08:18:21 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070718131821.0BA40164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #12 from benc at hawaga.org.uk  2007-07-18 08:18 -------
r920 has a fix for code that looks like that in comment #3 - that code became
the regression test tests/language-behaviour/0084-for.swift.

Please try out r920 or later and report back.


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From nefedova at mcs.anl.gov  Wed Jul 18 08:27:56 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Wed, 18 Jul 2007 08:27:56 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules
In-Reply-To: <1184733164.14719.5.camel@blabla.mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>
	<469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov>
	<1184728284.2004.12.camel@blabla.mcs.anl.gov>
	<469D8AC4.4010400@cs.uchicago.edu>
	<fec1351f0707172118x2dd1ac60h7d01132a1c25150e@mail.gmail.com>
	<1184733164.14719.5.camel@blabla.mcs.anl.gov>
Message-ID: <79C29A10-D8AC-43D3-B548-8553B712FDE5@mcs.anl.gov>

Sorry I was offline (sick w/cold/fever). I am taking today off as well.

I've checked the stderr files from the last run - it looks like 2  
jobs failed due to some application-specific reasons. I am Cc'ing Yuqing  
to see if he has any insights... Here is what I had:

WHAM is not converged for solv_chg_m112
WHAM is not converged for solv_chg_m179

So it looks like 2 molecules (out of 244) failed. The last stage of the
workflow failed for these molecules because the previous stage(s)
produced wrong or incomplete (?) results.

Yuqing, there are 6 directories under
tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-ja4ya01d6cti1
(3 for each of the failed molecules). Any ideas what went wrong with
these 2 molecules?
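
For anyone repeating this check, here is a minimal sketch of how the
affected molecules could be pulled out of collected stderr text. It
assumes the message format shown above ("WHAM is not converged for
<stage>_<molecule>"); the function name and the sample string are
hypothetical, and the real stderr files may contain other output.

```python
import re

# Sample text: only these two lines come from the run described above.
STDERR_TEXT = """\
WHAM is not converged for solv_chg_m112
WHAM is not converged for solv_chg_m179
"""

# Captures the stage name (e.g. "solv_chg") and the molecule id (e.g. "m112").
NOT_CONVERGED = re.compile(r"WHAM is not converged for (\w+)_(m\d+)\s*$")

def unconverged_molecules(text):
    """Return the sorted, de-duplicated molecule ids reported as not converged."""
    found = set()
    for line in text.splitlines():
        match = NOT_CONVERGED.search(line)
        if match:
            found.add(match.group(2))
    return sorted(found)

print(unconverged_molecules(STDERR_TEXT))
```

Returning a sorted list rather than a set keeps the result stable when
it is pasted into a bug comment.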

Nika


On Jul 17, 2007, at 11:32 PM, Mihael Hategan wrote:

> I don't think these are random failures. In the whole workflow there
> were exactly 6 tasks failed. 3 belonging to one job and 3 to the other.
> Statistically, and if Ioan's assertion that they were not sent to the
> exact same worker is correct, I'd be pretty confident saying that it was
> due to specific executables failing on specific data (and by that I
> would include the possibility of missing data).
>
> Mihael
>
> On Tue, 2007-07-17 at 23:18 -0500, Tiberiu Stef-Praun wrote:
>> I also had jobs failing at the Argonne site today.
>> It seems that the ia_32 nodes were randomly failing on some of my
>> jobs, so I had to switch my apps to the ia_64 to get a full,
>> successful execution.
>>
>> Tibi
>>
>> On 7/17/07, Ioan Raicu <iraicu at cs.uchicago.edu> wrote:
>>>
>>>
>>>
>>>  Mihael Hategan wrote:
>>>  On Tue, 2007-07-17 at 21:43 -0500, Ian Foster wrote:
>>>
>>>
>>>  Another (perhaps dumb?) question--it would seem desirable that we be
>>> able to quickly determine what tasks failed and then (attempt to) rerun
>>> them in such circumstances.
>>>
>>> Here it seems that a lot of effort is required just to determine what
>>> tasks failed, and I am not sure that the information extracted is enough
>>> to rerun them.
>>>
>>>  Normally, a summary of what failed with the reasons is printed on
>>> stderr, together with the stdout and stderr of the jobs. Perhaps it
>>> should also go to the log file.
>>>
>>> In this case, 2 jobs failed. The 6 failures are due to restarts. Which
>>> is in agreement with the 2 missing molecules.
>>>
>>> When jobs fail, swift should not clean up the job directories so that
>>> one can do post-mortem debugging. I suggest invoking the application
>>> manually to see if it's a matter of a bad node or bad data.
>>>
>>>  The errors happened on 3 different nodes, so I suspect that it's not bad
>>> nodes (as we had previously experienced with the stale NFS handle).
>>>
>>>  Nika, I sent out the actual commands that failed... can you try to run them
>>> manually to see what happens, and possibly determine why they failed?  Can
>>> you also find out what an exit code of -3 means within the application that
>>> failed (you might have to look at the app source code, or contact the
>>> original source code writer).
>>>
>>>  Ioan
>>>
>>>
>>>
>>>
>>>  It also seems that we can't easily determine which output files are
>>> missing.
>>>
>>>  In the general case we wouldn't be able to, because the exact outputs
>>> may only be known at run-time. Granted, that kind of dynamics would
>>> depend on our ability to have nondeterministic files being returned,
>>> which we haven't gotten around to implementing. But there is a question
>>> of whether we should try to implement a short term solution that would
>>> be invalidated by our own plans.
>>>
>>>
>>>
>>>  Ian.
>>>
>>> Ian Foster wrote:
>>>
>>>
>>>  Ioan:
>>>
>>> a) I think this information should be in the bugzilla summary,
>>> according to our processes?
>>>
>>> b) Why did it take so long to get all of the workers working?
>>>
>>> c) Can we debug using less than O(800) node hours?
>>>
>>> Ian.
>>>
>>> bugzilla-daemon at mcs.anl.gov wrote:
>>>
>>>
>>>  http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72
>>>
>>>
>>>
>>>
>>>
>>> ------- Comment #24 from iraicu at cs.uchicago.edu 2007-07-17 16:08 -------
>>> So the latest MolDyn's 244 mol run also failed... but I think it made it
>>> all the way to the final few jobs...
>>>
>>> The place where I put all the information about the run is at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/
>>>
>>>
>>> Here are the graphs:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/summary_graph_med.jpg
>>>
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/task_graph_med.jpg
>>>
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/executor_graph_med.jpg
>>>
>>>
>>> The Swift log can be found at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/swift/MolDyn-244-ja4ya01d6cti1.log
>>>
>>>
>>> The Falkon logs are at:
>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/logs/falkon/
>>>
>>>
>>> The 244 mol run was supposed to have 20497 tasks, broken down as
>>> follows:
>>>  1 x   1 =     1
>>>  1 x 244 =   244
>>>  1 x 244 =   244
>>> 68 x 244 = 16592
>>>  1 x 244 =   244
>>> 11 x 244 =  2684
>>>  1 x 244 =   244
>>>  1 x 244 =   244
>>> ======================
>>>              20497
>>>
>>> We had 20495 tasks that exited with an exit code of 0, and 6 tasks that
>>> exited with an exit code of -3. The worker logs don't show anything on
>>> the stdout or stderr of the failed jobs. I looked online what an exit
>>> code of -3 could mean, but didn't find anything.
>>> Here are the failed 6 tasks:
>>> Executing task urn:0-9408-1184616132483... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei
>>> fe_stdout_m112
>>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out  
>>> solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with
>>> exit code -3 in 238 ms
>>>
>>> Executing task urn:0-9408-1184616133199... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei
>>> fe_stdout_m112
>>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out  
>>> solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with
>>> exit code -3 in 201 ms
>>>
>>> Executing task urn:0-15036-1184616133342... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei
>>> fe_stdout_m179
>>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out  
>>> solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with
>>> exit code -3 in 267 ms
>>>
>>> Executing task urn:0-15036-1184616133628... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei
>>> fe_stdout_m179
>>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out  
>>> solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with
>>> exit code -3 in 2368 ms
>>>
>>> Executing task urn:0-15036-1184616133528... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei
>>> fe_stdout_m179
>>> stderr.txt wf_m179 solv_chg_a10_m179_done solv_repu_0.2_0.3_m179.out
>>> solv_repu_0_0.2_m179.out solv_repu_0.9_1_m179.out solv_disp_m179.out
>>> solv_chg_m179.out solv_repu_0.6_0.7_m179.out  
>>> solv_repu_0.5_0.6_m179.out
>>> solv_repu_0.4_0.5_m179.out solv_repu_0.3_0.4_m179.out
>>> solv_repu_0.8_0.9_m179.out solv_repu_0.7_0.8_m179.out fe_solv_m179
>>> fe_stdout_m179 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m179 --solv_lrc_file
>>> solv_chg_a10_m179_done
>>> --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with
>>> exit code -3 in 311 ms
>>>
>>> Executing task urn:0-9408-1184616130688... Building executable
>>> command...Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei
>>> fe_stdout_m112
>>> stderr.txt wf_m112 solv_chg_a10_m112_done solv_repu_0.2_0.3_m112.out
>>> solv_repu_0_0.2_m112.out solv_repu_0.9_1_m112.out solv_disp_m112.out
>>> solv_chg_m112.out solv_repu_0.6_0.7_m112.out  
>>> solv_repu_0.5_0.6_m112.out
>>> solv_repu_0.4_0.5_m112.out solv_repu_0.3_0.4_m112.out
>>> solv_repu_0.8_0.9_m112.out solv_repu_0.7_0.8_m112.out fe_solv_m112
>>> fe_stdout_m112 /disks/scratchgpfs1/iraicu/ModLyn/bin/fe.pl
>>> --nosite
>>> --resultonly --wham_outputs wf_m112 --solv_lrc_file
>>> solv_chg_a10_m112_done
>>> --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with
>>> exit code -3 in 464 ms
>>>
>>>
>>> Both the Falkon logs and the Swift logs agree on the number of submitted
>>> tasks, number of successful tasks, and number of failed tasks. There were
>>> no outstanding tasks at the time when the workflow failed. BTW, I checked
>>> the disk space usage about an hour after the whole experiment finished,
>>> and there was plenty of disk space left.
>>>
>>> Yong mentioned that he looked through the output of MolDyn, and there
>>> were only 242 'fe_solv_*' files, so 2 molecule files were missing... one
>>> question for Nika, are the 6 failed tasks the same job, resubmitted?
>>> Nika, can you add anything more to this? Is there anything else to be
>>> learned from the Swift log, as to why those last few jobs failed? After
>>> we have tried to figure out what happened, can we resume the workflow,
>>> and hopefully finish the last few jobs in another run?
>>>
>>> Ioan
>>>
>>>
>>>
>>>
>>>  --
>>>
>>>  Ian Foster, Director, Computation Institute
>>> Argonne National Laboratory & University of Chicago
>>> Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
>>> Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
>>> Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
>>>  Globus Alliance: www.globus.org.
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>>>  --
>>> ============================================
>>> Ioan Raicu
>>> Ph.D. Student
>>> ============================================
>>> Distributed Systems Laboratory
>>> Computer Science Department
>>> University of Chicago
>>> 1100 E. 58th Street, Ryerson Hall
>>> Chicago, IL 60637
>>> ============================================
>>> Email: iraicu at cs.uchicago.edu
>>> Web: http://www.cs.uchicago.edu/~iraicu
>>>  http://dsl.cs.uchicago.edu/
>>> ============================================
>>
>>
>


From iraicu at cs.uchicago.edu  Wed Jul 18 08:58:20 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 18 Jul 2007 08:58:20 -0500
Subject: [Swift-devel] [Bug 72] Campaign for scaling wf up to 244	molecules
In-Reply-To: <1184733164.14719.5.camel@blabla.mcs.anl.gov>
References: <20070717210859.27A50164EC@foxtrot.mcs.anl.gov>	
	<469D7D58.8000908@mcs.anl.gov> <469D7E68.9050202@mcs.anl.gov>	
	<1184728284.2004.12.camel@blabla.mcs.anl.gov>	
	<469D8AC4.4010400@cs.uchicago.edu>	
	<fec1351f0707172118x2dd1ac60h7d01132a1c25150e@mail.gmail.com>
	<1184733164.14719.5.camel@blabla.mcs.anl.gov>
Message-ID: <469E1C7C.5050703@cs.uchicago.edu>

The 4 machines that failed 6 jobs were:
tg-c055
tg-v028
tg-v092
tg-v023

Note that one of them is 64-bit and three are 32-bit. Also, I had two
workers on each machine, and only one worker on each machine failed a
job. If it were indeed a node hardware problem, I would have expected
both workers on that machine to have failed jobs.

I concur with Mihael that there might have been incomplete or missing
data. We just have to find out whether that is possible despite the
previous stages all exiting with an exit code of 0. Yuqing (the
domain/app-specific expert) is probably the key to finding out what
happened in this run with these 6 failed jobs. Nika, did you try to run
the jobs manually to see if they fail with the same -3 exit code?
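
To make the "same job, resubmitted" reading easy to verify, here is a
rough sketch that groups the failed tasks by molecule and checks the task
accounting from comment #24. The records are abbreviated from the worker
log excerpts in that comment; it assumes one record per line and keeps
only the --fe_file option, and the last record is a made-up success added
just to show the filtering. The function and constant names are
hypothetical.

```python
import re
from collections import Counter

# Abbreviated records modeled on the worker-log excerpts in comment #24.
# Assumptions: one record per line, argument list cut down to --fe_file.
# The last record is hypothetical (a success, for contrast).
SAMPLE_LOG = """\
Executing: /bin/sh shared/wrapper.sh fepl-zqtloeei --fe_file fe_solv_m112 Task urn:0-9408-1184616132483 completed with exit code -3 in 238 ms
Executing: /bin/sh shared/wrapper.sh fepl-2rtloeei --fe_file fe_solv_m112 Task urn:0-9408-1184616133199 completed with exit code -3 in 201 ms
Executing: /bin/sh shared/wrapper.sh fepl-5rtloeei --fe_file fe_solv_m179 Task urn:0-15036-1184616133342 completed with exit code -3 in 267 ms
Executing: /bin/sh shared/wrapper.sh fepl-9rtloeei --fe_file fe_solv_m179 Task urn:0-15036-1184616133628 completed with exit code -3 in 2368 ms
Executing: /bin/sh shared/wrapper.sh fepl-8rtloeei --fe_file fe_solv_m179 Task urn:0-15036-1184616133528 completed with exit code -3 in 311 ms
Executing: /bin/sh shared/wrapper.sh fepl-9ptloeei --fe_file fe_solv_m112 Task urn:0-9408-1184616130688 completed with exit code -3 in 464 ms
Executing: /bin/sh shared/wrapper.sh fepl-example --fe_file fe_solv_m001 Task urn:0-0-0 completed with exit code 0 in 100 ms
"""

# Captures the molecule id, the task urn, and the exit code of one record.
RECORD = re.compile(
    r"--fe_file fe_solv_(m\d+) Task (urn:\S+) completed with exit code (-?\d+)"
)

def failures_by_molecule(log_text):
    """Count tasks with a non-zero exit code per molecule id."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RECORD.search(line)
        if match and int(match.group(3)) != 0:
            counts[match.group(1)] += 1
    return counts

# Task accounting from comment #24: one initial task, then the per-stage
# task counts multiplied by 244 molecules.
EXPECTED_TOTAL = 1 + (1 + 1 + 68 + 1 + 11 + 1 + 1) * 244
SUCCEEDED = 20495

failed = failures_by_molecule(SAMPLE_LOG)
print(dict(failed))
# Each failed job leaves one task missing, however often it was retried.
print(EXPECTED_TOTAL, SUCCEEDED + len(failed))
```

If the grouping shows 3 failures for each of 2 molecules, that matches 2
jobs each being tried 3 times, and 20495 successes plus 2 failed jobs
adds up to the expected total.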

Ioan

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From benc at hawaga.org.uk  Wed Jul 18 11:18:57 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 18 Jul 2007 16:18:57 +0000 (GMT)
Subject: [Swift-devel] kickstart on regular sites
Message-ID: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>

There was some discussion many months ago about installing kickstart on 
the sites that our users use regularly; and cataloging that information in 
the same standard site catalog that lists all the sites we have.

That never happened though, for whatever reason.

It might be useful to do that though; I think any OSG site will have it 
installed already as part of the OSG standard software stack (?).

-- 


From foster at mcs.anl.gov  Wed Jul 18 16:46:40 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Wed, 18 Jul 2007 16:46:40 -0500
Subject: [Swift-devel] kickstart on regular sites
In-Reply-To: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>
Message-ID: <469E8A40.50602@mcs.anl.gov>

The set of sites we run on is so small, and the amount of time we spend 
saying "we don't know exactly what happened because kickstart wasn't 
installed" so large, that maybe we should do this :0(

Note that TG has a software catalog (MDS-based) for just this sort of 
information.


-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From hategan at mcs.anl.gov  Wed Jul 18 17:06:39 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 18 Jul 2007 17:06:39 -0500
Subject: [Swift-devel] kickstart on regular sites
In-Reply-To: <469E8A40.50602@mcs.anl.gov>
References: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>
	<469E8A40.50602@mcs.anl.gov>
Message-ID: <1184796399.9931.10.camel@blabla.mcs.anl.gov>

On Wed, 2007-07-18 at 16:46 -0500, Ian Foster wrote:
> The set of sites we run on is so small, and the amount of time we spend 
> saying "we don't know exactly what happened because kickstart wasn't 
> installed" so large, that maybe we should do this :0(

Good point. It's probably less than the time I spend on email replies
saying that kickstart is not going to be the answer to all our problems.

Mihael



From itf at mcs.anl.gov  Wed Jul 18 17:24:49 2007
From: itf at mcs.anl.gov (Ian Foster)
Date: Wed, 18 Jul 2007 22:24:49 +0000
Subject: [Swift-devel] kickstart on regular sites
In-Reply-To: <1184796399.9931.10.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>
	<469E8A40.50602@mcs.anl.gov><1184796399.9931.10.camel@blabla.mcs.anl.gov>
Message-ID: <160288387-1184797558-cardhu_decombobulator_blackberry.rim.net-198160359-@bxe009.bisx.prod.on.blackberry>

It isn't?

:-)

-----Original Message-----
From: Mihael Hategan <hategan at mcs.anl.gov>

Date: Wed, 18 Jul 2007 17:06:39 
To:Ian Foster <foster at mcs.anl.gov>
Cc:Ben Clifford <benc at hawaga.org.uk>, swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] kickstart on regular sites


On Wed, 2007-07-18 at 16:46 -0500, Ian Foster wrote:
> The set of sites we run on is so small, and the amount of time we spend 
> saying "we don't know exactly what happened because kickstart wasn't 
> installed" so large, that maybe we should do this :0(

Good point. It's probably less than the time I spend on email replies
saying that kickstart is not going to be the answer to all our problems.

Mihael



From benc at hawaga.org.uk  Wed Jul 18 19:57:36 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 19 Jul 2007 00:57:36 +0000 (GMT)
Subject: [Swift-devel] kickstart on regular sites
In-Reply-To: <1184796399.9931.10.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk> 
	<469E8A40.50602@mcs.anl.gov>
	<1184796399.9931.10.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707190055350.7034@dildano.hawaga.org.uk>


> Good point. It's probably less than the time I spend on email replies
> saying that kickstart is not going to be the answer to all our problems.

Based on my experience with VDS, kickstart is the answer to a large 
portion of the problems that my users were experiencing. That was, 
however, with a codebase that had been substantially used and debugged 
over some years, so I think that experience doesn't reflect the Swift 
situation, where it's often Swift and associated components that don't 
work, rather than remote sites.

-- 


From hategan at mcs.anl.gov  Wed Jul 18 20:11:43 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 18 Jul 2007 20:11:43 -0500
Subject: [Swift-devel] kickstart on regular sites
In-Reply-To: <Pine.LNX.4.64.0707190055350.7034@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707181616580.7034@dildano.hawaga.org.uk>
	<469E8A40.50602@mcs.anl.gov>
	<1184796399.9931.10.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707190055350.7034@dildano.hawaga.org.uk>
Message-ID: <1184807503.23345.3.camel@blabla.mcs.anl.gov>

Right. I don't think we were debating that. The point is that, given the
low number of sites, we might as well install kickstart (or query the MDS
server for it), use it, and avoid the debate altogether.

On Thu, 2007-07-19 at 00:57 +0000, Ben Clifford wrote:
> 
> > Good point. It's probably less than the time I spend on email replies
> > saying that kickstart is not going to be the answer to all our problems.
> 
> based on my experience with VDS, kickstart is the answer to a large 
> portion of the problems that my users were experiencing; this was, 
> however, with a codebase that had been substantially used and debugged 
> over some years and so I think that experience doesn't reflect the swift 
> situation where its often Swift and associated components that don't work, 
> rather than remote sites that don't work.
> 


From nefedova at mcs.anl.gov  Thu Jul 19 07:24:58 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Thu, 19 Jul 2007 07:24:58 -0500
Subject: [Swift-devel] off through the end of the week
Message-ID: <18EBDEDE-9A7E-49AE-96C1-A08F7903298C@mcs.anl.gov>

Sorry, I have to take the rest of the week off as sick days -- I saw
my Dr. yesterday and he diagnosed me with West Nile virus );
I should be OK by next week I hope.

Nika


From benc at hawaga.org.uk  Thu Jul 19 13:13:22 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 19 Jul 2007 18:13:22 +0000 (GMT)
Subject: [Swift-devel] 0.2 release (again)
In-Reply-To: <Pine.LNX.4.64.0707171546560.11237@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707171351500.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707171546560.11237@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707191809470.11237@dildano.hawaga.org.uk>


This passes my fairly lightweight testing; and no one else has commented 
(though I suspect that means no one has used it, rather than people have 
tested it successfully). However, that's enough for me to put it up as a 
lightweight release for now, which I have done.

On Tue, 17 Jul 2007, Ben Clifford wrote:

> 
> 
> On Tue, 17 Jul 2007, Ben Clifford wrote:
> 
> > I'm building a release candidate for a low-effort 0.2 release from swift 
> > r915 and cog r1658. Will post here with it later on.
> 
> http://www.ci.uchicago.edu/~benc/vdsk-0.2.tar.gz
> 
> $ md5sum vdsk-0.2.tar.gz 
> 25130bbe97f2f10653b48968953c6d84  vdsk-0.2.tar.gz
> 
> It runs hello world for me. I haven't done any other testing.
> 
> 
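To check a downloaded tarball against the md5sum quoted above, a short
Python sketch along these lines should do. It uses only the standard
hashlib module; the helper names md5_hex and matches are made up for this
illustration and are not part of any Swift tooling.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's MD5 equals the published value (case-insensitive)."""
    return md5_hex(path) == expected_hex.strip().lower()
```

Calling matches("vdsk-0.2.tar.gz", "25130bbe97f2f10653b48968953c6d84")
would then report whether the local copy agrees with the posted checksum.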


From benc at hawaga.org.uk  Thu Jul 19 13:14:34 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 19 Jul 2007 18:14:34 +0000 (GMT)
Subject: [Swift-devel] 0.2 release (again)
In-Reply-To: <Pine.LNX.4.64.0707191809470.11237@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707171351500.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707171546560.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707191809470.11237@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707191813490.7034@dildano.hawaga.org.uk>

sufficiently lightweight, however, that I did not go through the commit 
messages since 0.1 to prepare detailed release notes.

(I don't see an immediately obvious way to do it with svn (?!))

-- 


From benc at hawaga.org.uk  Fri Jul 20 09:23:27 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 14:23:27 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>


On Mon, 16 Jul 2007, Ben Clifford wrote:

> However, now that I look at implementing those, it makes me wonder if we 
> should have a single numeric type. Its not clear that we need float/double 
> in the language as distinct types.

Does anyone have any particular preferences for numeric types?

In particular, has anyone used anything other than 'int' for anything?

-- 


From yongzh at cs.uchicago.edu  Fri Jul 20 09:32:46 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Fri, 20 Jul 2007 09:32:46 -0500 (CDT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>

I've used float. On the contrary, I think the problem is that int and float
may not be enough; we may need more numeric types.

The issue we are having now is just that we need a vdl library to deal with
numeric operations, instead of relying on karajan (karajan only has a double
type, which is not good for cases where we only need int).

I'd suggest we understand real user requirements before jumping into
solutions.

Yong.

On Fri, 20 Jul 2007, Ben Clifford wrote:

>
>
> On Mon, 16 Jul 2007, Ben Clifford wrote:
>
> > However, now that I look at implementing those, it makes me wonder if we
> > should have a single numeric type. Its not clear that we need float/double
> > in the language as distinct types.
>
> does any one have any particular preferences for numeric types?
>
> In particular has anyone used anything other than 'int' for anything?
>
> --
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From benc at hawaga.org.uk  Fri Jul 20 09:37:14 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 14:37:14 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0707201434570.7034@dildano.hawaga.org.uk>


On Fri, 20 Jul 2007, Yong Zhao wrote:

> I've used float. I think the problem is on the contrary that int and float
> may not be enough, we may need more numeric types.
> 
> The issues we are having now is just we need a vdl library to deal with
> numeric operations, instead of relying on karajan (karajan only has double
> type, which is not good for cases when we only need int).

There's a type issue too.

What is the type of this expression?

   5 + 3

and should this be permitted?

  float f = 5 + 3;

There's a bunch of type conversion going on at the moment that isn't 
terribly well defined and that causes me trouble when I want to put in 
more type information/checking.

I bring this up because it's getting in the way of my 
proper-xml-intermediate-format work.
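
One possible answer to the two questions above, sketched in Python rather
than SwiftScript: treat 5 + 3 as int, and allow only the widening
conversion int -> float on assignment. These rules are an assumption made
for illustration (roughly C-like), not what Swift currently does, and all
names below are hypothetical.

```python
# Hypothetical typing rules for numeric literals, '+', and assignment.
INT, FLOAT = "int", "float"


def literal_type(text):
    """Type of a numeric literal: int unless it has a '.' or an exponent."""
    return FLOAT if any(c in text for c in ".eE") else INT


def add_type(left, right):
    """Result type of left + right: int only when both operands are int."""
    return INT if left == INT and right == INT else FLOAT


def assignable(declared, actual):
    """Allow exact matches plus the widening int -> float, nothing else."""
    return declared == actual or (declared == FLOAT and actual == INT)


expr = add_type(literal_type("5"), literal_type("3"))
print(expr)                     # type of: 5 + 3
print(assignable(FLOAT, expr))  # is "float f = 5 + 3;" permitted?
print(assignable(INT, FLOAT))   # is "int i = <float expression>;" permitted?
```

Under these assumed rules the expression is an int, the float assignment
is accepted through widening, and the narrowing direction is rejected.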

-- 


From wilde at mcs.anl.gov  Fri Jul 20 10:12:17 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Fri, 20 Jul 2007 10:12:17 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
Message-ID: <46A0D0D1.6070407@mcs.anl.gov>

Can we leave things as they are for the moment and come back to this when we 
have more concrete examples?

I certainly see the need to:

a) describe atomic functions that have numeric args

b) do minor calculations on those args, in swift, between calls

How we do b) will be strongly affected by where we go on the "fold" issue, so 
let's gather some app examples to drive this decision.

Seems like we can always do (b) in another language, so we can always "get by" 
by having all args be strings for the moment.  Not pretty, but it lowers the 
urgency of an immediate decision.

I think also that at some point we'll need to reconcile whether we support all 
(or more) of the primitive data types of XML Schema, which has more numeric and 
date types.

Is there any app-based request in bugzilla right now that demands a more 
immediate resolution of this issue?

Yong, can you post the example you had of using a float as an arg?  Did you do 
any Swift calculations on those float values in this example?

Thanks,

Mike


Yong Zhao wrote, On 7/20/2007 9:32 AM:
> I've used float. I think the problem is on the contrary that int and float
> may not be enough, we may need more numeric types.
> 
> The issues we are having now is just we need a vdl library to deal with
> numeric operations, instead of relying on karajan (karajan only has double
> type, which is not good for cases when we only need int).
> 
> I'd suggest we understand real user requirements before jumping into
> solutions.
> 
> Yong.
> 
> On Fri, 20 Jul 2007, Ben Clifford wrote:
> 
>>
>> On Mon, 16 Jul 2007, Ben Clifford wrote:
>>
>>> However, now that I look at implementing those, it makes me wonder if we
>>> should have a single numeric type. Its not clear that we need float/double
>>> in the language as distinct types.
>> does any one have any particular preferences for numeric types?
>>
>> In particular has anyone used anything other than 'int' for anything?
>>
> 
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From benc at hawaga.org.uk  Fri Jul 20 11:09:00 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 16:09:00 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707201434570.7034@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<Pine.LNX.4.64.0707201434570.7034@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707201553190.26516@dildano.hawaga.org.uk>


With the way that numbers are implemented at the moment, some of them don't 
even keep their declared types. The following puts the string 3.5 into a 
file, despite the fact that there's an integer type involved which should 
be doing something else (causing an error or rounding, most likely).

That's behaviour that's consistent with having a single 'number' type 
rather than multiple strong number types.


type messagefile {}

(messagefile t) greeting(float m) { 
    app {
        echo m  stdout=@filename(t);
    }
}

float f = 7/2;

int i = f;

messagefile outfile <"j-echo.out">;

outfile = greeting(i);
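
For comparison, here is a Python sketch of the two behaviours mentioned
above for "int i = f;" when f holds 3.5: raising an error, or rounding.
The function names are hypothetical and this is not how Swift or karajan
implement assignment; it only illustrates what a strong int type would do
instead of silently passing 3.5 through.

```python
def assign_int_strict(value):
    """Reject any value that is not integral (the 'cause an error' option)."""
    if float(value) != int(value):
        raise TypeError(f"cannot assign {value!r} to an int variable")
    return int(value)


def assign_int_rounding(value):
    """Round to the nearest integer (the 'rounding' option)."""
    # Note: Python's round() sends exact halves to the nearest even integer.
    return round(value)


f = 7 / 2  # 3.5, matching the value written to the file in the example
print(assign_int_rounding(f))
try:
    assign_int_strict(f)
except TypeError as err:
    print("error:", err)
```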


From benc at hawaga.org.uk  Fri Jul 20 11:47:46 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 16:47:46 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <46A0D0D1.6070407@mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707201640410.26516@dildano.hawaga.org.uk>


On Fri, 20 Jul 2007, Mike Wilde wrote:

> Can we leave things as they are for the moment and come back to this when we
> have more concrete examples?

Not really - it's sufficiently poorly defined and badly behaved at the 
moment that it's causing me trouble with the bug 30 work - I'm doing things 
which make stronger type demands than have been previously needed, and so 
the typing needs to be more consistent.

In the absence of any particularly compelling argument either way, I'll 
make it consistent in the way that is easiest for me.

-- 


From benc at hawaga.org.uk  Fri Jul 20 12:11:50 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 17:11:50 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <46A0D0D1.6070407@mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>


On Fri, 20 Jul 2007, Mike Wilde wrote:

> Is there any app-based request in bugzilla right now that demands a more
> immediate resolution of this issue?

It's more that it comes from me trying to do the bug 30 rewrite of the 
intermediate format. Every bug that depends on that has a workaround 
being used by apps as they encounter them; however, it's a serious usability 
problem in terms of people writing code they think will work and finding 
it doesn't (and worse, finding it fails in mysterious ways).

> Seems like we can always do (b) in another language, so we can always 
> "get by" by having all args be strings for the moment.  Not pretty, but 
> it lowers the urgency of an immediate decision.

A 'numeric' type that makes no more constraint on its content looks very 
much like a string; but is still typed enough to know that you can use + 
or - or / or * on it. There's no shame in that.
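
A minimal Python sketch of that idea, assuming a single hypothetical
Numeric class (not an existing Swift or karajan type): the value is kept
as text, much like a string, but the type still knows that + - * / apply
to it. The standard decimal module does the arithmetic.

```python
from decimal import Decimal


class Numeric:
    """Hypothetical single numeric type: stored as text, supports + - * /."""

    def __init__(self, text):
        self.text = str(text)
        Decimal(self.text)  # raises decimal.InvalidOperation if not a number

    def _apply(self, other, op):
        # Parse both operands, combine them, and store the result as text.
        return Numeric(op(Decimal(self.text), Decimal(other.text)))

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._apply(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._apply(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._apply(other, lambda a, b: a / b)

    def __str__(self):
        return self.text


print(Numeric("5") + Numeric("3"))
print(Numeric("7") / Numeric("2"))
```

The design choice here is that no int/float distinction exists at all;
whether a value is integral is a property of its content, not its type.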

You say:

> it lowers the urgency of an immediate decision.

but deciding whether or not to take this approach *is* the kind of decision 
that I'm looking for!

> I think also that at some point we'll need to reconcile whether we 
> support all (or more) of the primitive data types of XML Schema, which 
> has more numeric and date types.

'int' isn't even an XML Schema primitive type - it's defined as a 
restriction of a more general type... Our present type model looks almost 
entirely unlike XML Schema.

-- 


From wilde at mcs.anl.gov  Fri Jul 20 13:30:13 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Fri, 20 Jul 2007 13:30:13 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707201640410.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201640410.26516@dildano.hawaga.org.uk>
Message-ID: <46A0FF35.9000108@mcs.anl.gov>


Ben Clifford wrote, On 7/20/2007 11:47 AM:
> 
> On Fri, 20 Jul 2007, Mike Wilde wrote:
> 
>> Can we leave things as they are for the moment and come back to this when we
>> have more concrete examples?
> 
> not really - its sufficiently poorly defined and badly behaved at the 
> moment that its causing me trouble with the bug 30 work - I'm doing things 
> which make stronger type demands than have been previously needed, and so 
> the typing needs to be more consistent.

OK, that's what I wanted to know.  So we do need to discuss it now.
> 
> In the absence of any particularly compelling argument in any way, I'll 
> make it consistent in the way that is easiest to me.

I'm eager to hear what you propose, but reserve the right to call for more 
discussion if I feel it's necessary.

I suggested we defer the discussion because I felt that issues of mapping are 
more important, and I thought those were independent of the nature of numeric types.

But if you feel bug 30 is compelling enough to force a decision on this now, 
then we should discuss it in more depth.

- Mike

> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From wilde at mcs.anl.gov  Fri Jul 20 14:52:38 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Fri, 20 Jul 2007 14:52:38 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
Message-ID: <46A11286.7080807@mcs.anl.gov>

OK, I just reread the thread from the top, and have some thoughts on what our 
alternatives are.

Some of the road forward depends on:

1) whether we care about breaking current code
2) how well the current code can handle (or be taught) type coercions

If I had to pick a simple system, I'd pick either:

a) just strings
b) just strings and ints
c) just strings and floats where floats act like ints when they have integral 
values (many systems are like this)
d) strings, ints and floats with fully manual coercions
e) strings, ints and floats with reasonable auto coercions ala C
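
Option (c) in the list above can be illustrated with a few lines of
Python: one floating-point type whose integral values are rendered as
integers. The function name format_number is made up for this sketch, and
the rendering rule is an assumption about what "act like ints" would mean
when a value is passed to an application.

```python
def format_number(value):
    """Render a number as option (c) suggests: integral floats look like ints."""
    value = float(value)
    if value.is_integer():
        return str(int(value))
    return repr(value)


print(format_number(8.0))
print(format_number(7 / 2))
```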

My pref would be (e) if that's easy to implement.

Forget what I said about XML-Schema types earlier.

Do the choices above cover the range of reasonable choices?

What are the major open issues, given, say, (e)?

- Mike


Ben Clifford wrote, On 7/20/2007 12:11 PM:
> 
> On Fri, 20 Jul 2007, Mike Wilde wrote:
> 
>> Is there any app-based request in bugzilla right now that demands a more
>> immediate resolution of this issue?
> 
> its more that it comes from me trying to do the bug 30 rewrite of the 
> intermediate format - every bug that depends on that has a workaround 
> being used by apps as they encounter them, however its a serious usability 
> problem in terms of people writing code they think will work and finding 
> it doesn't (and worse, finding it fails in mysterious ways).
> 
>> Seems like we can always do (b) in another language, so we can always 
>> "get by" by having all args be strings for the moment.  Not pretty, but 
>> it lowers the urgency of an immediate decision.
> 
> A 'numeric' type that makes no more constraint on its content looks very 
> much like a string; but is still typed enough to know that you can use + 
> or - or / or * on it. There's no shame in that.
> 
> You say:
> 
>> it lowers the urgency of an immediate decision.                               
> 
> but deciding (if we do or not) on this approach *is* the kind of decision 
> that I'm looking for!
> 
>> I think also that at some point we'll need to reconcile whether we 
>> support all (or more) of the primitive data types of XML Schema, which 
>> has more numeric and date types.
> 
> 'int' isn't even an XML Schema primitive type - its defined as a 
> restriction of a more general type... Our present type model looks almost 
> entirely unlike XML Schema.
> 

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From hategan at mcs.anl.gov  Fri Jul 20 15:49:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 20 Jul 2007 15:49:08 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <46A11286.7080807@mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
Message-ID: <1184964549.26024.0.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-20 at 14:52 -0500, Mike Wilde wrote:
> OK, I just reread the thread from the top, and have some thoughts on what our 
> alternatives are.
> 
> Some of the road forward depends on:
> 
> 1) whether we care about breaking current code
> 2) how well the current code can handle (or be taught) type coercions
> 
> If I had to pick a simple system, I'd pick either:
> 
> a) just strings
> b) just string and ints
> c) just strings and floats where floats act like ints when they have integral 
> values (many systems are like this)
> d) strings, ints and floats with fully manual coercions
> e) strings, ints and floats with reasonable auto coercions ala C
> 
> My pref would be (e) if thats easy to implement.
> 
> Forget what I said abut XML-Schema types earlier.
> 
> Do the choices above cover the range of reasonable choices?
> 
> What are the major open issues, give, say (e)?

I don't see many. I've been chatting with Ben, and we decided it's probably
worth trying it on a separate branch.

> 
> - Mike
> 
> 
> 
> 
> Ben Clifford wrote, On 7/20/2007 12:11 PM:
> > 
> > On Fri, 20 Jul 2007, Mike Wilde wrote:
> > 
> >> Is there any app-based request in bugzilla right now that demands a more
> >> immediate resolution of this issue?
> > 
> > its more that it comes from me trying to do the bug 30 rewrite of the 
> > intermediate format - every bug that depends on that has a workaround 
> > being used by apps as they encounter them, however its a serious usability 
> > problem in terms of people writing code they think will work and finding 
> > it doesn't (and worse, finding it fails in mysterious ways).
> > 
> >> Seems like we can always do (b) in another language, so we can always 
> >> "get by" by having all args be strings for the moment.  Not pretty, but 
> >> it lowers the urgency of an immediate decision.
> > 
> > A 'numeric' type that makes no more constraint on its content looks very 
> > much like a string; but is still typed enough to know that you can use + 
> > or - or / or * on it. There's no shame in that.
> > 
> > You say:
> > 
> >> it lowers the urgency of an immediate decision.                               
> > 
> > but deciding (if we do or not) on this approach *is* the kind of decision 
> > that I'm looking for!
> > 
> >> I think also that at some point we'll need to reconcile whether we 
> >> support all (or more) of the primitive data types of XML Schema, which 
> >> has more numeric and date types.
> > 
> > 'int' isn't even an XML Schema primitive type - its defined as a 
> > restriction of a more general type... Our present type model looks almost 
> > entirely unlike XML Schema.
> > 
> 


From benc at hawaga.org.uk  Fri Jul 20 17:12:49 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 20 Jul 2007 22:12:49 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <1184964549.26024.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu> 
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>


I made a branch with the relevant patches from my quilt patch stack.

https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions

In r940, I remove non-integer numbers from the language by virtue of 
removing the test cases from language-behaviour for them (but no actual 
code changes). If you want to run the language-behaviour tests with the 
non-integer tests in there again, roll back r940 in your local repo.

The two biggest changes are r941, which makes much more stuff be wrapped in 
DSHandles, and r942, which is an adjustment to the intermediate language to 
have XML-based expressions.

As a consequence of r942, the resulting karajan code has a lot more cruft 
in it (but should still behave as previously). I'm intending to work on 
that more so don't be alarmed.
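
To give a feel for what "XML based expressions" means compared with
string literals, here is a small Python sketch that turns the expression
5 + 3 into nested XML elements. The element and attribute names
(binaryOp, op, integerConstant) are invented for this illustration and
are not taken from the actual intermediate format on the branch.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert a tiny expression tree to XML.

    A node is either an int literal or a tuple (op, left, right).
    """
    if isinstance(node, int):
        elem = ET.Element("integerConstant")
        elem.text = str(node)
        return elem
    op, left, right = node
    elem = ET.Element("binaryOp", {"op": op})
    elem.append(to_xml(left))
    elem.append(to_xml(right))
    return elem


xml_text = ET.tostring(to_xml(("+", 5, 3)), encoding="unicode")
print(xml_text)
```

With the structure explicit, a later stage can inspect operand types
without re-parsing a string such as "5 + 3".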

Type this for the commit logs so far:

svn log https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions -r933:HEAD

-- 



From bugzilla-daemon at mcs.anl.gov  Mon Jul 23 08:45:25 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 23 Jul 2007 08:45:25 -0500 (CDT)
Subject: [Swift-devel] [Bug 80] simple_mapper strange prefix behaviour
In-Reply-To: <bug-80-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070723134525.700EC164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #1 from benc at hawaga.org.uk  2007-07-23 08:45 -------
This looks very strongly related to bug 10.


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at mcs.anl.gov  Mon Jul 23 08:50:29 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 23 Jul 2007 08:50:29 -0500 (CDT)
Subject: [Swift-devel] [Bug 30] swiftscript XML language should express
	expressions in XML rather than as string literals
In-Reply-To: <bug-30-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070723135029.A95A5164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=30


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |swift-devel at ci.uchicago.edu
             Status|NEW                         |ASSIGNED


------- Comment #2 from benc at hawaga.org.uk  2007-07-23 08:50 -------
I have this implemented except for issues raised with numerical types, which
Mihael is investigating.




From benc at hawaga.org.uk  Mon Jul 23 10:08:51 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 23 Jul 2007 15:08:51 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
Message-ID: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>


VDS1 has a utility, transfer, which is for use on the worker nodes to 
stage data in and out.

It seems fairly seriously worth considering using that, rather than 
re-implementing stuff from ground up.

-- 


From benc at hawaga.org.uk  Mon Jul 23 10:16:23 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 23 Jul 2007 15:16:23 +0000 (GMT)
Subject: [Swift-devel] Re: VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707231509300.26516@dildano.hawaga.org.uk>


the former suggestion comes to mind because I was just chatting to buzz 
about dcache and XIO (primarily because I tease him about writing XIO 
drivers for everything), but then it turns into the serious suggestion 
that:

   i) worker-side transfer executable becomes (or is, already, I suspect)
      XIO-aware

  ii) xio-dcache driver should be easy to write (by us or by xio people)

I'm increasingly convinced as I think about it that there needs to be 
an (optional) worker-side transfer executable for decent staging in/out of 
data on workers; and that maybe we should not mess round with other 
approaches that skirt round this.

--


From hategan at mcs.anl.gov  Mon Jul 23 10:17:39 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 23 Jul 2007 10:17:39 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
Message-ID: <1185203859.17343.5.camel@blabla.mcs.anl.gov>

I think the reimplementation argument is not universally valid. One must
consider costs vs. benefits.

On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> VDS1 has a utility, transfer, which is for use on the worker nodes to 
> stage data in and out.
> 
> It seems fairly seriously worth considering using that, rather than 
> re-implementing stuff from ground up.
> 


From benc at hawaga.org.uk  Mon Jul 23 10:19:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 23 Jul 2007 15:19:29 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185203859.17343.5.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>


Given that the VDS1 transfer executable exists and appears to work, there 
would need to be some strong argument to not use that as a base (which 
there may be, but I don't know of one).

On Mon, 23 Jul 2007, Mihael Hategan wrote:

> I think the reimplementation argument is not universally valid. One must
> consider costs vs. benefits.
> 
> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> > VDS1 has a utility, transfer, which is for use on the worker nodes to 
> > stage data in and out.
> > 
> > It seems fairly seriously worth considering using that, rather than 
> > re-implementing stuff from ground up.
> > 
> 
> 


From hategan at mcs.anl.gov  Mon Jul 23 10:28:07 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 23 Jul 2007 10:28:07 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
Message-ID: <1185204487.17343.14.camel@blabla.mcs.anl.gov>

Support, throttling, concurrency control. We seem to be fundamentally
changing the way things work, and we do that because we can.

On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> Given that the VDS1 transfer executable exists and appears to work, there 
> would need to be some strong argument to not use that as a base (which 
> there may be, but I don't know of one).
> 
> On Mon, 23 Jul 2007, Mihael Hategan wrote:
> 
> > I think the reimplementation argument is not universally valid. One must
> > consider costs vs. benefits.
> > 
> > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> > > VDS1 has a utility, transfer, which is for use on the worker nodes to 
> > > stage data in and out.
> > > 
> > > It seems fairly seriously worth considering using that, rather than 
> > > re-implementing stuff from ground up.
> > > 
> > 
> > 
> 


From benc at hawaga.org.uk  Mon Jul 23 10:29:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 23 Jul 2007 15:29:38 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>


none of those seem to be arguments for or against rewriting vs reusing.

On Mon, 23 Jul 2007, Mihael Hategan wrote:

> Support, throttling, concurrency control. We seem to be fundamentally
> changing the way things work, and we do that because we can.
> 
> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> > Given that the VDS1 transfer executable exists and appears to work, there 
> > would need to be some strong argument to not use that as a base (which 
> > there may be, but I don't know of one).
> > 
> > On Mon, 23 Jul 2007, Mihael Hategan wrote:
> > 
> > > I think the reimplementation argument is not universally valid. One must
> > > consider costs vs. benefits.
> > > 
> > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> > > > VDS1 has a utility, transfer, which is for use on the worker nodes to 
> > > > stage data in and out.
> > > > 
> > > > It seems fairly seriously worth considering using that, rather than 
> > > > re-implementing stuff from ground up.
> > > > 
> > > 
> > > 
> > 
> 
> 


From hategan at mcs.anl.gov  Mon Jul 23 10:37:24 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 23 Jul 2007 10:37:24 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
Message-ID: <1185205044.17343.22.camel@blabla.mcs.anl.gov>

They are not in general. They are arguments against reusing a particular
thing, which may justify rewriting.

On Mon, 2007-07-23 at 15:29 +0000, Ben Clifford wrote:
> none of those seem to be arguments for or against rewriting vs reusing.
> 
> On Mon, 23 Jul 2007, Mihael Hategan wrote:
> 
> > Support, throttling, concurrency control. We seem to be fundamentally
> > changing the way things work, and we do that because we can.
> > 
> > On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> > > Given that the VDS1 transfer executable exists and appears to work, there 
> > > would need to be some strong argument to not use that as a base (which 
> > > there may be, but I don't know of one).
> > > 
> > > On Mon, 23 Jul 2007, Mihael Hategan wrote:
> > > 
> > > > I think the reimplementation argument is not universally valid. One must
> > > > consider costs vs. benefits.
> > > > 
> > > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> > > > > VDS1 has a utility, transfer, which is for use on the worker nodes to 
> > > > > stage data in and out.
> > > > > 
> > > > > It seems fairly seriously worth considering using that, rather than 
> > > > > re-implementing stuff from ground up.
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 


From hategan at mcs.anl.gov  Mon Jul 23 10:39:33 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 23 Jul 2007 10:39:33 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
Message-ID: <1185205173.17343.24.camel@blabla.mcs.anl.gov>

Also, we should steer away from C code. We're far more efficient with
Java (both as programmers and as troubleshooters).

On Mon, 2007-07-23 at 10:28 -0500, Mihael Hategan wrote:
> Support, throttling, concurrency control. We seem to be fundamentally
> changing the way things work, and we do that because we can.
> 
> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> > Given that the VDS1 transfer executable exists and appears to work, there 
> > would need to be some strong argument to not use that as a base (which 
> > there may be, but I don't know of one).
> > 
> > On Mon, 23 Jul 2007, Mihael Hategan wrote:
> > 
> > > I think the reimplementation argument is not universally valid. One must
> > > consider costs vs. benefits.
> > > 
> > > On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> > > > VDS1 has a utility, transfer, which is for use on the worker nodes to 
> > > > stage data in and out.
> > > > 
> > > > It seems fairly seriously worth considering using that, rather than 
> > > > re-implementing stuff from ground up.
> > > > 
> > > 
> > > 
> > 
> 


From wilde at mcs.anl.gov  Mon Jul 23 10:40:25 2007
From: wilde at mcs.anl.gov (Mike Wilde)
Date: Mon, 23 Jul 2007 10:40:25 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
Message-ID: <46A4CBE9.6060600@mcs.anl.gov>

I'm in favor of building on transfer.  There is also the newer utility "t2".

Jens, I'm sure, will be delighted to expound on these.  As I recall, one was better 
at some things and the other at others.

For example, I think t2 will retry failing I/Os from an alternate PFN if several 
replicas are available.  But perhaps transfer does parallel transfers better, 
or has some such advantage.  I need to dive into old email to find the info, but in 
the meantime Jens or the manpages can probably explain much.

- Mike

Ben Clifford wrote, On 7/23/2007 10:29 AM:
> none of those seem to be arguments for or against rewriting vs reusing.
> 
> On Mon, 23 Jul 2007, Mihael Hategan wrote:
> 
>> Support, throttling, concurrency control. We seem to be fundamentally
>> changing the way things work, and we do that because we can.
>>
>> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
>>> Given that the VDS1 transfer executable exists and appears to work, there 
>>> would need to be some strong argument to not use that as a base (which 
>>> there may be, but I don't know of one).
>>>
>>> On Mon, 23 Jul 2007, Mihael Hategan wrote:
>>>
>>>> I think the reimplementation argument is not universally valid. One must
>>>> consider costs vs. benefits.
>>>>
>>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
>>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to 
>>>>> stage data in and out.
>>>>>
>>>>> It seems fairly seriously worth considering using that, rather than 
>>>>> re-implementing stuff from ground up.
>>>>>
>>>>
>>

-- 
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL   60439    USA
tel 630-252-7497 fax 630-252-1997


From hategan at mcs.anl.gov  Mon Jul 23 11:02:59 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 23 Jul 2007 11:02:59 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <46A4CBE9.6060600@mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
Message-ID: <1185206579.18828.7.camel@blabla.mcs.anl.gov>

And that's what I mean by "because we can".

On Mon, 2007-07-23 at 10:40 -0500, Mike Wilde wrote:
> I'm in favor of building on transfer.  There is also the newer utility "t2".
> 
> Jens, I'm sure, will be delighted to expound on these.  As I recall, one was better 
> at some things and the other at others.
> 
> For example, I think t2 will retry failing I/Os from an alternate PFN if several 
> replicas are available.  But perhaps transfer does parallel transfers better, 
> or has some such advantage.  I need to dive into old email to find the info, but in 
> the meantime Jens or the manpages can probably explain much.
> 
> - Mike
> 
> Ben Clifford wrote, On 7/23/2007 10:29 AM:
> > none of those seem to be arguments for or against rewriting vs reusing.
> > 
> > On Mon, 23 Jul 2007, Mihael Hategan wrote:
> > 
> >> Support, throttling, concurrency control. We seem to be fundamentally
> >> changing the way things work, and we do that because we can.
> >>
> >> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> >>> Given that the VDS1 transfer executable exists and appears to work, there 
> >>> would need to be some strong argument to not use that as a base (which 
> >>> there may be, but I don't know of one).
> >>>
> >>> On Mon, 23 Jul 2007, Mihael Hategan wrote:
> >>>
> >>>> I think the reimplementation argument is not universally valid. One must
> >>>> consider costs vs. benefits.
> >>>>
> >>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> >>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to 
> >>>>> stage data in and out.
> >>>>>
> >>>>> It seems fairly seriously worth considering using that, rather than 
> >>>>> re-implementing stuff from ground up.
> >>>>>
> >>>>
> >>
> 


From foster at mcs.anl.gov  Mon Jul 23 11:48:25 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Mon, 23 Jul 2007 11:48:25 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185204487.17343.14.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>	<1185203859.17343.5.camel@blabla.mcs.anl.gov>	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
Message-ID: <46A4DBD9.6050205@mcs.anl.gov>

A couple of comments that may be relevant:

a) I'd really like to see evaluation of what we have, at scale, before 
starting reimplementation of anything. (Have I mentioned that we need to 
be showing routine use at scale if we are to justify continuation of 
this project? An important step would seem to be to try running with 
what we have.)

b) The CEDPS guys are hard at work on storage management solutions (MOPS 
is the keyword). I think we should be thinking about whether/how this 
has a role to play in the future.

Ian.

Mihael Hategan wrote:
> Support, throttling, concurrency control. We seem to be fundamentally
> changing the way things work, and we do that because we can.
>
> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
>   
>> Given that the VDS1 transfer executable exists and appears to work, there 
>> would need to be some strong argument to not use that as a base (which 
>> there may be, but I don't know of one).
>>
>> On Mon, 23 Jul 2007, Mihael Hategan wrote:
>>
>>     
>>> I think the reimplementation argument is not universally valid. One must
>>> consider costs vs. benefits.
>>>
>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
>>>       
>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to 
>>>> stage data in and out.
>>>>
>>>> It seems fairly seriously worth considering using that, rather than 
>>>> re-implementing stuff from ground up.
>>>>
>>>>         
>>>       
>

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From benc at hawaga.org.uk  Tue Jul 24 05:51:10 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 10:51:10 +0000 (GMT)
Subject: [Swift-devel] nightly tests changes
Message-ID: <Pine.LNX.4.64.0707241046500.26516@dildano.hawaga.org.uk>


I made some changes to the nightly tests (one for more information, the 
other to fix the file counter test that was broken) but I don't know how 
to deploy them. I think at least nightly.sh doesn't get updated 
automatically.

r952: fix ls portion of file_counter nightly test - can't pass wildcards 
to ls as those are expanded by the shell, not by ls itself; and if ls 
finds no files it returns a failure code. Now use the root directory, on 
the assumption that this always has some files in it and is always 
readable.

r953: formatting of nightly test output - specify the full year, to match 
up with verbose specification of the time component; log the hostname on 
which the tests ran

-- 


From bugzilla-daemon at mcs.anl.gov  Tue Jul 24 07:26:14 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 24 Jul 2007 07:26:14 -0500 (CDT)
Subject: [Swift-devel] [Bug 80] simple_mapper strange prefix behaviour
In-Reply-To: <bug-80-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070724122614.923CA164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=80


------- Comment #2 from benc at hawaga.org.uk  2007-07-24 07:26 -------
1. In some cases (such as illustrated by this bug) Path.Entry.getName() returns
the prefix for the first element, which is a value in filename-space, not in
dataset-path-space. The DefaultFileNameElementMapper is stupid enough to pass
through a prefix untouched, even though it isn't a valid path component, so
this doesn't cause a problem.

2. AbstractFileMapper tries to infer whether a path entry is an array index or
a field name by testing whether the first character is a numeric digit or not
(rather than using the Path.Entry index member value).

When the prefix begins with a digit, the above two properties interact to cause
the filename prefix to be treated as an array index, which fails.
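
As a minimal sketch of the interaction described above (Java, with
hypothetical class and method names; this is not the actual Path.Entry or
AbstractFileMapper code), compare guessing from the first character with
consulting an explicit index flag carried by the entry:

```java
// Hypothetical sketch; names are illustrative and not taken from the Swift source.
final class PathEntrySketch {
    /** A path entry that records explicitly whether it is an array index. */
    static final class Entry {
        final String name;
        final boolean index;

        Entry(String name, boolean index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Heuristic from point 2: infer "array index" from a leading digit. */
    static boolean looksLikeIndex(Entry e) {
        return !e.name.isEmpty() && Character.isDigit(e.name.charAt(0));
    }

    /** Alternative: trust the flag stored on the entry itself. */
    static boolean isIndex(Entry e) {
        return e.index;
    }

    public static void main(String[] args) {
        // "2007run" is a made-up filename prefix that happens to start with a digit.
        Entry prefix = new Entry("2007run", false);
        System.out.println("heuristic says index: " + looksLikeIndex(prefix));
        System.out.println("flag says index:      " + isIndex(prefix));
    }
}
```

With a prefix that starts with a digit, the first-character test reports an
array index while the explicit flag does not, which is the mismatch that
leads to the failure.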


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From benc at hawaga.org.uk  Tue Jul 24 07:45:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 12:45:03 +0000 (GMT)
Subject: [Swift-devel] more swift-devel bugzilla mails
Message-ID: <Pine.LNX.4.64.0707241243530.26516@dildano.hawaga.org.uk>


I just modified the bugzilla config so that swift-devel is watching all of 
the Swift developers' email addresses. This will get more bug change email sent to 
the list. Alas, bugzilla doesn't seem to have a facility for an address to 
watch all activity in the bugzilla.
-- 


From bugzilla-daemon at mcs.anl.gov  Tue Jul 24 09:31:32 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 24 Jul 2007 09:31:32 -0500 (CDT)
Subject: [Swift-devel] [Bug 6] Not globally unique temporary file names
In-Reply-To: <bug-6-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070724143132.CEBE1164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=6


------- Comment #2 from hategan at mcs.anl.gov  2007-07-24 09:31 -------
Yes. It needs to stay here.


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.


From hategan at mcs.anl.gov  Tue Jul 24 09:34:07 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 09:34:07 -0500
Subject: [Swift-devel] Re: nightly tests changes
In-Reply-To: <Pine.LNX.4.64.0707241046500.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707241046500.26516@dildano.hawaga.org.uk>
Message-ID: <1185287647.16438.3.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 10:51 +0000, Ben Clifford wrote:
> I made some changes to the nightly tests (one for more information, the 
> other to fix the file counter test that was broken) but I don't know how 
> to deploy them. I think at least nightly.sh doesn't get updated 
> automatically.

Right. I'll poke it.

> 
> r952: fix ls portion of file_counter nightly test - can't pass wildcards 
> to ls as those are expanded by the shell, not by ls itself; and if ls 
> finds no files it returns a failure code. Now use the root directory, on 
> the assumption that this always has some files in it and is always 
> readable.
> 
> r953: formatting of nightly test output - specify the full year, to match 
> up with verbose specification of the time component; log the hostname on 
> which the tests ran
> 


From hategan at mcs.anl.gov  Tue Jul 24 09:37:19 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 09:37:19 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <46A4CBE9.6060600@mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
Message-ID: <1185287839.16438.7.camel@blabla.mcs.anl.gov>

On Mon, 2007-07-23 at 10:40 -0500, Mike Wilde wrote:
> I'm in favor of building on transfer.  There is also the newer utility "t2".
> 
> Jens, I'm sure, will be delighted to expound on these.

Well, he said he'd use the Java stuff, since it has more flexibility
than the command line interface of globus-url-copy, which is used by
"transfer". On the other hand, the Java stuff is heavier on resources
(unless there's some form of JVM running on some form of head node
permanently).

>   As I recall, one was better 
> at some things and the other at others.
> 
> For example, I think t2 will retry failing I/Os from an alternate PFN if several 
> replicas are available.  But perhaps transfer does parallel transfers better, 
> or has some such advantage.  I need to dive into old email to find the info, but in 
> the meantime Jens or the manpages can probably explain much.
> 
> - Mike
> 
> Ben Clifford wrote, On 7/23/2007 10:29 AM:
> > none of those seem to be arguments for or against rewriting vs reusing.
> > 
> > On Mon, 23 Jul 2007, Mihael Hategan wrote:
> > 
> >> Support, throttling, concurrency control. We seem to be fundamentally
> >> changing the way things work, and we do that because we can.
> >>
> >> On Mon, 2007-07-23 at 15:19 +0000, Ben Clifford wrote:
> >>> Given that the VDS1 transfer executable exists and appears to work, there 
> >>> would need to be some strong argument to not use that as a base (which 
> >>> there may be, but I don't know of one).
> >>>
> >>> On Mon, 23 Jul 2007, Mihael Hategan wrote:
> >>>
> >>>> I think the reimplementation argument is not universally valid. One must
> >>>> consider costs vs. benefits.
> >>>>
> >>>> On Mon, 2007-07-23 at 15:08 +0000, Ben Clifford wrote:
> >>>>> VDS1 has a utility, transfer, which is for use on the worker nodes to 
> >>>>> stage data in and out.
> >>>>>
> >>>>> It seems fairly seriously worth considering using that, rather than 
> >>>>> re-implementing stuff from ground up.
> >>>>>
> >>>>
> >>
> 


From benc at hawaga.org.uk  Tue Jul 24 09:42:36 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 14:42:36 +0000 (GMT)
Subject: [Swift-devel] simple_mapper separators
Message-ID: <Pine.LNX.4.64.0707241434270.26516@dildano.hawaga.org.uk>


I've been poking through simple_mapper to look at the various bugs open on 
that code.

There's some special case handling for path component separators (in the 
AbstractFileMapper superclass) such that the last component separator ends 
up being a "." instead of whatever comes from the supplied 
FileNameElementMapper (which is "_" in the default case).

See the test in svn 

 tests/language-behaviour/T077-simplemapper-bug80.swift, 

which is also here:

http://www.ci.uchicago.edu/trac/swift/browser/trunk/tests/language-behaviour/T076-simplemapper-bug80.swift?format=raw

This maps a three level array structure to filenames in a fairly 
straightforward fashion.

The output files are:

T077-simplemapper-bug80.aleph.out
T077-simplemapper-bug80.beth.out
T077-simplemapper-bug80_subordinate.epsilon.out
T077-simplemapper-bug80_subordinate.sigma.out
T077-simplemapper-bug80_subordinate_moresubordinate.hamza.out

It's a bit surprising/unintuitive that the last separator that comes from 
the expression path is a "." rather than a "_" like the other ones, at 
least in the presence of a suffix; though I can see circumstances where it 
is useful (when the structure fields have the same name as filename 
extensions and there is no suffix).

The path of least complexity says that this final separator change 
shouldn't happen - it's easier to document and easier to explain.
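
To make the two behaviours concrete, here is a rough Java sketch. The class
and method names are hypothetical, it handles field names only (not array
indices), and it is not the actual mapper code. It joins the prefix and path
components with "_" but lets the caller choose the final separator:

```java
// Hypothetical sketch of the naming rule; not the real simple_mapper implementation.
final class SeparatorSketch {
    /**
     * Builds prefix + components + suffix. Every component is preceded by "_",
     * except the last one, which is preceded by lastSep.
     */
    static String fileName(String prefix, String suffix, String lastSep, String... path) {
        StringBuilder sb = new StringBuilder(prefix);
        for (int i = 0; i < path.length; i++) {
            boolean last = (i == path.length - 1);
            sb.append(last ? lastSep : "_").append(path[i]);
        }
        return sb.append(suffix).toString();
    }

    public static void main(String[] args) {
        String prefix = "T077-simplemapper-bug80";
        // Observed behaviour: the final separator becomes ".".
        System.out.println(fileName(prefix, ".out", ".", "subordinate", "epsilon"));
        // Suggested behaviour: the final separator stays "_".
        System.out.println(fileName(prefix, ".out", "_", "subordinate", "epsilon"));
    }
}
```

The first call gives T077-simplemapper-bug80_subordinate.epsilon.out, as in
the list of output files above; the second gives
T077-simplemapper-bug80_subordinate_epsilon.out, which is what the simpler
rule would produce.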

-- 


From benc at hawaga.org.uk  Tue Jul 24 09:45:37 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 14:45:37 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185287839.16438.7.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>


> (unless there's some form of JVM running on some form of head node 
> permanently).

Transfer stuff needs to (sometimes) run on the worker node, not the head 
node, I think.

I think running things through the head node is going to produce similar 
performance bottlenecks to running on the submit node in the case of 
running on a single site with a distributed file system supplying the 
data.

-- 


From hategan at mcs.anl.gov  Tue Jul 24 09:49:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 09:49:30 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
Message-ID: <1185288570.17215.3.camel@blabla.mcs.anl.gov>

Can you be more specific on what bottlenecks we're trying to avoid?

On Tue, 2007-07-24 at 14:45 +0000, Ben Clifford wrote:
> > (unless there's some form of JVM running on some form of head node 
> > permanently).
> 
> Transfer stuff needs to (sometimes) run on the worker node, not the head 
> node, I think.
> 
> I think running things through the head node is going to produce similar 
> performance bottlenecks to running on the submit node in the case of 
> running on a single site with a distributed file system supplying the 
> data.
> 


From benc at hawaga.org.uk  Tue Jul 24 09:52:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 14:52:47 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185288570.17215.3.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> Can you be more specific on what bottlenecks we're trying to avoid?

pumping all the data for the workflow through one ethernet card and CPU.

-- 


From hategan at mcs.anl.gov  Tue Jul 24 10:05:30 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 10:05:30 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
Message-ID: <1185289530.17828.5.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 14:52 +0000, Ben Clifford wrote:
> 
> On Tue, 24 Jul 2007, Mihael Hategan wrote:
> 
> > Can you be more specific on what bottlenecks we're trying to avoid?
> 
> pumping all the data for the workflow through one ethernet card and CPU.

It's I/O bound stuff, so the CPU is likely not to be the problem. And
generally the eth card would be fatter than the pipe outside.

The local storage on the other hand may be a problem. It's tricky
however. Should a bunch of executables need the same input file, it
would likely be better to transfer it only once on the head node than
multiple times on each worker node.

> 


From benc at hawaga.org.uk  Tue Jul 24 10:11:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 15:11:03 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185289530.17828.5.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk> 
	<1185288570.17215.3.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<1185289530.17828.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707241506440.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> It's I/O bound stuff, so the CPU is likely not to be the problem. And 
> generally the eth card would be fatter than the pipe outside.

In the case where e.g. dCache is 'inside' rather than 'outside', that's 
different.

> The local storage on the other hand may be a problem. It's tricky 
> however. Should a bunch of executables need the same input file, it 
> would likely be better to transfer it only once on the head node than 
> multiple times on each worker node.

It's got to be transferred to the worker nodes anyway (at least as much of 
it as is read/written) - in the present case using whatever shared posix 
fs the site-wide scratch space lives on.

How the two different approaches stack up is probably going to depend on 
the site layout and its relation to wherever submit-side data lives 
(which, as I said, may be on-site); and on the app. So I don't think 
there's one right way to do it.

-- 


From foster at mcs.anl.gov  Tue Jul 24 10:21:34 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 24 Jul 2007 10:21:34 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>	<46A4CBE9.6060600@mcs.anl.gov>	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
Message-ID: <46A618FE.20205@mcs.anl.gov>

Do we have data that show this to be a problem?

Ben Clifford wrote:
> On Tue, 24 Jul 2007, Mihael Hategan wrote:
>
>   
>> Can you be more specific on what bottlenecks we're trying to avoid?
>>     
>
> pumping all the data for the workflow through one ethernet card and CPU.
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From hategan at mcs.anl.gov  Tue Jul 24 10:23:15 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 10:23:15 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707241506440.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<1185289530.17828.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241506440.26516@dildano.hawaga.org.uk>
Message-ID: <1185290595.18405.9.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 15:11 +0000, Ben Clifford wrote:
> On Tue, 24 Jul 2007, Mihael Hategan wrote:
> 
> > It's I/O bound stuff, so the CPU is likely not to be the problem. And 
> > generally the eth card would be fatter than the pipe outside.
> 
> In the case where e.g. dCache is 'inside' rather than 'outside', that's 
> different.

Then it wouldn't be going through eth, I'm guessing. They invented lo.
And if it's not lo, then you'd still have a single eth (the source).
Doing single eth to single eth will probably not be much different from
single eth to multiple eths. There's the other possibility where the
source is multi-headed. But we should probably not optimize for 1% of
the scenarios.

> 
> > The local storage on the other hand may be a problem. It's tricky 
> > however. Should a bunch of executables need the same input file, it 
> > would likely be better to transfer it only once on the head node than 
> > multiple times on each worker node.
> 
> It's got to be transferred to the worker nodes anyway (at least as much of 
> it as is read/written) - in the present case using whatever shared posix 
> fs the site-wide scratch space lives on.

Yes and no. Some of the data may be transferred, as needed. Also, there
may be high performance shared FSes, which may beat our puny attempts at
better performance.

> 
> How the two different approaches stack up is probably going to depend on 
> the site layout and its relation to wherever submit-side data lives 
> (which, as I said, may be on-site); and on the app. So I don't think 
> there's one right way to do it.

Yep. But one of the choices is an engineering no no for us. If we can
make the other sufficiently good, we can provide a reasonable solution
at a low cost.

(Note to Ian: we're not implementing anything yet) 

> 


From hategan at mcs.anl.gov  Tue Jul 24 13:47:16 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 13:47:16 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
Message-ID: <1185302836.6949.5.camel@blabla.mcs.anl.gov>

I'm thinking we should have two division operators:
div - integer division (int, int -> int)
/ - floating point division ( [int|float], [int|float] -> float )

This is necessary because we don't have type casting, so a programmer
could not nicely specify how to force an int/int division to result in
a floating point number. In C (and related languages), one would cast
one of the operands to double (e.g. double x = (double) i / j;). In our
case it could be done with a separate assignment, but I think that's
cumbersome.
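
To make the proposal concrete, here is a minimal Java sketch of the two
operators. The class and method names (DivisionSketch, intDiv, floatDiv)
are made up for this illustration and are not taken from the Swift code;
the sketch only shows the intended typing rules.

```java
// Hypothetical sketch of the two proposed division operators.
// All names are illustrative; this is not the Swift implementation.
class DivisionSketch {
    // div: (int, int) -> int. Java's int division truncates toward zero.
    static int intDiv(int a, int b) {
        return a / b;
    }

    // /: ([int|float], [int|float]) -> float. Both operands are widened to
    // double first, so the language itself needs no cast syntax.
    static double floatDiv(Number a, Number b) {
        return a.doubleValue() / b.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(intDiv(7, 2));     // integer result
        System.out.println(floatDiv(7, 2));   // floating point result from two ints
        System.out.println(floatDiv(7, 2.0)); // mixed int and float operands
    }
}
```

With something like this, an int/int division that should yield a float
is written with / directly, and div is used only when truncation is
wanted.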

Mihael

On Fri, 2007-07-20 at 22:12 +0000, Ben Clifford wrote:
> I made a branch with the relevant patches from my quilt patch stack.
> 
> https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions
> 
> In r940, I remove non-integer numbers from the language by virtue of 
> removing the test cases from language-behaviour for them (but no actual 
> code changes). If you want to run the language-behaviour tests with the 
> non-integer tests in there again, roll back r940 in your local repo.
> 
> The two biggest changes are r941 which makes much more stuff be wrapped in 
> DSHandles, and r942 which is adjustment to the intermediate language to 
> have XML based expressions.
> 
> As a consequence of r942, the resulting karajan code has a lot more cruft 
> in it (but should still behave as previously). I'm intending to work on 
> that more so don't be alarmed.
> 
> Type this for the commit logs so far:
> 
> svn log 
> https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions 
> -r933:HEAD 
> 
> -- 
> 
> 


From hategan at mcs.anl.gov  Tue Jul 24 15:12:28 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 15:12:28 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
Message-ID: <1185307948.14893.2.camel@blabla.mcs.anl.gov>

I've committed some stuff to that branch which should make the numeric
operators more efficient. The language behavior tests seem to pass.

One potentially problem-causing change (if broken code makes broken
assumptions) is that Swift number values are not stored as strings any
more, but as subclasses of java.lang.Number.

Mihael

On Fri, 2007-07-20 at 22:12 +0000, Ben Clifford wrote:
> I made a branch with the relevant patches from my quilt patch stack.
> 
> https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions
> 
> In r940, I remove non-integer numbers from the language by virtue of 
> removing the test cases from language-behaviour for them (but no actual 
> code changes). If you want to run the language-behaviour tests with the 
> non-integer tests in there again, roll back r940 in your local repo.
> 
> The two biggest changes are r941 which makes much more stuff be wrapped in 
> DSHandles, and r942 which is adjustment to the intermediate language to 
> have XML based expressions.
> 
> As a consequence of r942, the resulting karajan code has a lot more cruft 
> in it (but should still behave as previously). I'm intending to work on 
> that more so don't be alarmed.
> 
> Type this for the commit logs so far:
> 
> svn log 
> https://svn.ci.uchicago.edu/svn/vdl2/branches/types-and-expressions 
> -r933:HEAD 
> 
> -- 
> 
> 


From bugzilla-daemon at mcs.anl.gov  Tue Jul 24 16:21:00 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 24 Jul 2007 16:21:00 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070724212100.C8FAB164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


------- Comment #13 from nefedova at mcs.anl.gov  2007-07-24 16:21 -------
I tried the same code as in Comment #2 with r951 and it hangs the same way as
before. Has it worked for you?


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From benc at hawaga.org.uk  Tue Jul 24 16:54:45 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 21:54:45 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <46A618FE.20205@mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<46A618FE.20205@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>


Not numerical data. I just recall it being something that we ISI sysadmin 
people used to laugh about as people had VDS moving data all over the 
place unnecessarily within the ISI network whilst they complained that 
there wasn't enough space in one particular space or that ftp servers 
weren't coping.

On Tue, 24 Jul 2007, Ian Foster wrote:

> Do we have data that show this to be a problem?
> 
> Ben Clifford wrote:
> > On Tue, 24 Jul 2007, Mihael Hategan wrote:
> > 
> >   
> > > Can you be more specific on what bottlenecks we're trying to avoid?
> > >     
> > 
> > pumping all the data for the workflow through one ethernet card and CPU.
> > 
> >   
> 
> 


From hategan at mcs.anl.gov  Tue Jul 24 17:05:18 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 17:05:18 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<46A618FE.20205@mcs.anl.gov>
	<Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
Message-ID: <1185314718.8214.0.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 21:54 +0000, Ben Clifford wrote:
> Not numerical data. I just recall it being something that we ISI sysadmin 
> people used to laugh about as people had VDS moving data all over the 
> place unnecessarily within the ISI network

Still laughing? :)

>  whilst they complained that 
> there wasn't enough space in one particular space or that ftp servers 
> weren't coping.
> 
> On Tue, 24 Jul 2007, Ian Foster wrote:
> 
> > Do we have data that show this to be a problem?
> > 
> > Ben Clifford wrote:
> > > On Tue, 24 Jul 2007, Mihael Hategan wrote:
> > > 
> > >   
> > > > Can you be more specific on what bottlenecks we're trying to avoid?
> > > >     
> > > 
> > > pumping all the data for the workflow through one ethernet card and CPU.
> > > 
> > >   
> > 
> > 
> 


From benc at hawaga.org.uk  Tue Jul 24 17:06:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 22:06:47 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <1185314718.8214.0.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk> 
	<1185288570.17215.3.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<46A618FE.20205@mcs.anl.gov>
	<Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
	<1185314718.8214.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707242206380.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> On Tue, 2007-07-24 at 21:54 +0000, Ben Clifford wrote:
> > Not numerical data. I just recall it being something that we ISI sysadmin 
> > people used to laugh about as people had VDS moving data all over the 
> > place unnecessarily within the ISI network
> 
> Still laughing? :)

no, I left.

-- 


From benc at hawaga.org.uk  Tue Jul 24 17:10:43 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 22:10:43 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <1185307948.14893.2.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu> 
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185307948.14893.2.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707242209440.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> One potentially problem-causing change (if broken code makes broken
> assumptions) is that Swift number values are not stored as strings any
> more, but as subclasses of java.lang.Number.

I think(?) that the only code that made assumptions about the number 
formats are the numerical operators and code that assumes the toString() 
output will be of a particular format when passing as a commandline 
parameter.
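
As a small illustration of the second case, here is a Java sketch of how
the default string form depends on which java.lang.Number subclass holds
the value. The class and method names are hypothetical and only stand in
for whatever code builds the command line.

```java
// Sketch: the default string form depends on the Number subclass.
// NumberToStringSketch and asArgument are illustrative names only.
class NumberToStringSketch {
    // Builds a command-line argument the naive way, via toString().
    static String asArgument(Number n) {
        return n.toString();
    }

    public static void main(String[] args) {
        System.out.println(asArgument(Integer.valueOf(3)));    // 3
        System.out.println(asArgument(Double.valueOf(3.0)));   // 3.0
        System.out.println(asArgument(Double.valueOf(1.0e7))); // 1.0E7
    }
}
```

An application that expects "3" or "10000000" on its command line would
see a different string once the value is held as a Double.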

-- 


From foster at mcs.anl.gov  Tue Jul 24 17:28:59 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 24 Jul 2007 17:28:59 -0500
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk>
	<1185203859.17343.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk>
	<1185204487.17343.14.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<46A618FE.20205@mcs.anl.gov>
	<Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
Message-ID: <46A67D2B.4090804@mcs.anl.gov>

Ben:

I feel strongly that we should be focusing our scarce development 
resources on problems that we have documented via user experience. That 
means we need that performance monitoring infrastructure in Swift ...

I do think that data movement and caching are likely to become important 
issues. But it would be good to know when/how exactly they do.

Mike mentioned that he thought Nika's MolDyn code had some workaround in 
it to reduce data movement, introduced because of a lack of caching 
support. Does anyone know about that?

Ian.

Ben Clifford wrote:
> Not numerical data. I just recall it being something that we ISI sysadmin 
> people used to laugh about as people had VDS moving data all over the 
> place unnecessarily within the ISI network whilst they complained that 
> there wasn't enough space in one particular space or that ftp servers 
> weren't coping.
>
> On Tue, 24 Jul 2007, Ian Foster wrote:
>
>   
>> Do we have data that show this to be a problem?
>>
>> Ben Clifford wrote:
>>     
>>> On Tue, 24 Jul 2007, Mihael Hategan wrote:
>>>
>>>   
>>>       
>>>> Can you be more specific on what bottlenecks we're trying to avoid?
>>>>     
>>>>         
>>> pumping all the data for the workflow through one ethernet card and CPU.
>>>
>>>   
>>>       
>>     
>
>   

-- 

   Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619.  Web: www.ci.uchicago.edu.
      Globus Alliance: www.globus.org.


From hategan at mcs.anl.gov  Tue Jul 24 17:40:29 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 17:40:29 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707242209440.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185307948.14893.2.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707242209440.26516@dildano.hawaga.org.uk>
Message-ID: <1185316829.9373.3.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 22:10 +0000, Ben Clifford wrote:
> On Tue, 24 Jul 2007, Mihael Hategan wrote:
> 
> > One potentially problem-causing change (if broken code makes broken
> > assumptions) is that Swift number values are not stored as strings any
> > more, but as subclasses of java.lang.Number.
> 
> I think(?) that the only code that made assumptions about the number 
> formats are the numerical operators and code that assumes the toString() 
> output will be of a particular format when passing as a commandline 
> parameter.

That format would only be kept in the case in which the assigned value
is used directly. Using any arithmetic operators would not make any
guarantee of a particular format. That is how the old code behaved.
Maybe some formatting functions should be provided?
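
A rough idea of what such a formatting function could look like, as a
Java sketch. The name and signature are assumptions made up for this
example; nothing like it exists in the code yet.

```java
import java.util.Locale;

// Hypothetical formatting helper; the name and signature are assumptions.
class NumberFormatSketch {
    // Renders n with a fixed number of decimal places. Locale.ROOT keeps
    // the decimal separator a '.' regardless of the submit host's locale.
    static String format(Number n, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f", n.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(format(3, 2));       // 3.00
        System.out.println(format(3.14159, 2)); // 3.14
        System.out.println(format(1.0e7, 1));   // 10000000.0
    }
}
```

This would give a predictable command-line form whether the value is an
int or a float, and whether or not it came out of an arithmetic operator.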

> 


From benc at hawaga.org.uk  Tue Jul 24 17:41:52 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 22:41:52 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <1185316829.9373.3.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu> 
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk> 
	<1185307948.14893.2.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707242209440.26516@dildano.hawaga.org.uk>
	<1185316829.9373.3.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707242241260.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> That format would only be kept in the case in which the assigned value
> is used directly. Using any arithmetic operators would not make any
> guarantee of a particular format. That is how the old code behaved.
> Maybe some formatting functions should be provided?

easy enough to implement ad-hoc when someone needs them - for now, we can 
wait till it causes someone trouble.

-- 


From benc at hawaga.org.uk  Tue Jul 24 17:42:18 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 22:42:18 +0000 (GMT)
Subject: [Swift-devel] VDS1 transfer executable
In-Reply-To: <46A67D2B.4090804@mcs.anl.gov>
References: <Pine.LNX.4.64.0707231507340.26516@dildano.hawaga.org.uk> 
	<1185203859.17343.5.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231518550.26516@dildano.hawaga.org.uk> 
	<1185204487.17343.14.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707231529210.26516@dildano.hawaga.org.uk>
	<46A4CBE9.6060600@mcs.anl.gov>
	<1185287839.16438.7.camel@blabla.mcs.anl.gov> 
	<Pine.LNX.4.64.0707241443440.26516@dildano.hawaga.org.uk>
	<1185288570.17215.3.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707241452310.26516@dildano.hawaga.org.uk>
	<46A618FE.20205@mcs.anl.gov>
	<Pine.LNX.4.64.0707242153130.26516@dildano.hawaga.org.uk>
	<46A67D2B.4090804@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707242231180.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Ian Foster wrote:

> Mike mentioned that he thought Nika's MolDyn code had some workaround in 
> it to reduce data movement, introduced because of a lack of caching 
> support. Does anyone know about that?

that is bug 76: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=76

Bug 78 http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=78 (or rather, 
the privately discussed rather different root cause of bug 78, which is to 
access dcache data) is higher priority.

A basic approach is to have the submit side access dcache, as I've 
discussed elsewhere; there's no direct evidence that that approach will be 
unsuitable (though thoughts that it might be are what motivated this 
thread).

We can look at doing that next.

-- 


From benc at hawaga.org.uk  Tue Jul 24 18:11:45 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 24 Jul 2007 23:11:45 +0000 (GMT)
Subject: [Swift-devel] r064: use /dev/urandom by default
Message-ID: <Pine.LNX.4.64.0707242311140.26516@dildano.hawaga.org.uk>


This should go to trunk not language reform branch?
-- 


From hategan at mcs.anl.gov  Tue Jul 24 18:20:49 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jul 2007 18:20:49 -0500
Subject: [Swift-devel] Re: r064: use /dev/urandom by default
In-Reply-To: <Pine.LNX.4.64.0707242311140.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707242311140.26516@dildano.hawaga.org.uk>
Message-ID: <1185319249.11093.1.camel@blabla.mcs.anl.gov>

On Tue, 2007-07-24 at 23:11 +0000, Ben Clifford wrote:
> This should go to trunk not language reform branch?

Right. I'm guessing it will get there when we merge them. I wanted some
testing to be done on it.

On a different note, I think we should have a general development
branch, not specific to a certain thing (such as expressions).

Mihael


From benc at hawaga.org.uk  Wed Jul 25 02:38:27 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 07:38:27 +0000 (GMT)
Subject: [Swift-devel] Re: r064: use /dev/urandom by default
In-Reply-To: <1185319249.11093.1.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707242311140.26516@dildano.hawaga.org.uk>
	<1185319249.11093.1.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707250733220.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> Right. I'm guessing it will get there when we merge them. I wanted some
> testing to be done on it.

Pretty much the only serious testing that's going to happen is when it 
gets to trunk and people get it on the occasions that they update from 
there.

SVN's branch management is sufficiently poor that I prefer to not have 
long lived general development branches that dilute testing of stuff 
that's gone into trunk. (now if we were using git, that would be another 
matter...)

-- 


From benc at hawaga.org.uk  Wed Jul 25 02:42:56 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 07:42:56 +0000 (GMT)
Subject: [Swift-devel] Re: nightly tests changes
In-Reply-To: <1185287647.16438.3.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707241046500.26516@dildano.hawaga.org.uk>
	<1185287647.16438.3.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707250739070.26516@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> > r952: fix ls portion of file_counter nightly test - can't pass wildcards 
> > to ls as those are expanded by the shell, not by ls itself; and if ls 
> > finds no files it returns a failure code. Now use the root directory, on 
> > the assumption that this always has some files in it and is always 
> > readable.

Looks like this fix worked. Tests now look greener, though not completely 
green - 5 of the 110 grid tests failed (at random?) with gridftp errors.
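
For anyone wondering about the ls behaviour described in r952, here is a
small Java sketch. It assumes a POSIX system with ls on the PATH; the
class and method names are made up for this example and have nothing to
do with the actual nightly test code.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Sketch only: shows why a literal wildcard passed to ls fails.
class LsWildcardSketch {
    // Runs "ls <arg>" directly, with no shell in between, inside dir,
    // and returns the exit code of ls.
    static int lsExitCode(File dir, String arg) {
        try {
            Process p = new ProcessBuilder("ls", arg)
                    .directory(dir)
                    .redirectErrorStream(true)
                    .start();
            try (InputStream out = p.getInputStream()) {
                while (out.read() != -1) {
                    // discard the listing; only the exit code matters here
                }
            }
            return p.waitFor();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Creates a fresh, empty temporary directory.
    static File newEmptyDir() {
        try {
            return Files.createTempDirectory("ls-sketch").toFile();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        File empty = newEmptyDir();
        // No shell expands "*", so ls looks for a file literally named "*".
        System.out.println("ls '*' exit code: " + lsExitCode(empty, "*"));
        // The root directory is assumed to exist and be readable.
        System.out.println("ls / exit code:   " + lsExitCode(empty, "/"));
    }
}
```

The first call reports a non-zero exit code because nothing matches,
which is the failure the test used to hit; listing / avoids both the
wildcard and the empty-result case.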

-- 


From benc at hawaga.org.uk  Wed Jul 25 05:30:22 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 10:30:22 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <1185302836.6949.5.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu> 
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185302836.6949.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707251023490.26516@dildano.hawaga.org.uk>


We need to be a bit careful about casting between floating point and fixed 
precision types in the operator implementation.

ints are small enough that they fit within a double such that no precision 
is lost; but using e.g. a Java long would cause a problem (see the code below).

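public class casts {
    public static void main(String[] args) {

        // 2^63 - 24: needs more than the 53 bits of precision a double has
        long i = 9223372036854775784L;
        double d = (double) i;   // rounds to the nearest representable double
        long i2 = (long) d;      // converting back does not recover i

        System.out.println("  i=" + i);
        System.out.println("  d=" + d);
        System.out.println(" i2=" + i2);
        if (i != i2) System.out.println("Different");
    }
}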
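
For contrast, a second sketch that checks the claim about ints: every
32-bit int survives the round trip through double, while a long just
above 2^53 does not. The class and method names are made up for this
illustration.

```java
// Sketch: which integer values survive a round trip through double.
// IntRoundTripSketch and its methods are illustrative names only.
class IntRoundTripSketch {
    // True if converting an int to double and back yields the same value.
    static boolean survivesInt(int i) {
        return (int) (double) i == i;
    }

    // True if converting a long to double and back yields the same value.
    static boolean survivesLong(long l) {
        return (long) (double) l == l;
    }

    public static void main(String[] args) {
        System.out.println(survivesInt(Integer.MAX_VALUE));  // true
        System.out.println(survivesInt(Integer.MIN_VALUE));  // true
        System.out.println(survivesLong(1L << 53));          // true
        System.out.println(survivesLong((1L << 53) + 1));    // false
    }
}
```

A double carries 53 bits of precision, so the 32-bit range is covered
exactly and the first long that gets rounded is 2^53 + 1.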


From benc at hawaga.org.uk  Wed Jul 25 08:04:52 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 13:04:52 +0000 (GMT)
Subject: [Swift-devel] airsn and ROI mappers
Message-ID: <Pine.LNX.4.64.0707251303370.26516@dildano.hawaga.org.uk>


Hi.

Are these two mappers used?

If so, I need to make sure some code changes I want to make to 
AbstractFileMapper don't break those. If not, I'm less concerned.

-- 


From bugzilla-daemon at mcs.anl.gov  Wed Jul 25 08:33:46 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 25 Jul 2007 08:33:46 -0500 (CDT)
Subject: [Swift-devel] [Bug 83] nested loops hung
In-Reply-To: <bug-83-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070725133346.4DA6C164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=83


nefedova at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |blocker


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From yongzh at cs.uchicago.edu  Wed Jul 25 09:08:53 2007
From: yongzh at cs.uchicago.edu (Yong Zhao)
Date: Wed, 25 Jul 2007 09:08:53 -0500 (CDT)
Subject: [Swift-devel] airsn and ROI mappers
In-Reply-To: <Pine.LNX.4.64.0707251303370.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707251303370.26516@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.58.0707250908100.1119@classes.cs.uchicago.edu>

The airsn mapper is critical for the fMRI workflows; ROIMapper was
developed for the RADGrid workflow.

Yong.

On Wed, 25 Jul 2007, Ben Clifford wrote:

>
> Hi.
>
> Are these two mappers used?
>
> If so, I need to make sure some code changes I want to make to
> AbstractFileMapper don't break those. If not, I'm less concerned.
>
> --
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>


From hategan at mcs.anl.gov  Wed Jul 25 09:32:52 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 09:32:52 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707251023490.26516@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185302836.6949.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707251023490.26516@dildano.hawaga.org.uk>
Message-ID: <1185373972.12444.1.camel@blabla.mcs.anl.gov>

Yep. But we're not using longs.

On Wed, 2007-07-25 at 10:30 +0000, Ben Clifford wrote:
> need to be careful a bit about casting between floating point and fixed 
> precision types in the operator implementation.
> 
> ints are small enough that they fit within a double such that no precision 
> is lost; but using eg. a java long would cause a problem (see below code)
> 
> public class casts {
>     public static void main(String args[]) {
> 
>         long i = 9223372036854775784l;
>         double d = (double) i;
>         long i2 = (long)d;
> 
>         System.out.println("  i="+i);
>         System.out.println("  d="+d);
>         System.out.println(" i2="+i2);
>         if(i != i2) System.out.println("Different");
>     }
> }
> 


From benc at hawaga.org.uk  Wed Jul 25 11:13:02 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 16:13:02 +0000 (GMT)
Subject: [Swift-devel] Re: Falkon code and logs
In-Reply-To: <469BE095.4010608@cs.uchicago.edu>
References: <469BE095.4010608@cs.uchicago.edu>
Message-ID: <Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>


On Mon, 16 Jul 2007, Ioan Raicu wrote:

> Hey Ben,
> Here is the latest Falkon code base, including all compiled classes, scripts,
> libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server,
> etc... it's the entire branch that is needed containing all the different
> Falkon components.  I would have preferred to clean things up a bit, but here
> it is, and I'll do the clean-up later...
> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz

I just imported this into the vdl2 subversion repo.

Type:

   svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon

to get the checkout.

I removed the embedded JRE (putting aside the question of whether we should
keep big binaries like that in SVN, a quick glance at the JRE redistribution
licence suggested that redistributing it was not acceptable)

If you edit files, you can commit them with:

 svn commit

which will require you to feed in your CI password.

Type svn update in the root directory of your checkout to pull down 
changes that other people have made since your last checkout/update 
(probably you'll find me making a bunch of those to tidy some things up)

If you add files, you will need to:

 svn add myfile.java

before committing it.

This is the tarball as I received it, so has lots of built cruft in there 
(.class files and things).

I'll help work on tidying that up in the repository.

Please commit any changes you have made since this tarball, and begin 
making your releases from committed SVN code rather than from your own 
private codebase - that way, people can talk about 'falkon built from 
r972' and then everyone can look at the exact code version from SVN.

-- 


From benc at hawaga.org.uk  Wed Jul 25 11:15:30 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 16:15:30 +0000 (GMT)
Subject: [Swift-devel] Re: Falkon code and logs
In-Reply-To: <Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
References: <469BE095.4010608@cs.uchicago.edu>
	<Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707251615110.26516@dildano.hawaga.org.uk>

btw, the import hasn't actually finished yet... I sent this mail by 
accident without waiting for it to finish.


From hategan at mcs.anl.gov  Wed Jul 25 12:01:33 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 12:01:33 -0500
Subject: [Swift-devel] Re: Falkon code and logs
In-Reply-To: <Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
References: <469BE095.4010608@cs.uchicago.edu>
	<Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
Message-ID: <1185382893.15519.0.camel@blabla.mcs.anl.gov>

Aaargh! It's being imported into the root not the falkon directory!

On Wed, 2007-07-25 at 16:13 +0000, Ben Clifford wrote:
> 
> On Mon, 16 Jul 2007, Ioan Raicu wrote:
> 
> > Hey Ben,
> > Here is the latest Falkon code base, including all compiled classes, scripts,
> > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server,
> > etc... it's the entire branch that is needed containing all the different
> > Falkon components.  I would have preferred to clean things up a bit, but here
> > it is, and I'll do the clean-up later...
> > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz
> 
> I just imported this into the vdl2 subversion repo.
> 
> Type:
> 
>    svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon
> 
> to get the checkout.
> 
> I removed the embedded JRE (putting aside the question of whether we should
> keep big binaries like that in SVN, a quick glance at the JRE redistribution
> licence suggested that redistributing it was not acceptable)
> 
> If you edit files, you can commit them with:
> 
>  svn commit
> 
> which will require you to feed in your CI password.
> 
> Type svn update in the root directory of your checkout to pull down 
> changes that other people have made since your last checkout/update 
> (probably you'll find me making a bunch of those to tidy some things up)
> 
> If you add files, you will need to:
> 
>  svn add myfile.java
> 
> before committing it.
> 
> This is the tarball as I received it, so has lots of built cruft in there 
> (.class files and things).
> 
> I'll help work on tidying that up in the repository.
> 
> Please commit any changes you have made since this tarball, and begin 
> making your releases from committed SVN code rather than from your own 
> private codebase - that way, people can talk about 'falkon built from 
> r972' and then everyone can look at the exact code version from SVN.
> 


From benc at hawaga.org.uk  Wed Jul 25 12:03:20 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 17:03:20 +0000 (GMT)
Subject: [Swift-devel] Re: Falkon code and logs
In-Reply-To: <1185382893.15519.0.camel@blabla.mcs.anl.gov>
References: <469BE095.4010608@cs.uchicago.edu> 
	<Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
	<1185382893.15519.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707251703040.26516@dildano.hawaga.org.uk>


ja I saw that. Easy to move, which I am doing now. Please wait.

On Wed, 25 Jul 2007, Mihael Hategan wrote:

> Aaargh! It's being imported into the root not the falkon directory!
> 
> On Wed, 2007-07-25 at 16:13 +0000, Ben Clifford wrote:
> > 
> > On Mon, 16 Jul 2007, Ioan Raicu wrote:
> > 
> > > Hey Ben,
> > > Here is the latest Falkon code base, including all compiled classes, scripts,
> > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server,
> > > etc... it's the entire branch that is needed, containing all the different
> > > Falkon components.  I would have preferred to clean things up a bit, but here
> > > it is, and I'll do the clean-up later...
> > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz
> > 
> > I just imported this into the vdl2 subversion repo.
> > 
> > Type:
> > 
> >    svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon
> > 
> > to get the checkout.
> > 
> > I removed the embedded JRE (putting aside issues of whether we should put big 
> > binaries like that in the SVN, a quick glance at the JRE redistribution 
> > licence looked like it was not something acceptable)
> > 
> > If you edit files, you can commit them with:
> > 
> >  svn commit
> > 
> > which will require you to feed in your CI password.
> > 
> > Type svn update in the root directory of your checkout to pull down 
> > changes that other people have made since your last checkout/update 
> > (probably you'll find me making a bunch of those to tidy some things up)
> > 
> > If you add files, you will need to:
> > 
> >  svn add myfile.java
> > 
> > before committing it.
> > 
> > This is the tarball as I received it, so has lots of built cruft in there 
> > (.class files and things).
> > 
> > I'll help work on tidying that up in the repository.
> > 
> > Please commit any changes you have made since this tarball, and begin 
> > making your releases from committed SVN code rather than from your own 
> > private codebase - that way, people can talk about 'falkon built from 
> > r972' and then everyone can look at the exact code version from SVN.
> > 
> 
> 


From benc at hawaga.org.uk  Wed Jul 25 13:27:47 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 25 Jul 2007 18:27:47 +0000 (GMT)
Subject: [Swift-devel] Re: Falkon code and logs
In-Reply-To: <Pine.LNX.4.64.0707251703040.26516@dildano.hawaga.org.uk>
References: <469BE095.4010608@cs.uchicago.edu> 
	<Pine.LNX.4.64.0707251605030.21532@dildano.hawaga.org.uk>
	<1185382893.15519.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707251703040.26516@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707251827070.26516@dildano.hawaga.org.uk>


On Wed, 25 Jul 2007, Ben Clifford wrote:

> ja I saw that. Easy to move, which I am doing now. Please wait.

hopefully all better as of r995.

-- 


From hategan at mcs.anl.gov  Wed Jul 25 19:03:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 19:03:08 -0500
Subject: [Swift-devel] dcache
Message-ID: <1185408188.21980.0.camel@blabla.mcs.anl.gov>

Does anyone know of an installation I can play with?
Looking at the docs, I'm a bit reluctant to try to install it on my
laptop.

Mihael


From iraicu at cs.uchicago.edu  Wed Jul 25 19:10:28 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 25 Jul 2007 19:10:28 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
Message-ID: <46A7E674.6060406@cs.uchicago.edu>

I think dCache is installed on Tier3.
http://twiki.mwt2.org/bin/view/UCTier3/WebHome
Ioan


Mihael Hategan wrote:
> Does anyone know of an installation I can play with?
> Looking at the docs, I'm a bit reluctant to try to install it on my
> laptop.
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From hategan at mcs.anl.gov  Wed Jul 25 19:13:01 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 19:13:01 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <46A7E674.6060406@cs.uchicago.edu>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
	<46A7E674.6060406@cs.uchicago.edu>
Message-ID: <1185408781.22344.1.camel@blabla.mcs.anl.gov>

Right, and how would I get access to that?

On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote:
> I think dCache is installed on Tier3.
> http://twiki.mwt2.org/bin/view/UCTier3/WebHome
> Ioan
> 
> 
> Mihael Hategan wrote:
> > Does anyone know of an installation I can play with?
> > Looking at the docs, I'm a bit reluctant to try to install it on my
> > laptop.
> >
> > Mihael
> >
> 


From iraicu at cs.uchicago.edu  Wed Jul 25 19:24:49 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 25 Jul 2007 19:24:49 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <1185408781.22344.1.camel@blabla.mcs.anl.gov>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>	
	<46A7E674.6060406@cs.uchicago.edu>
	<1185408781.22344.1.camel@blabla.mcs.anl.gov>
Message-ID: <46A7E9D1.1050605@cs.uchicago.edu>

Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when I 
got my account for Tier3.  You might have to write him, Mary, and/or 
double check the link below for further instructions.

Ioan

=========================
Hi Mary,

Can you create accounts for Yong Zhao and Ioan Raicu, two computer 
science students?  They will need to use the Tier3 cluster for a test 
tomorrow morning.

Yong, Ioan, the first step is to follow the instructions for UChicago 
users at:

http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration

then Mary will create twiki accounts for you on the UC Tier3 twiki which 
is not public.   Then you'll
go to:

http://twiki.mwt2.org/bin/view/UCTier3/WebHome

and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount.

Rob


Mihael Hategan wrote:
> Right, and how would I get access to that?
>
> On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote:
>   
>> I think dCache is installed on Tier3.
>> http://twiki.mwt2.org/bin/view/UCTier3/WebHome
>> Ioan
>>
>>
>> Mihael Hategan wrote:
>>     
>>> Does anyone know of an installation I can play with?
>>> Looking at the docs, I'm a bit reluctant to try to install it on my
>>> laptop.
>>>
>>> Mihael
>>>
>
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From hategan at mcs.anl.gov  Wed Jul 25 19:26:34 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 19:26:34 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <46A7E9D1.1050605@cs.uchicago.edu>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
	<46A7E674.6060406@cs.uchicago.edu>
	<1185408781.22344.1.camel@blabla.mcs.anl.gov>
	<46A7E9D1.1050605@cs.uchicago.edu>
Message-ID: <1185409594.23097.0.camel@blabla.mcs.anl.gov>

Any simpler way?

On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote:
> Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when
> I got my account for Tier3.  You might have to write him, Mary, and/or
> double check the link below for further instructions.
> 
> Ioan
> 
> =========================
> Hi Mary,
> 
> 
> Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer
> science students.  They will
> need to use the Tier3 cluster for a test tomorrow morning.
> 
> 
> Yong, Ioan, the first step is to follow the instructions for UChicago
> users at:
> 
> 
> http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration
> 
> 
> then Mary will create twiki accounts for you on the UC Tier3 twiki
> which is not public.   Then you'll
> go to:
> 
> 
> http://twiki.mwt2.org/bin/view/UCTier3/WebHome
> 
> 
> and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount.
> 
> 
> Rob
> 
> 
> 
> 
> 
> 
> Mihael Hategan wrote: 
> > Right, and how would I get access to that?
> > 
> > On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote:
> >   
> > > I think dCache is installed on Tier3.
> > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome
> > > Ioan
> > > 
> > > 
> > > Mihael Hategan wrote:
> > >     
> > > > Does anyone know of an installation I can play with?
> > > > Looking at the docs, I'm a bit reluctant to try to install it on my
> > > > laptop.
> > > > 
> > > > Mihael
> > > > 
> > 
> > 
> >   
> 
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
> ============================================


From iraicu at cs.uchicago.edu  Wed Jul 25 19:29:40 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 25 Jul 2007 19:29:40 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <1185409594.23097.0.camel@blabla.mcs.anl.gov>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>	
	<46A7E674.6060406@cs.uchicago.edu>	
	<1185408781.22344.1.camel@blabla.mcs.anl.gov>	
	<46A7E9D1.1050605@cs.uchicago.edu>
	<1185409594.23097.0.camel@blabla.mcs.anl.gov>
Message-ID: <46A7EAF4.5060608@cs.uchicago.edu>

You asked how, so I told you how Yong and I got accounts on Tier3, which 
also has dCache installed.  They actually have a really nice testbed: 
some 20 compute nodes, each with 8GB of memory and 4 cores, and some 
50TB of disk managed by dCache.  I don't know of any other dCache 
installation around here, for example on TeraPort or TeraGrid. 

Ioan

Mihael Hategan wrote:
> Any simpler way?
>
> On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote:
>   
>> Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when
>> I got my account for Tier3.  You might have to write him, Mary, and/or
>> double check the link below for further instructions.
>>
>> Ioan
>>
>> =========================
>> Hi Mary,
>>
>>
>> Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer
>> science students.  They will
>> need to use the Tier3 cluster for a test tomorrow morning.
>>
>>
>> Yong, Ioan, the first step is to follow the instructions for UChicago
>> users at:
>>
>>
>> http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration
>>
>>
>> then Mary will create twiki accounts for you on the UC Tier3 twiki
>> which is not public.   Then you'll
>> go to:
>>
>>
>> http://twiki.mwt2.org/bin/view/UCTier3/WebHome
>>
>>
>> and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount.
>>
>>
>> Rob
>>
>>
>>
>>
>>
>>
>> Mihael Hategan wrote: 
>>     
>>> Right, and how would I get access to that?
>>>
>>> On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote:
>>>   
>>>       
>>>> I think dCache is installed on Tier3.
>>>> http://twiki.mwt2.org/bin/view/UCTier3/WebHome
>>>> Ioan
>>>>
>>>>
>>>> Mihael Hategan wrote:
>>>>     
>>>>         
>>>>> Does anyone know of an installation I can play with?
>>>>> Looking at the docs, I'm a bit reluctant to try to install it on my
>>>>> laptop.
>>>>>
>>>>> Mihael
>>>>>
>>>   
>>>       
>> -- 
>> ============================================
>> Ioan Raicu
>> Ph.D. Student
>> ============================================
>> Distributed Systems Laboratory
>> Computer Science Department
>> University of Chicago
>> 1100 E. 58th Street, Ryerson Hall
>> Chicago, IL 60637
>> ============================================
>> Email: iraicu at cs.uchicago.edu
>> Web:   http://www.cs.uchicago.edu/~iraicu
>>        http://dsl.cs.uchicago.edu/
>> ============================================
>> ============================================
>>     
>
>
>   

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================


From hategan at mcs.anl.gov  Wed Jul 25 19:32:45 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jul 2007 19:32:45 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <46A7EAF4.5060608@cs.uchicago.edu>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
	<46A7E674.6060406@cs.uchicago.edu>
	<1185408781.22344.1.camel@blabla.mcs.anl.gov>
	<46A7E9D1.1050605@cs.uchicago.edu>
	<1185409594.23097.0.camel@blabla.mcs.anl.gov>
	<46A7EAF4.5060608@cs.uchicago.edu>
Message-ID: <1185409965.23644.0.camel@blabla.mcs.anl.gov>

None that you know of, I gather.

Thanks.

Mihael

On Wed, 2007-07-25 at 19:29 -0500, Ioan Raicu wrote:
> You asked how, I told you how Yong and I got accounts on Tier3, which
> also has dCache installed.  They actually have a really nice testbed,
> some 20 compute nodes with 8GB of memory and 4 cores on each node, and
> some 50TB of disk managed by dCache.  I don't know of any other
> install of dCache around here, such as TeraPort or TeraGrid.  
> 
> Ioan
> 
> Mihael Hategan wrote: 
> > Any simpler way?
> > 
> > On Wed, 2007-07-25 at 19:24 -0500, Ioan Raicu wrote:
> >   
> > > Here is the message I got from Rob Gardner (rwg at ci.uchicago.edu) when
> > > I got my account for Tier3.  You might have to write him, Mary, and/or
> > > double check the link below for further instructions.
> > > 
> > > Ioan
> > > 
> > > =========================
> > > Hi Mary,
> > > 
> > > 
> > > Can you create accounts for Yong Zhao and Ioan Raicu, two CS computer
> > > science students.  They will
> > > need to use the Tier3 cluster for a test tomorrow morning.
> > > 
> > > 
> > > Yong, Ioan, the first step is to follow the instructions for UChicago
> > > users at:
> > > 
> > > 
> > > http://twiki.mwt2.org/bin/view/TWiki/TWikiRegistration
> > > 
> > > 
> > > then Mary will create twiki accounts for you on the UC Tier3 twiki
> > > which is not public.   Then you'll
> > > go to:
> > > 
> > > 
> > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome
> > > 
> > > 
> > > and then http://twiki.mwt2.org/bin/view/UCTier3/GettingAnAccount.
> > > 
> > > 
> > > Rob
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Mihael Hategan wrote: 
> > >     
> > > > Right, and how would I get access to that?
> > > > 
> > > > On Wed, 2007-07-25 at 19:10 -0500, Ioan Raicu wrote:
> > > >   
> > > >       
> > > > > I think dCache is installed on Tier3.
> > > > > http://twiki.mwt2.org/bin/view/UCTier3/WebHome
> > > > > Ioan
> > > > > 
> > > > > 
> > > > > Mihael Hategan wrote:
> > > > >     
> > > > >         
> > > > > > Does anyone know of an installation I can play with?
> > > > > > Looking at the docs, I'm a bit reluctant to try to install it on my
> > > > > > laptop.
> > > > > > 
> > > > > > Mihael
> > > > > > 
> > > > 
> > > >       
> > > -- 
> > > ============================================
> > > Ioan Raicu
> > > Ph.D. Student
> > > ============================================
> > > Distributed Systems Laboratory
> > > Computer Science Department
> > > University of Chicago
> > > 1100 E. 58th Street, Ryerson Hall
> > > Chicago, IL 60637
> > > ============================================
> > > Email: iraicu at cs.uchicago.edu
> > > Web:   http://www.cs.uchicago.edu/~iraicu
> > >        http://dsl.cs.uchicago.edu/
> > > ============================================
> > > ============================================
> > >     
> > 
> > 
> >   
> 
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
> ============================================


From benc at hawaga.org.uk  Thu Jul 26 04:38:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 26 Jul 2007 09:38:25 +0000 (GMT)
Subject: [Swift-devel] dcache
In-Reply-To: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707260937230.11237@dildano.hawaga.org.uk>


On Wed, 25 Jul 2007, Mihael Hategan wrote:

> Does anyone know of an installation I can play with?
> Looking at the docs, I'm a bit reluctant to try to install it on my
> laptop.

I have access to one at fermi - it was pretty straightforward to get 
access, as I already had a fermi account.

-- 


From benc at hawaga.org.uk  Thu Jul 26 06:07:56 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 26 Jul 2007 11:07:56 +0000 (GMT)
Subject: [Swift-devel] dcache
In-Reply-To: <Pine.LNX.4.64.0707260937230.11237@dildano.hawaga.org.uk>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707260937230.11237@dildano.hawaga.org.uk>
Message-ID: <Pine.LNX.4.64.0707261104500.11237@dildano.hawaga.org.uk>


an extremely crude implementation of dcache-in-swift could be told which 
subtrees of the local posix filesystem namespace are actually dCache; and 
then the swift stage in and stage out code would have an additional step 
which would dccp the file to submit-side storage before sending it from 
there to the remote site.
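
A rough sketch of that extra staging step, written in Python rather than as a patch to Swift's actual staging code. The subtree list, the staging directory and the function names are all hypothetical; the only external tool assumed is the dCache copy client, invoked as "dccp <source> <destination>". The sketch only plans the copy and does not run it.

```python
import posixpath

# Hypothetical configuration: subtrees of the local POSIX namespace
# that are really backed by dCache.
DCACHE_SUBTREES = ["/pnfs/example.org/data"]


def in_dcache(path, subtrees=DCACHE_SUBTREES):
    """Return True if path lies inside one of the configured dCache subtrees."""
    norm = posixpath.normpath(path)
    for root in subtrees:
        root = posixpath.normpath(root)
        if norm == root or norm.startswith(root + "/"):
            return True
    return False


def stage_in_plan(path, staging_dir):
    """Return (commands, path_to_send) for staging one input file.

    Files outside dCache need no extra command. Files inside dCache are
    first copied with dccp to submit-side storage, and that local copy is
    what the ordinary stage-in would then send to the remote site.
    """
    if not in_dcache(path):
        return [], path
    local_copy = posixpath.join(staging_dir, posixpath.basename(path))
    return [["dccp", path, local_copy]], local_copy
```

Stage-out would be the mirror image: transfer to submit-side storage first, then dccp from there into the dCache subtree.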

-- 


From benc at hawaga.org.uk  Thu Jul 26 09:01:25 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 26 Jul 2007 14:01:25 +0000 (GMT)
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <1185302836.6949.5.camel@blabla.mcs.anl.gov>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk> 
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk> 
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu> 
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185302836.6949.5.camel@blabla.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707261357420.21532@dildano.hawaga.org.uk>


On Tue, 24 Jul 2007, Mihael Hategan wrote:

> I'm thinking we should have two division operators:
> div - integer division (int, int -> int)
> / - floating point division ( [int|float], [int|float] -> float )

So a patch I have (not yet committed) makes / be floating point division, 
%/ be integer division and %% be mod (rather than %).

(Both %/ and %% carry the % prefix because those two operators are strongly 
related.)

No other particularly nice symbols spring to mind.

With that in place, the operator changes you committed work without too 
much change to the language both against the XML development stuff and 
also against the trunk code.

I'd be happy for those to go into trunk now, ahead of the big XML 
expression work.
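
To make the intended semantics concrete, here is a small Python sketch of the three operators (not the patch itself). The function names are made up for illustration. The thread does not say how %/ and %% should treat negative operands; the sketch assumes truncation toward zero, as Java integer division does.

```python
def fdiv(a, b):
    """'/': floating point division, ([int|float], [int|float]) -> float."""
    return float(a) / float(b)


def idiv(a, b):
    """'%/': integer division, (int, int) -> int.

    Assumption: the quotient is truncated toward zero.
    """
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("%/ is defined for int operands only")
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def mod(a, b):
    """'%%': remainder paired with idiv, so idiv(a, b) * b + mod(a, b) == a."""
    return a - idiv(a, b) * b
```

With these definitions fdiv(7, 2) gives 3.5, idiv(7, 2) gives 3 and mod(7, 2) gives 1, which is the sense in which %/ and %% belong together.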

-- 


From bugzilla-daemon at mcs.anl.gov  Thu Jul 26 09:52:31 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Thu, 26 Jul 2007 09:52:31 -0500 (CDT)
Subject: [Swift-devel] [Bug 22] configurable remote filesystem layout
In-Reply-To: <bug-22-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070726145231.EB473164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=22


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|v0.2                        |v0.3


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.


From hategan at mcs.anl.gov  Thu Jul 26 10:03:37 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 26 Jul 2007 10:03:37 -0500
Subject: [Swift-devel] numeric type(s) in swift.
In-Reply-To: <Pine.LNX.4.64.0707261357420.21532@dildano.hawaga.org.uk>
References: <Pine.LNX.4.64.0707160806440.7513@dildano.hawaga.org.uk>
	<Pine.LNX.4.64.0707201422570.11237@dildano.hawaga.org.uk>
	<Pine.LNX.4.58.0707200929220.8849@classes.cs.uchicago.edu>
	<46A0D0D1.6070407@mcs.anl.gov>
	<Pine.LNX.4.64.0707201705140.26516@dildano.hawaga.org.uk>
	<46A11286.7080807@mcs.anl.gov>
	<1184964549.26024.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707202206520.26516@dildano.hawaga.org.uk>
	<1185302836.6949.5.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707261357420.21532@dildano.hawaga.org.uk>
Message-ID: <1185462217.1578.11.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-26 at 14:01 +0000, Ben Clifford wrote:
> On Tue, 24 Jul 2007, Mihael Hategan wrote:
> 
> > I'm thinking we should have two division operators:
> > div - integer division (int, int -> int)
> > / - floating point division ( [int|float], [int|float] -> float )
> 
> So a patch I have (not yet committed) makes / be floating point division, 
> %/ be integer division and %% be mod (rather than %).
> 
> (the % prefix on %/ and %% because those two operators are strongly 
> related).
> 
> No other particularly nice symbols spring to mind.

'div' and 'mod'?

> 
> With that in place, the operator changes you committed work without too 
> much change to the language both against the XML development stuff and 
> also against the trunk code.
> 
> I'd be happy for those to go into trunk now, ahead of the big XML 
> expression work.
> 


From hategan at mcs.anl.gov  Thu Jul 26 10:04:11 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 26 Jul 2007 10:04:11 -0500
Subject: [Swift-devel] dcache
In-Reply-To: <Pine.LNX.4.64.0707260937230.11237@dildano.hawaga.org.uk>
References: <1185408188.21980.0.camel@blabla.mcs.anl.gov>
	<Pine.LNX.4.64.0707260937230.11237@dildano.hawaga.org.uk>
Message-ID: <1185462251.1578.12.camel@blabla.mcs.anl.gov>

On Thu, 2007-07-26 at 09:38 +0000, Ben Clifford wrote:
> 
> On Wed, 25 Jul 2007, Mihael Hategan wrote:
> 
> > Does anyone know of an installation I can play with?
> > Looking at the docs, I'm a bit reluctant to try to install it on my
> > laptop.
> 
> I have access to one at fermi - it was pretty straightforward to get 
> access, as I already had a fermi account.

Right. That's what I thought I would do, unless we have something really
close.

> 


From nefedova at mcs.anl.gov  Fri Jul 27 10:22:52 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 10:22:52 -0500
Subject: [Swift-devel] loops and strings
Message-ID: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>

I am not sure if it's possible to do string operations inside a loop  
in Swift?
I have some very simple test code that doesn't work no matter what.  
Obviously, I am missing something.
This is the code:

file fls[]<filesys_mapper;pattern="*.prt",location=".">;
string wham_string = "#";
foreach prt_file in fls
{
       wham_string = @strcat (wham_string, ", wham");
       print (wham_string);
}
print (wham_string);


Basically, I expect to have this as the output:  
#,wham,wham,wham,wham,... (it's test code (-;)

instead I have these errors:

wham_string is already assigned with a value of #
wham_string is already assigned with a value of #
         vdl:assign @ test.kml, line: 46
         vdl:mains @ test.kml, line: 39
Caused by: java.lang.IllegalArgumentException: wham_string is already assigned with a value of #
         at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:255)
         at org.griphyn.vdl.karajan.lib.Assign.function(Assign.java:70)
<snip>


In any case -- if I can't construct the string by using the loop -  
how else could it be done?

I then use the constructed string to map an array (I understand I  
can't map individual array elements):

file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">; // it was in the wrapper script before


Nika


From hategan at mcs.anl.gov  Fri Jul 27 10:46:06 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 10:46:06 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
Message-ID: <1185551166.17961.2.camel@blabla.mcs.anl.gov>

Variables in swift are single assignment. You can't assign to a variable
twice. What, in your opinion, should the error message be instead of the
current one?

On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote:
> I am not sure if it's possible to do string operations inside a loop  
> in Swift?
> I have some very simple test code that doesn't work no matter what.  
> Obviously, I am missing something.
> This is the code:
> 
> file fls[]<filesys_mapper;pattern="*.prt",location=".">;
> string wham_string = "#";
> foreach prt_file in fls
> {
>        wham_string = @strcat (wham_string, ", wham");
>        print (wham_string);
> }
> print (wham_string);
> 
> 
> Basically, I expect to have this as the output:  
> #,wham,wham,wham,wham,... (it's test code (-;)
> 
> instead I have these errors:
> 
> wham_string is already assigned with a value of #
> wham_string is already assigned with a value of #
>          vdl:assign @ test.kml, line: 46
>          vdl:mains @ test.kml, line: 39
> Caused by: java.lang.IllegalArgumentException: wham_string is already  
> assigned with a value of #
>          at org.griphyn.vdl.mapping.AbstractDataNode.setValue 
> (AbstractDataNode.java:255)
>          at org.griphyn.vdl.karajan.lib.Assign.function(Assign.java:70)
> <snip>
> 
> 
> In any case -- if I can't construct the string by using the loop -  
> how else could it be done?
> 
> I use the constructed string then to map an array (I understand I  
> can't map individual array elements):
> 
> file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">; //it  
> was in the wrapper script before)
> 
> 
> Nika
> 


From nefedova at mcs.anl.gov  Fri Jul 27 10:50:51 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 10:50:51 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <1185551166.17961.2.camel@blabla.mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
Message-ID: <BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>

So how else do I construct a string in Swift, then?


On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote:

> Variables in swift are single assignment. You can't assign to a  
> variable
> twice. What, in your opinion, should the error message be instead  
> of the
> current one?
>
> On Fri, 2007-07-27 at 10:22 -0500, Veronika Nefedova wrote:
>> I am not sure if it's possible to do string operations inside a loop
>> in Swift?
>> I have some very simple test code that doesn't work no matter what.
>> Obviously, I am missing something.
>> This is the code:
>>
>> file fls[]<filesys_mapper;pattern="*.prt",location=".">;
>> string wham_string = "#";
>> foreach prt_file in fls
>> {
>>        wham_string = @strcat (wham_string, ", wham");
>>        print (wham_string);
>> }
>> print (wham_string);
>>
>>
>> Basically, I expect to have this as the output:
>> #,wham,wham,wham,wham,... (it's test code (-;)
>>
>> instead I have these errors:
>>
>> wham_string is already assigned with a value of #
>> wham_string is already assigned with a value of #
>>          vdl:assign @ test.kml, line: 46
>>          vdl:mains @ test.kml, line: 39
>> Caused by: java.lang.IllegalArgumentException: wham_string is already
>> assigned with a value of #
>>          at org.griphyn.vdl.mapping.AbstractDataNode.setValue
>> (AbstractDataNode.java:255)
>>          at org.griphyn.vdl.karajan.lib.Assign.function 
>> (Assign.java:70)
>> <snip>
>>
>>
>> In any case -- if I can't construct the string by using the loop -
>> how else could it be done?
>>
>> I use the constructed string then to map an array (I understand I
>> can't map individual array elements):
>>
>> file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">; //it
>> was in the wrapper script before)
>>
>>
>> Nika
>>
>


From hategan at mcs.anl.gov  Fri Jul 27 11:01:59 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 11:01:59 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
Message-ID: <1185552119.18583.4.camel@blabla.mcs.anl.gov>

wham_string2 = @strcat(wham_string, ", wham");
print(wham_string2);

Swift variables are not variables in the imperative sense. They are
labels that are used to direct the data flow. Loops (in the sense of
data looping around the same node - picture this as a data flow graph)
make no sense.
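
As a rough illustration of the single-assignment rule (a Python sketch,
not Swift code; the SingleAssignment class and append_wham helper are
made up for this example), a label that may be bound only once rejects
a second write, while binding the result to a fresh label works:

```python
class SingleAssignment:
    """A label that can be bound to a value exactly once (hypothetical helper)."""

    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self._value = self._UNSET

    def set(self, value):
        # A second write is an error, much like the message quoted in the thread.
        if self._value is not self._UNSET:
            raise ValueError(
                f"{self.name} is already assigned with a value of {self._value}"
            )
        self._value = value

    def get(self):
        if self._value is self._UNSET:
            raise ValueError(f"{self.name} has not been assigned yet")
        return self._value


def append_wham(source):
    """Bind the result to a new label instead of rewriting the old one."""
    result = SingleAssignment(source.name + "2")
    result.set(source.get() + ", wham")
    return result


wham_string = SingleAssignment("wham_string")
wham_string.set("#")
wham_string2 = append_wham(wham_string)
print(wham_string2.get())
```

Each step produces a new node in the graph; nothing flows back into
wham_string itself.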

On Fri, 2007-07-27 at 10:50 -0500, Veronika Nefedova wrote:
> So how else can I construct a string in Swift?
> 
> 
> On Jul 27, 2007, at 10:46 AM, Mihael Hategan wrote:
> 
> > Variables in swift are single assignment. You can't assign to a  
> > variable
> > twice. What, in your opinion, should the error message be instead  
> > of the
> > current one?


From nefedova at mcs.anl.gov  Fri Jul 27 11:09:19 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 11:09:19 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <1185552119.18583.4.camel@blabla.mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
Message-ID: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>

I need to 'cat' together an unknown number of strings to form a
string, that's why I was attempting to do it inside the loop. And even
if I knew the number of loop cycles (say, it's 68) -- are you
suggesting doing it 'by hand'?


Anyway - my main goal is not to create this string, but to map an array:
file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">;

Do you see a solution here?

Thanks,

Nika




From hategan at mcs.anl.gov  Fri Jul 27 13:24:34 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 13:24:34 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
Message-ID: <1185560674.19922.7.camel@blabla.mcs.anl.gov>

I see we're getting back to the same old story of the conflict between
writing a mapper and hacking one directly in swift.

This is an issue we really need to deal with. It has produced more
discussions and hacks than any other single Swift issue.

You could use an array, or we could provide a folding operator/function,
or even a join function.
We could also let fixed_array_mapper accept an array as a parameter, so
you would build an array with the file names and then pass it to the
mapper.
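
To make the three options concrete, here is a sketch in Python rather
than Swift. None of these functions exist in Swift today; the names
fold_names, join_names and map_fixed_array, and the file names, are
assumptions for illustration only. The point is that a fold or a join
builds the string without ever assigning to the same variable twice,
and that an array-accepting mapper would not need the string at all:

```python
from functools import reduce


def fold_names(names, separator=","):
    """Left fold: every step yields a new string, so nothing is reassigned."""
    if not names:
        return ""
    return reduce(lambda acc, name: acc + separator + name, names)


def join_names(names, separator=","):
    """The same result through a dedicated join function."""
    return separator.join(names)


def map_fixed_array(names):
    """Hypothetical array-accepting mapper: array index -> file name."""
    return dict(enumerate(names))


file_names = ["wham1", "wham2", "wham3"]  # made-up file names
print(fold_names(file_names))
print(join_names(file_names))
print(map_fixed_array(file_names))
```

The fold and the join give the same string; the last variant skips the
intermediate string entirely, which is what passing an array to the
mapper would amount to.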

On Fri, 2007-07-27 at 11:09 -0500, Veronika Nefedova wrote:
> I need to 'cat' together an unknown number of strings to form a  
> string, thats why I was attempting to do it inside the loop. And even  
> if I knew the number of loop cycles (say, its 68) -- are you  
> suggesting  to do it 'by hand' ?
> 
> 
> Anyway - my main goal is not to create this string, but to map an array:
> file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">;
> 
> Do you see a solution here?


From hategan at mcs.anl.gov  Fri Jul 27 13:36:22 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 13:36:22 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<1185560674.19922.7.camel@blabla.mcs.anl.gov>
	<410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry>
Message-ID: <1185561383.21161.2.camel@blabla.mcs.anl.gov>

I wish. I think we all need to think about it.

On Fri, 2007-07-27 at 18:32 +0000, Ian Foster wrote:
> Can you propose a general solution?


From itf at mcs.anl.gov  Fri Jul 27 13:32:16 2007
From: itf at mcs.anl.gov (=?utf-8?B?SWFuIEZvc3Rlcg==?=)
Date: Fri, 27 Jul 2007 18:32:16 +0000
Subject: [Swift-devel] loops and strings
In-Reply-To: <1185560674.19922.7.camel@blabla.mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov><1185551166.17961.2.camel@blabla.mcs.anl.gov><BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov><1185560674.19922.7.camel@blabla.mcs.anl.gov>
Message-ID: <410163427-1185561143-cardhu_decombobulator_blackberry.rim.net-1435636067-@bxe009.bisx.prod.on.blackberry>

Can you propose a general solution?

-----Original Message-----
From: Mihael Hategan <hategan at mcs.anl.gov>

Date: Fri, 27 Jul 2007 13:24:34 
To:Veronika Nefedova <nefedova at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] loops and strings


I see we're getting back to the same old story of the conflict between
writing a mapper and hacking one directly in swift.

This is an issue we really need to deal with. It has produced more
discussions and hacks than any other single Swift issue.

You could use an array, or we could provide a folding operator/function,
or even a join function.
We could also let fixed_array_mapper accept an array as a parameter, so
you would build an array with the file names and then pass it to the
mapper.



From nefedova at mcs.anl.gov  Fri Jul 27 14:01:52 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 14:01:52 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <1185560674.19922.7.camel@blabla.mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<1185560674.19922.7.camel@blabla.mcs.anl.gov>
Message-ID: <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov>

Would allowing multiple assignments to the same variable really be
impossible in Swift?

Nika

On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote:
> I see we're getting back to the same old story of the conflict between
> writing a mapper and hacking one directly in swift.
>
> This is an issue we really need to deal with. It has produced more
> discussions and hacks than any other single Swift issue.
>
> You could use an array, or we could provide a folding operator/ 
> function,
> or even a join function.
> We could also let fixed_array_mapper accept an array as a  
> parameter, so
> you would build an array with the file names and then pass it to the
> mapper.


From hategan at mcs.anl.gov  Fri Jul 27 14:11:09 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 14:11:09 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<1185560674.19922.7.camel@blabla.mcs.anl.gov>
	<4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov>
Message-ID: <1185563469.22752.7.camel@blabla.mcs.anl.gov>

On Fri, 2007-07-27 at 14:01 -0500, Veronika Nefedova wrote:
> will allowing multiple assignments to the same variable be a really  
> impossible thing to have in swift?

With what we currently have as "Swift", yes.



From itf at mcs.anl.gov  Fri Jul 27 14:20:11 2007
From: itf at mcs.anl.gov (=?utf-8?B?SWFuIEZvc3Rlcg==?=)
Date: Fri, 27 Jul 2007 19:20:11 +0000
Subject: [Swift-devel] loops and strings
In-Reply-To: <866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov><1185551166.17961.2.camel@blabla.mcs.anl.gov><BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
Message-ID: <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry>

Could you not handle the "cat a set of strings" case via a call to a shell script or other program that does this?
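
A minimal sketch of that idea, written in Python rather than as a Swift
app declaration: the concatenation is delegated to a separate process,
and the caller only sees the finished string. The helper name
join_via_external_program and the one-line child program are
assumptions for illustration; a real workflow would call whatever
script is installed on the site.

```python
import subprocess
import sys

# Stand-in for the external script: joins its command-line arguments.
JOIN_PROGRAM = "import sys; print(', '.join(sys.argv[1:]))"


def join_via_external_program(strings):
    """Run a child process that concatenates the given strings."""
    completed = subprocess.run(
        [sys.executable, "-c", JOIN_PROGRAM, *strings],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return completed.stdout.strip()


print(join_via_external_program(["#", "wham", "wham", "wham"]))
```

The workflow itself never reassigns anything; it receives a single
value from the program's output.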

Ian


-----Original Message-----
From: Veronika Nefedova <nefedova at mcs.anl.gov>

Date: Fri, 27 Jul 2007 11:09:19 
To:Mihael Hategan <hategan at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] loops and strings


I need to 'cat' together an unknown number of strings to form a  
string, thats why I was attempting to do it inside the loop. And even  
if I knew the number of loop cycles (say, its 68) -- are you  
suggesting  to do it 'by hand' ?


Anyway - my main goal is not to create this string, but to map an array:
file whamfiles_$s[] <fixed_array_mapper;files="$wham_string">;

Do you see a solution here?

Thanks,

Nika




From nefedova at mcs.anl.gov  Fri Jul 27 14:26:36 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 14:26:36 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <1185563469.22752.7.camel@blabla.mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<1185560674.19922.7.camel@blabla.mcs.anl.gov>
	<4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov>
	<1185563469.22752.7.camel@blabla.mcs.anl.gov>
Message-ID: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov>

I guess I am still missing something. I *can* have multiple  
assignments to the same variable inside the loop. Here, this code  
assigns different values to "name" at every loop step:

file fls[]<filesys_mapper;pattern="*.prt",location=".">;
foreach prt_file in fls
{
       string name = @strcut (@prt_file, "\.\/(.*)\.prt");
       print (name);
}


Or is "name" considered to be a new variable every time, since I have a
type declaration next to it?

Nika

On Jul 27, 2007, at 2:11 PM, Mihael Hategan wrote:

> On Fri, 2007-07-27 at 14:01 -0500, Veronika Nefedova wrote:
>> will allowing multiple assignments to the same variable be a really
>> impossible thing to have in swift?
>
> With what we currently have as "Swift", yes.
>
>>
>> Nika
>>
>> On Jul 27, 2007, at 1:24 PM, Mihael Hategan wrote:
>>> I see we're getting back to the same old story of the conflict  
>>> between
>>> writing a mapper and hacking one directly in swift.
>>>
>>> This is an issue we really need to deal with. It has produced more
>>> discussions and hacks than any other single Swift issue.
>>>
>>> You could use an array, or we could provide a folding operator/
>>> function,
>>> or even a join function.
>>> We could also let fixed_array_mapper accept an array as a
>>> parameter, so
>>> you would build an array with the file names and then pass it to the
>>> mapper.
>>>


From nefedova at mcs.anl.gov  Fri Jul 27 14:39:12 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 14:39:12 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov><1185551166.17961.2.camel@blabla.mcs.anl.gov><BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry>
Message-ID: <11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov>

Having this combination of swift and the wrapper proves a bit
cumbersome. The array declaration has to be inside another loop,
i.e. depend on the loop variable, yet be generated by a shell
script... I am still testing various possibilities, although
generating the string inside swift would have been much easier.

On Jul 27, 2007, at 2:20 PM, Ian Foster wrote:

> Could you not handle the "cat a set of strings" case via a call to  
> a shell script or other program that does this?
>
> Ian
>
>


From itf at mcs.anl.gov  Fri Jul 27 14:59:44 2007
From: itf at mcs.anl.gov (=?utf-8?B?SWFuIEZvc3Rlcg==?=)
Date: Fri, 27 Jul 2007 19:59:44 +0000
Subject: [Swift-devel] loops and strings
In-Reply-To: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov><1185551166.17961.2.camel@blabla.mcs.anl.gov><BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov><1185560674.19922.7.camel@blabla.mcs.anl.gov><4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov><1185563469.22752.7.camel@blabla.mcs.anl.gov><68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov>
Message-ID: <793163636-1185566391-cardhu_decombobulator_blackberry.rim.net-663918437-@bxe009.bisx.prod.on.blackberry>

That has local scope and so each time around the loop is a different variable
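
A minimal sketch of that distinction, written in Python rather than
SwiftScript. The class and function names are hypothetical and only model
the behaviour described in this thread: a declaration inside the loop body
yields a fresh variable on each pass, while a second assignment to a
variable declared outside the loop is rejected. It is not a description of
how Swift itself is implemented.

```python
import os


class SingleAssignment:
    """Hypothetical stand-in for a single-assignment dataflow variable."""

    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self._value = self._UNSET

    def set(self, value):
        # A value may be attached to the label exactly once.
        if self._value is not self._UNSET:
            raise ValueError(
                f"{self.name} is already assigned with a value of {self._value}"
            )
        self._value = value

    def get(self):
        if self._value is self._UNSET:
            raise ValueError(f"{self.name} has no value yet")
        return self._value


def per_iteration_names(paths):
    """Declaring the variable inside the body gives a new one per pass."""
    names = []
    for path in paths:
        name = SingleAssignment("name")  # fresh variable, local to this pass
        name.set(os.path.splitext(os.path.basename(path))[0])
        names.append(name.get())
    return names


def reassign_outer(paths):
    """Re-assigning a variable declared outside the loop is rejected."""
    wham_string = SingleAssignment("wham_string")
    wham_string.set("#")
    for _ in paths:
        # This is the second assignment to the same variable, so it raises.
        wham_string.set(wham_string.get() + ", wham")
    return wham_string.get()
```

Under these assumptions, per_iteration_names(["./a.prt", "./b.prt"])
succeeds, whereas reassign_outer fails on its first pass with the same
"already assigned" wording as the error reported at the start of the thread.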


-----Original Message-----
From: Veronika Nefedova <nefedova at mcs.anl.gov>

Date: Fri, 27 Jul 2007 14:26:36 
To:Mihael Hategan <hategan at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] loops and strings


I guess I am still missing something. I *can* have multiple  
assignments to the same variable inside the loop. Here, this code  
assigns different values to "name" at every loop step:

file fls[]<filesys_mapper;pattern="*.prt",location=".">;
foreach prt_file in fls
{
       string name = @strcut (@prt_file, "\.\/(.*)\.prt");
       print (name);
}


Or is "name" considered to be a new variable every time, since I have a
type declaration next to it?

Nika



From nefedova at mcs.anl.gov  Fri Jul 27 15:13:20 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Fri, 27 Jul 2007 15:13:20 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov><1185551166.17961.2.camel@blabla.mcs.anl.gov><BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov><1185552119.18583.4.camel@blabla.mcs.anl.gov><866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry>
	<11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov>
Message-ID: <F7D01F53-C278-4F39-80AC-69D77ECFD210@mcs.anl.gov>

OK, here is the problem that I do not see how to get around.

I have an outer loop:

foreach f in files {
    string S = "bla";
}

I need to have this array declared, and if I generate the string in
the shell script, it has to be declared explicitly:

foreach f in files {
    string S = "bla";
    file whamfiles[] <fixed_array_mapper;files="file1_S, file2_S, file3_S">;
}

and it has to be "S", not its value, since it's all inside the loop.
But for swift to recognize S as its own variable (and substitute its
value on every loop step) I need to use strcat:
@strcat("file1_", S), @strcat("file2_", S), etc. for each of the
string's elements. I do not see a way of doing that so far without
being able to construct a string in swift... There are 68 elements in
that string, but it could be any number.

Does anybody have any suggestions?

Nika
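
For illustration, the string in question can be produced without any
re-assignment by a join or a fold, which is roughly what the "join
function" and "folding operator" ideas raised earlier in this thread amount
to. The sketch below is in Python, not SwiftScript. The function names are
hypothetical, and the "file1_S, file2_S, ..." layout is taken from the
example above.

```python
from functools import reduce


def files_param(prefixes, suffix):
    """Build 'file1_S, file2_S, ...' with a join; nothing is re-assigned."""
    return ", ".join(f"{prefix}_{suffix}" for prefix in prefixes)


def files_param_fold(prefixes, suffix):
    """Same result expressed as a left fold over the prefixes."""

    def step(acc, prefix):
        item = f"{prefix}_{suffix}"
        # Each step returns a new string instead of updating an old one.
        return f"{acc}, {item}" if acc else item

    return reduce(step, prefixes, "")
```

Either function could also live in an external helper program, as suggested
elsewhere in this thread. In both cases each intermediate value is a new
one, so the single-assignment rule is never violated.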



From hategan at mcs.anl.gov  Fri Jul 27 15:30:12 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 27 Jul 2007 15:30:12 -0500
Subject: [Swift-devel] loops and strings
In-Reply-To: <F7D01F53-C278-4F39-80AC-69D77ECFD210@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<625535667-1185564017-cardhu_decombobulator_blackberry.rim.net-1248866056-@bxe009.bisx.prod.on.blackberry>
	<11E45345-6700-42C5-9624-694ED4D7666E@mcs.anl.gov>
	<F7D01F53-C278-4F39-80AC-69D77ECFD210@mcs.anl.gov>
Message-ID: <1185568212.26509.3.camel@blabla.mcs.anl.gov>

Seriously now. Having a mapper would save you lots of time. I'll help
you out.

Take a look at AirsnMapper.java and ROIMapper.java.

Mihael
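
As a rough picture of what such a mapper would do, the Python sketch below
maps an array index to a file name derived from a per-iteration suffix.
Everything in it is an assumption made for illustration: the class name,
its methods and the "file<N>_<suffix>" naming scheme are hypothetical and
do not reproduce the interface used by AirsnMapper.java or ROIMapper.java.

```python
class SuffixArrayMapper:
    """Hypothetical mapper: array index i -> 'file<i+1>_<suffix>'."""

    def __init__(self, suffix, count, prefix="file"):
        self.suffix = suffix
        self.count = count
        self.prefix = prefix

    def map(self, index):
        # Only indices inside the declared array are mapped.
        if not 0 <= index < self.count:
            raise IndexError(f"index {index} is outside 0..{self.count - 1}")
        return f"{self.prefix}{index + 1}_{self.suffix}"

    def files_param(self):
        # The comma-separated form that a files="..." parameter would carry.
        return ", ".join(self.map(i) for i in range(self.count))
```

With a mapper along these lines the list of names is computed from the
suffix and the element count, so no string has to be assembled in the
script itself.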



From bugzilla-daemon at mcs.anl.gov  Fri Jul 27 18:46:03 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 27 Jul 2007 18:46:03 -0500 (CDT)
Subject: [Swift-devel] [Bug 84] New: switch does not work with variable
	parameter
Message-ID: <bug-84-21@http.bugzilla.mcs.anl.gov/swift/>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84

           Summary: switch does not work with variable parameter
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SwiftScript language
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk
                CC: swift-devel at ci.uchicago.edu


The code below executes the default case rather than the 8 case. Replacing the
switch with:

    switch(8) {

so that the selector value is a hard-coded constant causes the 8 case to run.

type messagefile {}

(messagefile t) greeting(string m) { 
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile <"091-case.out">;

int selector = 8;

print(selector);

string message;

switch(selector) {
  case 3:
    message="first message";
    break;
  case 8:
    message="eighth message";
    break;
  case 57:
    message="last message";
    break;
  default:
    message="no message at all...";
    break;
}

outfile = greeting(message);


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at mcs.anl.gov  Fri Jul 27 18:48:44 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 27 Jul 2007 18:48:44 -0500 (CDT)
Subject: [Swift-devel] [Bug 85] New: break statements in switch/case have no
	effect.
Message-ID: <bug-85-21@http.bugzilla.mcs.anl.gov/swift/>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85

           Summary: break statements in switch/case have no effect.
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: SwiftScript language
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk
                CC: swift-devel at ci.uchicago.edu


Break statements in switch/case have no effect: the code behaves the same whether
there's a break or not, and executes only the code directly attached to the
matching case.

For example, the code below executes only the 8 case, rather than also executing
the code associated with the cases lower down (which should then fail with a
multiple assignment error).

The easiest course is likely to remove break; from the language.

type messagefile {}

(messagefile t) greeting(string m) { 
    app {
        echo m stdout=@filename(t);
    }
}

messagefile outfile <"092-case-duffs-device.out">;


string message;

switch(8) {
  case 3:
    message="first message";
  case 8:
    message="eighth message";
  default:
    message="no message at all...";
  case 57:
    message="last message";
}
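
To make the difference concrete, here is a rough model of the two behaviours
in Python (not SwiftScript, and not how Swift implements switch). The CASES
table and the assigned_messages helper are made up for illustration; they just
mirror the case order in the test above.

```python
# Ordered case bodies from the report; None stands for the default label.
CASES = [
    (3, "first message"),
    (8, "eighth message"),
    (None, "no message at all..."),
    (57, "last message"),
]


def assigned_messages(selector, fall_through):
    """Return the values that would be assigned to `message`, in order."""
    labels = [label for label, _ in CASES]
    # Jump to the matching label, or to default when nothing matches.
    start = labels.index(selector) if selector in labels else labels.index(None)
    if fall_through:
        # C semantics without break: run every body from the match onwards.
        return [body for _, body in CASES[start:]]
    # Behaviour described in the report: only the matching body runs.
    return [CASES[start][1]]


print(assigned_messages(8, fall_through=False))
print(assigned_messages(8, fall_through=True))
```

With fall-through the model yields three assignments to message, which is
where a single-assignment language would be expected to complain.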




From bugzilla-daemon at mcs.anl.gov  Fri Jul 27 20:09:28 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 27 Jul 2007 20:09:28 -0500 (CDT)
Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter
In-Reply-To: <bug-84-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070728010928.83157164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84


------- Comment #1 from hategan at mcs.anl.gov  2007-07-27 20:09 -------
Comparison broken? Can you add a test case in SVN?




From bugzilla-daemon at mcs.anl.gov  Fri Jul 27 20:15:05 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 27 Jul 2007 20:15:05 -0500 (CDT)
Subject: [Swift-devel] [Bug 85] break statements in switch/case have no
	effect.
In-Reply-To: <bug-85-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070728011505.3B346164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85


------- Comment #1 from hategan at mcs.anl.gov  2007-07-27 20:15 -------
I'm thinking that we may want to avoid the C behavio(u)r here. In fact we
could drop the switch statement altogether. In C it fulfills the important role
of having multiple if tests compiled into (more or less) one indirect jump. In
Swift, it looks more like a liability.




From benc at hawaga.org.uk  Fri Jul 27 22:42:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 28 Jul 2007 03:42:38 +0000 (GMT)
Subject: [Swift-devel] Re: [Bug 84] switch does not work with variable
	parameter
In-Reply-To: <20070728010928.9A88816502@foxtrot.mcs.anl.gov>
References: <20070728010928.9A88816502@foxtrot.mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707280340430.26516@dildano.hawaga.org.uk>


On Fri, 27 Jul 2007, bugzilla-daemon at mcs.anl.gov wrote:

> ------- Comment #1 from hategan at mcs.anl.gov  2007-07-27 20:09 -------
> Comparison broken? Can you add a test case in SVN?

yes, it turns out - I hadn't thought about that being a cause. The code below 
fails (i.e. writes 'false' to cmp3.out). I have a bunch of tests related 
to this. Will commit them tomorrow when I'm more awake.

type messagefile {}

(messagefile t) greeting(boolean b) { 
    app {
        echo b stdout=@filename(t);
    }
}

messagefile outfile <"cmp3.out">;

int i = 2;

boolean r = i==2;

outfile = greeting(r);

-- 


From benc at hawaga.org.uk  Sat Jul 28 07:47:55 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 28 Jul 2007 12:47:55 +0000 (GMT)
Subject: [Swift-devel] loops and strings
In-Reply-To: <68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov>
References: <EA4BBB79-A387-4EA5-A057-CEEB455DBBD9@mcs.anl.gov>
	<1185551166.17961.2.camel@blabla.mcs.anl.gov>
	<BE751A1F-3C2D-44D6-A64D-2AD319A76F5B@mcs.anl.gov>
	<1185552119.18583.4.camel@blabla.mcs.anl.gov>
	<866A705A-0D06-4990-AB99-15AC783C27D6@mcs.anl.gov>
	<1185560674.19922.7.camel@blabla.mcs.anl.gov>
	<4AF4ED33-613B-4193-AD83-B2C79D286F38@mcs.anl.gov>
	<1185563469.22752.7.camel@blabla.mcs.anl.gov>
	<68DFC8CA-3B70-4D09-94DA-786DD9BB9572@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0707281245470.17188@dildano.hawaga.org.uk>


On Fri, 27 Jul 2007, Veronika Nefedova wrote:

> Or "name" considered to be a new variable every time  since I have a type
> declaration next to it?

pretty much, yes - it's declared inside the loop so every time that loop 
code is run a new variable comes into existence. if it's declared in an 
outer loop, then a new one comes into existence every time the outer loop 
runs. if it's declared at the top level of your swift code, then a new one 
comes into existence every time you run a new workflow.

-- 


From bugzilla-daemon at mcs.anl.gov  Sat Jul 28 08:02:08 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 28 Jul 2007 08:02:08 -0500 (CDT)
Subject: [Swift-devel] [Bug 85] break statements in switch/case have no
	effect.
In-Reply-To: <bug-85-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070728130208.2A286164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85


------- Comment #2 from benc at hawaga.org.uk  2007-07-28 08:02 -------
r1001 removes break from the language.




From bugzilla-daemon at mcs.anl.gov  Sat Jul 28 08:18:34 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 28 Jul 2007 08:18:34 -0500 (CDT)
Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter
In-Reply-To: <bug-84-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070728131834.BBE4E164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84


------- Comment #2 from benc at hawaga.org.uk  2007-07-28 08:18 -------
Yes, there seems to be a problem with numerical comparison.
In r1002, I added three tests to language-behaviour - 100-comparison.swift
which works, and broken/bug84*.swift which don't work.

(To run the ones in the broken subdir, you need to be in the broken
subdirectory, so type something like this:

cd broken/
../run bug84-comparisons2.swift
)




From bugzilla-daemon at mcs.anl.gov  Sat Jul 28 15:32:04 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 28 Jul 2007 15:32:04 -0500 (CDT)
Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter
In-Reply-To: <bug-84-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070728203204.7C3F0164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84


------- Comment #3 from hategan at mcs.anl.gov  2007-07-28 15:32 -------
It's comparing "2" with 2. Either we switch to the new expression stuff, where
numbers are numbers, or equals() is changed to equalsNumeric(), which does some
type conversion before doing the comparison.
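
As a rough illustration of the mismatch (in Python, not the Java the Swift
runtime is written in), the sketch below contrasts a plain equality test with
one that converts both operands first. The equals_numeric helper is only a
hypothetical stand-in for the equalsNumeric() idea above, and the float
conversion is an assumption about what "some type conversion" might mean.

```python
def equals(a, b):
    # Plain equality: a string and a number never compare equal.
    return a == b


def equals_numeric(a, b):
    # Hypothetical helper: convert both operands to numbers, then compare.
    return float(a) == float(b)


print(equals("2", 2))          # string vs number
print(equals_numeric("2", 2))  # both converted to numbers first
```

The first comparison is false and the second is true, which matches the
symptom in the bug84 tests: the selector arrives as "2" while the literal is 2.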




From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:29:20 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:29:20 -0500 (CDT)
Subject: [Swift-devel] [Bug 85] break statements in switch/case have no
	effect.
In-Reply-To: <bug-85-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730212920.60818164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=85


benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:35:38 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:35:38 -0500 (CDT)
Subject: [Swift-devel] [Bug 2] Diamond and file_counter tests are failing.
In-Reply-To: <bug-2-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730213538.C0BBB164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=2


hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from hategan at mcs.anl.gov  2007-07-30 16:35 -------
Looks like this has been solved by Ben's updates
(http://www.ci.uchicago.edu/trac/swift/changeset/952)


-- 
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.


From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:40:53 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:40:53 -0500 (CDT)
Subject: [Swift-devel] [Bug 16] failing job behaviour varies depending on
	-debug or not
In-Reply-To: <bug-16-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730214053.0C358164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=16


hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from hategan at mcs.anl.gov  2007-07-30 16:40 -------
Closing this up since it seems solved. Reopen if necessary.




From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:43:29 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:43:29 -0500 (CDT)
Subject: [Swift-devel] [Bug 19] @stdin doesn't seem to work properly
In-Reply-To: <bug-19-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730214329.6FF8E164EC@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=19


hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:51:14 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:51:14 -0500 (CDT)
Subject: [Swift-devel] [Bug 37] PATH being printed out when workflow runs
In-Reply-To: <bug-37-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730215114.83FCD16505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=37


hategan at mcs.anl.gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from hategan at mcs.anl.gov  2007-07-30 16:51 -------
Bug seems to have mysteriously disappeared. Reopen if needed.




From bugzilla-daemon at mcs.anl.gov  Mon Jul 30 16:53:52 2007
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 30 Jul 2007 16:53:52 -0500 (CDT)
Subject: [Swift-devel] [Bug 37] PATH being printed out when workflow runs
In-Reply-To: <bug-37-21@http.bugzilla.mcs.anl.gov/swift/>
Message-ID: <20070730215352.648C116505@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=37


------- Comment #6 from nefedova at mcs.anl.gov  2007-07-30 16:53 -------
nope, it's still there (r999). It prints $PATH at every job invocation.




From foster at mcs.anl.gov  Tue Jul 31 08:23:37 2007
From: foster at mcs.anl.gov (Ian Foster)
Date: Tue, 31 Jul 2007 08:23:37 -0500
Subject: [Swift-devel] Q about MolDyn
Message-ID: <46AF37D9.7000301@mcs.anl.gov>

Hi,

I am curious whether we found out why those two jobs (?) were failing at 
the end of the big MolDyn run?

Ian.


From benc at hawaga.org.uk  Tue Jul 31 14:14:04 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 31 Jul 2007 19:14:04 +0000 (GMT)
Subject: [Swift-devel] kilo-commit
Message-ID: <Pine.LNX.4.64.0707311913300.26338@dildano.hawaga.org.uk>


Commit r1024 just went into SVN...

--