[Swift-devel] Re: test v0.1rc1

Veronika V. Nefedova nefedova at mcs.anl.gov
Tue Feb 27 14:54:20 CST 2007


It worked! Thank you very much for tracking it down, Mihael.

Anyway -- a complete MolDyn workflow just finished its first run on 
Teragrid (for 3 molecules only).

(-;

Nika

At 02:19 PM 2/27/2007, Veronika V. Nefedova wrote:

>Thanks, Mihael!
>I'll try it now
>
>Nika
>
>At 02:04 PM 2/27/2007, Mihael Hategan wrote:
>>Try it now. The latest nightly build should contain the fix.
>>
>>The problem was an inner class having synchronized methods and me
>>idiotically assuming that they will use the outer class' instance
>>monitor.
>>
>>Mihael
>>
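[A minimal sketch of the pitfall described above, not the actual Swift/CoG code. The names MonitorDemo and Inner are hypothetical. It relies only on standard Java behavior: a synchronized instance method of an inner class locks the inner instance itself, so it does not hold the outer instance's monitor unless it synchronizes on the outer instance explicitly.]

```java
class MonitorDemo {

    // Hypothetical inner class used only for illustration.
    class Inner {
        // "synchronized" here locks this Inner instance, NOT MonitorDemo.this.
        synchronized boolean holdsOwnLock() {
            return Thread.holdsLock(this);
        }

        // Same method modifier, but now we check the outer instance's monitor.
        synchronized boolean holdsOuterLock() {
            return Thread.holdsLock(MonitorDemo.this);
        }

        // To share the outer monitor, lock the outer instance explicitly.
        boolean holdsOuterLockExplicit() {
            synchronized (MonitorDemo.this) {
                return Thread.holdsLock(MonitorDemo.this);
            }
        }
    }

    public static void main(String[] args) {
        MonitorDemo outer = new MonitorDemo();
        MonitorDemo.Inner inner = outer.new Inner();
        System.out.println(inner.holdsOwnLock());           // true
        System.out.println(inner.holdsOuterLock());         // false
        System.out.println(inner.holdsOuterLockExplicit()); // true
    }
}
```

[If the outer class guards shared state with its own synchronized methods, the inner class's synchronized methods give no mutual exclusion against them, which is the kind of intermittent failure discussed in this thread.]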
>>On Mon, 2007-02-26 at 22:17 -0600, Veronika V. Nefedova wrote:
>> > If you give me your vds_home location, I can try to run the workflow and
>> > see if it's working...
>> >
>> > Nika
>> > At 09:57 PM 2/26/2007, Mihael Hategan wrote:
>> > >Hmm. I changed some code that did not seem to be the cause (it addressed
>> > >some other, smaller issue) and enabled some more debugging in log4j. With
>> > >this, I've been running the workflow in a loop on wiggum for two hours
>> > >now, and have not seen the failure yet. I don't know what to make of it.
>> > >
>> > >I'll keep running and eventually revert the changes to see if they are
>> > >the source.
>> > >
>> > >Mihael
>> > >
>> > >On Mon, 2007-02-26 at 14:47 -0600, Mihael Hategan wrote:
>> > > > On Mon, 2007-02-26 at 14:46 -0600, Veronika V. Nefedova wrote:
>> > > > > Additional info: This failure happened on TG with 070219 when I was
>> > > > > running 2 molecules at the same time (i.e. two executables at the same
>> > > > > time). When I tried to run just one, it failed with the same exit code, but
>> > > > > didn't have that handle exception:
>> > > >
>> > > > Right. This seems like a different problem, and I'm not sure if it's
>> > > > Swift or some problem with TP or the application. That needs to be
>> > > > investigated.
>> > > >
>> > > > >
>> > > > > 2007-02-26 14:34:41,986 DEBUG vdl:execute2 Application exception: Job chrm
>> > > > > failed with an exit code of 174
>> > > > >          sys:throw @ vdl-int.k, line: 108
>> > > > >          vdl:checkexitcode @ vdl-int.k, line: 367
>> > > > >          vdl:execute2 @ execute-default.k, line: 22
>> > > > >          vdl:execute @ swift-MolDyn.kml, line: 69
>> > > > >          charmm @ swift-MolDyn.kml, line: 279
>> > > > >          vdl:mains @ swift-MolDyn.kml, line: 261
>> > > > > <here it re-tries it>
>> > > > >
>> > > > > Again, the failure with 070219 happens only on TG; on localhost (wiggum)
>> > > > > it's working just fine.
>> > > > >
>> > > > > Nika
>> > > > >
>> > > > >
>> > > > > At 02:38 PM 2/26/2007, Mihael Hategan wrote:
>> > > > > >That's fine. Just wanted to be clear that we're talking about the same
>> > > > > >error. It's good that it also occurs in 070219, because there are no
>> > > > > >recent changes I could remember that could trigger it. It's also good to
>> > > > > >know that it may or may not occur, because I know approximately what
>> > > > > >class of problem we're dealing with.
>> > > > > >
>> > > > > >Mihael
>> > > > > >
>> > > > > >On Mon, 2007-02-26 at 14:37 -0600, Veronika V. Nefedova wrote:
>> > > > > > > Yes, I didn't paste it -- it's all in the log. If you'd like, I can send you
>> > > > > > > the log as an attachment...
>> > > > > > >
>> > > > > > > Nika
>> > > > > > >
>> > > > > > > At 02:33 PM 2/26/2007, Mihael Hategan wrote:
>> > > > > > > >Wait, because I'm missing something. Wasn't the error supposed to be
>> > > > > > > >"TaskHandler can only handle unsubmitted tasks"?
>> > > > > > > >
>> > > > > > > >On Mon, 2007-02-26 at 14:26 -0600, Veronika V. Nefedova wrote:
>> > > > > > > > > And now it's getting interesting!
>> > > > > > > > >
>> > > > > > > > > I now have the same failure (as below) with 070219 as I had on localhost
>> > > > > > > > > with v0.1rc1 *BUT* when running on TG. It failed at the same point (while
>> > > > > > > > > trying to run the last app in the workflow), with the same exceptions.
>> > > > > > > > > Strange that 070219 worked on localhost (and is still working).
>> > > > > > > > >
>> > > > > > > > > The log is on wiggum:
>> > > > > > > > > /sandbox/ydeng/alamines/swift-MolDyn-690y7r1skc8z0.log
>> > > > > > > > >
>> > > > > > > > > 2007-02-26 14:10:16,543 INFO  vdl:execute2 Running job chrm-rmnoet7i chrm
>> > > > > > > > > with arguments [system:solv_m001, title:solv, stitle:m001,
>> > > > > > > > > rtffile:parm03_gaff_all.rtf, paramfile:parm03_gaffnb_all.prm,
>> > > > > > > > > gaff:m001_am1, nwater:400, ligcrd:lyz, rforce:0, iseed:3131887, rwater:15,
>> > > > > > > > > nstep:100, minstep:100, skipstep:100, startstep:10000] in
>> > > > > > > > > swift-MolDyn-690y7r1skc8z0/chrm-rmnoet7i on TG-NCSA
>> > > > > > > > > 2007-02-26 14:11:18,586 DEBUG vdl:execute2 Application exception: Job chrm
>> > > > > > > > > failed with an exit code of 174
>> > > > > > > > > <snip>
>> > > > > > > > > All input files are staged in...
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Nika
>> > > > > > > > >
>> > > > > > > > > At 02:17 PM 2/26/2007, Veronika V. Nefedova wrote:
>> > > > > > > > > >You can try to run my application, or look in the logs. I ran it all on
>> > > > > > > > > >wiggum. The log is:
>> > > > > > > > > >/sandbox/ydeng/alamines/swift-MolDyn-8q6ygr7cy15c2.log
>> > > > > > > > > >
>> > > > > > > > > >the dtm file I am running is /sandbox/ydeng/alamines/swift-MolDyn.dtm
>> > > > > > > > > >
>> > > > > > > > > >Nika
>> > > > > > > > > >
>> > > > > > > > > >At 01:39 PM 2/26/2007, Mihael Hategan wrote:
>> > > > > > > > > >>That doesn't sound good. How do I reproduce this?
>> > > > > > > > > >>
>> > > > > > > > > >>Mihael
>> > > > > > > > > >>
>> > > > > > > > > >>On Mon, 2007-02-26 at 13:21 -0600, Veronika V. Nefedova wrote:
>> > > > > > > > > >> > The one Ben asked us all to test:
>> > > > > > > > > >> >
>> > > > > > > > > >> > >http://www.ci.uchicago.edu/swift/tests/vdsk-0.1rc1.tar.gz
>> > > > > > > > > >> >
>> > > > > > > > > >> > At 01:15 PM 2/26/2007, Mihael Hategan wrote:
>> > > > > > > > > >> > >On Mon, 2007-02-26 at 13:05 -0600, Veronika V. Nefedova wrote:
>> > > > > > > > > >> > > > When I tried to run my working workflow with a new version, it
>> > > > > > > > > >> > > > gave me an exception:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >Which new version?
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >Mihael
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Warning: Task handler throws exception but does not set status
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>> > > > > > > > > >> > > > TaskHandler can only handle unsubmitted tasks
>> > > > > > > > > >> > > >          at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:20)
>> > > > > > > > > >> > > >          at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:78)
>> > > > > > > > > >> > > >          at java.lang.Thread.run(Thread.java:534)
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > [349] wiggum /sandbox/ydeng/alamines > \\
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > I do not have this happening with the 070219 build.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Nika
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > At 06:12 AM 2/26/2007, Ben Clifford wrote:
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > >On Mon, 26 Feb 2007, Ben Clifford wrote:
>> > > > > > > > > >> > > > >
>> > > > > > > > > >> > > > > >
>> > > > > > > > > >> > > > > > v0.1rc1 was built at the end of last week. please spend some time
>> > > > > > > > > >> > > > > > testing
>> > > > > > > > > >> > > > >
>> > > > > > > > > >> > > > >here's the URL for download:
>> > > > > > > > > >> > > > >
>> > > > > > > > > >> > > > >http://www.ci.uchicago.edu/swift/tests/vdsk-0.1rc1.tar.gz
>> > > > > > > > > >> > > > >
>> > > > > > > > > >> > > > >--
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > _______________________________________________
>> > > > > > > > > >> > > > Swift-devel mailing list
>> > > > > > > > > >> > > > Swift-devel at ci.uchicago.edu
>> > > > > > > > > >> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>> > > > > > > > > >> > > >




