[Swift-devel] Re: test v0.1rc1

Veronika V. Nefedova nefedova at mcs.anl.gov
Tue Feb 27 15:40:50 CST 2007


Yes, thank you to everybody who helped me!

The log shows these time stamps:
Start: 15:02:49 (i.e. first entry in the log)
Finish: 15:15:22 (i.e. the last entry in the log)

This is the total time (including all the staging in/out, waiting in the
queue, etc.). I have yet to find out what the compute times were before Swift.
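
For reference, the elapsed wall-clock time between those two stamps can be
worked out with a small sketch like the one below. It assumes Java 8 or later
(java.time); the class and method names are made up for illustration and are
not part of Swift.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper, not part of Swift: computes the wall-clock time
// between two HH:mm:ss stamps taken on the same day.
public class ElapsedTime {

    // Parses both stamps as ISO local times and returns their difference.
    static Duration elapsed(String start, String finish) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(finish));
    }

    // Renders a duration as "<minutes> min <seconds> s".
    static String format(Duration d) {
        return d.toMinutes() + " min " + (d.getSeconds() % 60) + " s";
    }

    public static void main(String[] args) {
        // First and last log entries quoted in this message.
        System.out.println(format(elapsed("15:02:49", "15:15:22")));
    }
}
```

For the stamps above this prints "12 min 33 s".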

Nika

At 03:33 PM 2/27/2007, Mike Wilde wrote:
>Awesome!  Nice work Nika and everyone!!!
>
>Just how "swift" was it?
>
>:) Mike
>
>^ thanks, I needed that!
>
>Veronika V. Nefedova wrote, On 2/27/2007 3:27 PM:
>>Workflow with 16 molecules finished on TG just swiftly (-;
>>At 03:03 PM 2/27/2007, Veronika  V. Nefedova wrote:
>>>I have just 16 molecules (at present) for testing - I am running all of
>>>them now. Will let you know how it goes. I've already asked Yuqing to
>>>send all 350 molecules my way (;
>>>
>>>Nika
>>>
>>>At 02:54 PM 2/27/2007, Mihael Hategan wrote:
>>>>On Tue, 2007-02-27 at 14:54 -0600, Veronika V. Nefedova wrote:
>>>> > It worked! Thank you very much for tracking it down, Mihael.
>>>> >
>>>> > Anyway -- a complete MolDyn workflow just finished its first run on
>>>> > Teragrid (for 3 molecules only).
>>>>
>>>>Groovy. Can it be pumped up?
>>>>
>>>> >
>>>> > (-;
>>>> >
>>>> > Nika
>>>> >
>>>> > At 02:19 PM 2/27/2007, Veronika  V. Nefedova wrote:
>>>> >
>>>> > >Thanks, Mihael!
>>>> > >I'll try it now
>>>> > >
>>>> > >Nika
>>>> > >
>>>> > >At 02:04 PM 2/27/2007, Mihael Hategan wrote:
>>>> > >>Try it now. The latest nightly build should contain the fix.
>>>> > >>
>>>> > >>The problem was an inner class having synchronized methods and me
>>>> > >>idiotically assuming that they will use the outer class' instance
>>>> > >>monitor.
>>>> > >>
>>>> > >>Mihael
>>>> > >>
>>>> > >>On Mon, 2007-02-26 at 22:17 -0600, Veronika V. Nefedova wrote:
>>>> > >> > If you give me your vds_home location, I can try to run the workflow and
>>>> > >> > see if it's working...
>>>> > >> >
>>>> > >> > Nika
>>>> > >> > At 09:57 PM 2/26/2007, Mihael Hategan wrote:
>>>> > >> > >Hmm. I made a change to the code that did not seem to be the cause, but
>>>> > >> > >some other, smaller issue and enabled some more debugging in log4j. With
>>>> > >> > >this, I've been running the workflow in a loop on wiggum for two hours
>>>> > >> > >now, and got nothing yet. I don't know what to make of it.
>>>> > >> > >
>>>> > >> > >I'll keep running and eventually revert the changes to see if they are
>>>> > >> > >the source.
>>>> > >> > >
>>>> > >> > >Mihael
>>>> > >> > >
>>>> > >> > >On Mon, 2007-02-26 at 14:47 -0600, Mihael Hategan wrote:
>>>> > >> > > > On Mon, 2007-02-26 at 14:46 -0600, Veronika V. Nefedova wrote:
>>>> > >> > > > > Some additional info: this failure happened on TG with 070219 when I was
>>>> > >> > > > > running 2 molecules at the same time (i.e. two executables at the same
>>>> > >> > > > > time). When I tried to run just one, it failed with the same exit code, but
>>>> > >> > > > > didn't have that handle exception:
>>>> > >> > > >
>>>> > >> > > > Right. This seems like a different problem, and I'm not sure if it's
>>>> > >> > > > Swift or some problem with TP or the application. That needs to be
>>>> > >> > > > investigated.
>>>> > >> > > >
>>>> > >> > > > >
>>>> > >> > > > > 2007-02-26 14:34:41,986 DEBUG vdl:execute2 Application exception: Job chrm
>>>> > >> > > > > failed with an exit code of 174
>>>> > >> > > > >          sys:throw @ vdl-int.k, line: 108
>>>> > >> > > > >          vdl:checkexitcode @ vdl-int.k, line: 367
>>>> > >> > > > >          vdl:execute2 @ execute-default.k, line: 22
>>>> > >> > > > >          vdl:execute @ swift-MolDyn.kml, line: 69
>>>> > >> > > > >          charmm @ swift-MolDyn.kml, line: 279
>>>> > >> > > > >          vdl:mains @ swift-MolDyn.kml, line: 261
>>>> > >> > > > > <here it re-tries it>
>>>> > >> > > > >
>>>> > >> > > > > Again, the failure with 070219 happens only on TG; on localhost (wiggum)
>>>> > >> > > > > it's working just fine.
>>>> > >> > > > >
>>>> > >> > > > > Nika
>>>> > >> > > > >
>>>> > >> > > > >
>>>> > >> > > > > At 02:38 PM 2/26/2007, Mihael Hategan wrote:
>>>> > >> > > > > >That's fine. Just wanted to be clear that we're talking about the same
>>>> > >> > > > > >error. It's good that it also occurs in 070219, because there are no
>>>> > >> > > > > >recent changes I could remember that could trigger it. It's also good to
>>>> > >> > > > > >know that it may or may not occur, because I know approximately what
>>>> > >> > > > > >class of problem we're dealing with.
>>>> > >> > > > > >
>>>> > >> > > > > >Mihael
>>>> > >> > > > > >
>>>> > >> > > > > >On Mon, 2007-02-26 at 14:37 -0600, Veronika V. Nefedova wrote:
>>>> > >> > > > > > > Yes, I didn't paste it -- it's all in the log. If you'd like, I can send you
>>>> > >> > > > > > > the log as an attachment...
>>>> > >> > > > > > >
>>>> > >> > > > > > > Nika
>>>> > >> > > > > > >
>>>> > >> > > > > > > At 02:33 PM 2/26/2007, Mihael Hategan wrote:
>>>> > >> > > > > > > >Wait, because I'm missing something. Wasn't the error supposed to be
>>>> > >> > > > > > > >"TaskHandler can only handle unsubmitted tasks"?
>>>> > >> > > > > > > >
>>>> > >> > > > > > > >On Mon, 2007-02-26 at 14:26 -0600, Veronika V. Nefedova wrote:
>>>> > >> > > > > > > > > And now it's getting interesting!
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > > I now have the same failure (as below) with 070219 as I had on localhost
>>>> > >> > > > > > > > > with v0.1rc1, *BUT* when running on TG. It failed at the same point (while
>>>> > >> > > > > > > > > trying to run the last app in the workflow), with the same exceptions.
>>>> > >> > > > > > > > > Strange that 070219 worked on localhost (and still works).
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > > The log is on wiggum:
>>>> > >> > > > > > > > > /sandbox/ydeng/alamines/swift-MolDyn-690y7r1skc8z0.log
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > > 2007-02-26 14:10:16,543 INFO  vdl:execute2 Running job chrm-rmnoet7i chrm
>>>> > >> > > > > > > > > with arguments [system:solv_m001, title:solv, stitle:m001,
>>>> > >> > > > > > > > > rtffile:parm03_gaff_all.rtf, paramfile:parm03_gaffnb_all.prm,
>>>> > >> > > > > > > > > gaff:m001_am1, nwater:400, ligcrd:lyz, rforce:0, iseed:3131887, rwater:15,
>>>> > >> > > > > > > > > nstep:100, minstep:100, skipstep:100, startstep:10000] in
>>>> > >> > > > > > > > > swift-MolDyn-690y7r1skc8z0/chrm-rmnoet7i on TG-NCSA
>>>> > >> > > > > > > > > 2007-02-26 14:11:18,586 DEBUG vdl:execute2 Application exception: Job chrm
>>>> > >> > > > > > > > > failed with an exit code of 174
>>>> > >> > > > > > > > > <snip>
>>>> > >> > > > > > > > > All input files are staged in...
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > > Nika
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > > At 02:17 PM 2/26/2007, Veronika  V. Nefedova wrote:
>>>> > >> > > > > > > > > >You can try to run my application, or look in the logs. I ran it all on
>>>> > >> > > > > > > > > >wiggum. The log is:
>>>> > >> > > > > > > > > >/sandbox/ydeng/alamines/swift-MolDyn-8q6ygr7cy15c2.log
>>>> > >> > > > > > > > > >
>>>> > >> > > > > > > > > >the dtm file I am running is /sandbox/ydeng/alamines/swift-MolDyn.dtm
>>>> > >> > > > > > > > > >
>>>> > >> > > > > > > > > >Nika
>>>> > >> > > > > > > > > >
>>>> > >> > > > > > > > > >At 01:39 PM 2/26/2007, Mihael Hategan wrote:
>>>> > >> > > > > > > > > >>That doesn't sound good. How do I reproduce this?
>>>> > >> > > > > > > > > >>
>>>> > >> > > > > > > > > >>Mihael
>>>> > >> > > > > > > > > >>
>>>> > >> > > > > > > > > >>On Mon, 2007-02-26 at 13:21 -0600, Veronika V. Nefedova wrote:
>>>> > >> > > > > > > > > >> > The one Ben asked us all to test:
>>>> > >> > > > > > > > > >> >
>>>> > >> > > > > > > > > >> > >http://www.ci.uchicago.edu/swift/tests/vdsk-0.1rc1.tar.gz
>>>> > >> > > > > > > > > >> >
>>>> > >> > > > > > > > > >> > At 01:15 PM 2/26/2007, Mihael Hategan wrote:
>>>> > >> > > > > > > > > >> > >On Mon, 2007-02-26 at 13:05 -0600, Veronika V. Nefedova wrote:
>>>> > >> > > > > > > > > >> > > > When I tried to run my working workflow with a new version, it gave me an
>>>> > >> > > > > > > > > >> > > > exception:
>>>> > >> > > > > > > > > >> > >
>>>> > >> > > > > > > > > >> > >Which new version?
>>>> > >> > > > > > > > > >> > >
>>>> > >> > > > > > > > > >> > >Mihael
>>>> > >> > > > > > > > > >> > >
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > Warning: Task handler throws exception but does not set status
>>>> > >> > > > > > > > > >> > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>>> > >> > > > > > > > > >> > > > TaskHandler can only handle unsubmitted tasks
>>>> > >> > > > > > > > > >> > > >          at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:20)
>>>> > >> > > > > > > > > >> > > >          at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:78)
>>>> > >> > > > > > > > > >> > > >          at java.lang.Thread.run(Thread.java:534)
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > [349] wiggum /sandbox/ydeng/alamines > \\
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > I do not have this happening with the 070219 build.
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > Nika
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > At 06:12 AM 2/26/2007, Ben Clifford wrote:
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > >On Mon, 26 Feb 2007, Ben Clifford wrote:
>>>> > >> > > > > > > > > >> > > > >
>>>> > >> > > > > > > > > >> > > > > >
>>>> > >> > > > > > > > > >> > > > > > v0.1rc1 was built at the end of last week. please spend some time
>>>> > >> > > > > > > > > >> > > > > > testing
>>>> > >> > > > > > > > > >> > > > >
>>>> > >> > > > > > > > > >> > > > >here's the URL for download:
>>>> > >> > > > > > > > > >> > > > >
>>>> > >> > > > > > > > > >> > > > >http://www.ci.uchicago.edu/swift/tests/vdsk-0.1rc1.tar.gz
>>>> > >> > > > > > > > > >> > > > >
>>>> > >> > > > > > > > > >> > > > >--
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> > > > _______________________________________________
>>>> > >> > > > > > > > > >> > > > Swift-devel mailing list
>>>> > >> > > > > > > > > >> > > > Swift-devel at ci.uchicago.edu
>>>> > >> > > > > > > > > >> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>> > >> > > > > > > > > >> > > >
>>>> > >> > > > > > > > > >> >
>>>> > >> > > > > > > > > >> >
>>>> > >> > > > > > > > > >
>>>> > >> > > > > > > > > >
>>>> > >> > > > > > > > >
>>>> > >> > > > > > > > >
>>>> > >> > > > > > >
>>>> > >> > > > > > >
>>>> > >> > > > >
>>>> > >> > > > >
>>>> > >> > > >
>>>> > >> > > >
>>>> > >> >
>>>> > >> >
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>
>>>
>>
>
>--
>Mike Wilde
>Computation Institute, University of Chicago
>Math & Computer Science Division
>Argonne National Laboratory
>Argonne, IL   60439    USA
>tel 630-252-7497 fax 630-252-1997
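
The locking problem Mihael describes in the thread above (an inner class
with synchronized methods that were assumed to use the outer class' instance
monitor) can be illustrated with a minimal Java sketch. All class and method
names here are hypothetical and are not taken from the Swift or CoG code.
The explicit synchronized (Outer.this) block is one common remedy, not
necessarily the fix that went into the nightly build.

```java
// Hypothetical illustration only; none of these names come from Swift or CoG.
public class MonitorDemo {

    static class Outer {

        class Inner {
            // A synchronized instance method locks the monitor of the object
            // it is called on, i.e. this Inner instance, NOT Outer.this.
            synchronized boolean[] viaMethodModifier() {
                return new boolean[] {
                    Thread.holdsLock(this),        // Inner's monitor
                    Thread.holdsLock(Outer.this)   // enclosing Outer's monitor
                };
            }

            // Locking the enclosing instance has to be requested explicitly.
            boolean[] viaOuterBlock() {
                synchronized (Outer.this) {
                    return new boolean[] {
                        Thread.holdsLock(this),
                        Thread.holdsLock(Outer.this)
                    };
                }
            }
        }

        Inner newInner() {
            return new Inner();
        }
    }

    public static void main(String[] args) {
        Outer.Inner inner = new Outer().newInner();

        boolean[] a = inner.viaMethodModifier();
        System.out.println("synchronized method: inner=" + a[0] + " outer=" + a[1]);

        boolean[] b = inner.viaOuterBlock();
        System.out.println("synchronized (Outer.this): inner=" + b[0] + " outer=" + b[1]);
    }
}
```

The first line reports inner=true outer=false and the second inner=false
outer=true. Two threads calling synchronized methods on different Inner
instances of the same Outer therefore do not exclude each other, which is the
kind of race that shows up only intermittently.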




