[Swift-devel] Re: email for Mike and Ian
Mike Wilde
wilde at mcs.anl.gov
Tue Jun 19 19:04:06 CDT 2007
So, at a practical level, what went wrong here and what do we do to
correct it?
The points below are perhaps a bit naive and reflect the sad fact
that I'm not currently a user. But to set guidelines for ourselves
and a growing community of users, should we:
- Run Swift from well-defined submit hosts
- Keep those hosts up to date with nightly builds
- Stay tuned to Bugzilla traffic to know when to jump to a new build
- Are the run dir and/or logs clearly tagged with the build date?
- Use only official builds if at all possible (unless you need a fix
  that's not yet been included in a build)
- What else?
Would it be useful to spell out good practices for Nika, Tibi, and the
CNARI, MolDyn, and LQCD people?
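To make the "tag the run dir with the build date" idea concrete, here is a rough sketch of what a submit-host wrapper could do before launching a run. Everything in it is an assumption for illustration: the BUILD_INFO.json file name, the 14-day staleness threshold, and the trick of using the newest file mtime under the install directory as the build date are all hypothetical, and none of it is an existing Swift feature.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Hypothetical threshold: treat builds older than this as stale.
MAX_BUILD_AGE = timedelta(days=14)


def build_time(swift_home: Path) -> datetime:
    """Approximate the build date as the newest file mtime under swift_home.

    This is an assumption; a real build might record its date elsewhere.
    """
    mtimes = [p.stat().st_mtime for p in swift_home.rglob("*") if p.is_file()]
    if not mtimes:
        raise FileNotFoundError(f"no files found under {swift_home}")
    return datetime.fromtimestamp(max(mtimes))


def tag_run_dir(run_dir: Path, swift_home: Path,
                now: Optional[datetime] = None) -> dict:
    """Write a BUILD_INFO.json (hypothetical name) into run_dir and return it."""
    now = now or datetime.now()
    built = build_time(swift_home)
    info = {
        "swift_home": str(swift_home),
        "build_time": built.isoformat(),
        "tagged_at": now.isoformat(),
        # Flag the run if the build is older than the assumed threshold.
        "stale": (now - built) > MAX_BUILD_AGE,
    }
    (run_dir / "BUILD_INFO.json").write_text(json.dumps(info, indent=2))
    return info
```

A wrapper could call tag_run_dir() with the run directory and the install directory (for example the one SWIFT_HOME points at) and print a warning when "stale" is true, so that a bug report carries the build date with it.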
Thanks,
Mike
Ben Clifford wrote, On 6/19/2007 6:51 PM:
> This looks like bug 49:
>
> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=49
>
> I just spent the evening tracking it down with Nika.
>
> As far as I can tell, that means she's been using a swift compiler that
> has been at least 2 months old right up until she just updated it this
> evening.
>
> *please* try to report problems against something resembling a recent
> checkout-and-build.
>
> Furthermore, when I finally tracked it down, it turned out to be because of
> a bug in the SwiftScript source. I fixed *exactly* this problem here:
>
> Date: Sat, 28 Apr 2007 08:39:03 +0000 (GMT)
> From: Ben Clifford <benc at hawaga.org.uk>
> To: Veronika V. Nefedova <nefedova at mcs.anl.gov>
> Cc: swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] nightly built 070426
>
> *please* try to actually use bugfixes that people give you.
>
> Bad users!
>
> Go to your room!
>
>
> On Tue, 19 Jun 2007, Yong Zhao wrote:
>
>> I tried the restart feature yesterday and it seemed to work fine with the
>> MolDyn workflow. I am not sure what was the problem that you encountered.
>>
>> About the compile problem, maybe Ben can take a look since he made a few
>> changes to the translation.
>>
>> yong.
>>
>> On Tue, 19 Jun 2007, Veronika Nefedova wrote:
>>
>>> Yong,
>>>
>>> Ben asks me to test the restart feature that was failing before... I
>>> am wondering if it's OK to do svn up and then rebuild vdsk? I do not
>>> want to break things... If it's OK - should I do it in ~nefedova/vdsk
>>> (I assume)?
>>>
>>> Nika
>>>
>>> On Jun 19, 2007, at 4:31 PM, Yong Zhao wrote:
>>>
>>>> did you make sure that your path is set correctly? do a
>>>>
>>>> which swift
>>>>
>>>> On Tue, 19 Jun 2007, Veronika Nefedova wrote:
>>>>
>>>>> Yong,
>>>>>
>>>>> Any idea what could've caused it to fail:
>>>>>
>>>>> nefedova@viper:~/alamines> cat MolDyn-244-ctsmk1lnf2qa1.log
>>>>> 2007-06-19 16:11:19,256 INFO Loader MolDyn-244.dtm: source file is new. Recompiling.
>>>>> 2007-06-19 16:12:08,346 DEBUG Loader Detailed exception:
>>>>> java.lang.RuntimeException: Failed to convert .xml to .kml for MolDyn-244.dtm
>>>>>     at org.griphyn.vdl.karajan.Loader.compile(Loader.java:209)
>>>>>     at org.griphyn.vdl.karajan.Loader.main(Loader.java:108)
>>>>> Caused by: java.util.NoSuchElementException: no such attribute: nil in template context [call_arg]
>>>>>     at org.antlr.stringtemplate.StringTemplate.rawSetAttribute(StringTemplate.java:643)
>>>>>     at org.antlr.stringtemplate.StringTemplate.setAttribute(StringTemplate.java:539)
>>>>>     at org.griphyn.vdl.engine.Karajan.setExprOrValue(Karajan.java:663)
>>>>>     at org.griphyn.vdl.engine.Karajan.setExprOrValue(Karajan.java:638)
>>>>>     at org.griphyn.vdl.engine.Karajan.actualParameter(Karajan.java:458)
>>>>>     at org.griphyn.vdl.engine.Karajan.call(Karajan.java:351)
>>>>>     at org.griphyn.vdl.engine.Karajan.statements(Karajan.java:304)
>>>>>     at org.griphyn.vdl.engine.Karajan.program(Karajan.java:117)
>>>>>     at org.griphyn.vdl.engine.Karajan.main(Karajan.java:71)
>>>>>     at org.griphyn.vdl.karajan.Loader.compile(Loader.java:199)
>>>>>     ... 1 more
>>>>> nefedova@viper:~/alamines>
>>>>>
>>>>>
>>>>>
>>>>> The dtm file is generated by a script, the same script that generated
>>>>> the files for 1, 20, and 100 molecules. Not sure why 244 is different.
>>>>> Everything is in my alamines dir on viper in home dir...
>>>>>
>>>>> Nika
>>>>>
>>>>> On Jun 19, 2007, at 4:00 PM, Yong Zhao wrote:
>>>>>
>>>>>> Everything is configured in Nika's directory:
>>>>>> ~nefedova/vdsk
>>>>>>
>>>>>> Just point VDS_HOME or SWIFT_HOME to /home/nefedova/vdsk, and the
>>>>>> rest
>>>>>> should be correctly configured in the etc directory.
>>>>>>
>>>>>> Yong.
>>>>>>
>>>>>> On Tue, 19 Jun 2007, Ioan Raicu wrote:
>>>>>>
>>>>>>> Yong, you are the one who ran the Swift workflow... can you make sure
>>>>>>> Nika has everything updated, or can you invoke the command from your
>>>>>>> environment?
>>>>>>>
>>>>>>> I have restarted Falkon and set it to 18 hours for 100 nodes (200
>>>>>>> workers).... it's all up and running... there is a 2 hour idle time, so
>>>>>>> make sure to start the workflow in the next 2 hours so we don't lose
>>>>>>> the allocation.
>>>>>>>
>>>>>>> Falkon is in the same place as last night, tg-viz-login1 on 50001!
>>>>>>>
>>>>>>> Ioan
>>>>>>>
>>>>>>> Veronika Nefedova wrote:
>>>>>>>> Ok, I have the file ready. What workdir should I specify for TG
>>>>>>>> UC ?
>>>>>>>>
>>>>>>>> Nika
>>>>>>>>
>>>>>>>> On Jun 19, 2007, at 2:31 PM, Ioan Raicu wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>> I need to go eat some lunch.... I'll be back in 30 min... but then
>>>>>>>>> I'll only be online until 4PM... so can you please look over that
>>>>>>>>> email, and send it back to me soon? Also, let's decide what to do
>>>>>>>>> about the next run, is 244 short mol run OK? Nika, can you prep the
>>>>>>>>> input data for this? ANL seems almost idle, only 4 nodes are in use,
>>>>>>>>> so we could easily get another 200 processors like last night :)
>>>>>>>>>
>>>>>>>>> Ioan
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================================
>>>>>>>>> Ioan Raicu
>>>>>>>>> Ph.D. Student
>>>>>>>>> ============================================
>>>>>>>>> Distributed Systems Laboratory
>>>>>>>>> Computer Science Department
>>>>>>>>> University of Chicago
>>>>>>>>> 1100 E. 58th Street, Ryerson Hall
>>>>>>>>> Chicago, IL 60637
>>>>>>>>> ============================================
>>>>>>>>> Email: iraicu at cs.uchicago.edu
>>>>>>>>> Web: http://www.cs.uchicago.edu/~iraicu
>>>>>>>>> http://dsl.cs.uchicago.edu/
>>>>>>>>> ============================================
>>>>>>>>> ============================================
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
--
Mike Wilde
Computation Institute, University of Chicago
Math & Computer Science Division
Argonne National Laboratory
Argonne, IL 60439 USA
tel 630-252-7497 fax 630-252-1997