[Swift-devel] Q about MolDyn

Ioan Raicu iraicu at cs.uchicago.edu
Wed Aug 8 15:35:45 CDT 2007


Did you try a small workflow first, just to test?
It looks idle:

13014.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13015.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13016.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13017.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13018.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13019.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0

with 489 jobs completed... is this normal?
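
One way to check for this kind of stall mechanically is to compare consecutive status samples. The sketch below is a hypothetical helper, not part of Falkon or Swift. It assumes only that the first column is a timestamp and that an idle (or stuck) service keeps printing the same remaining columns; it makes no claim about what the individual columns mean.

```python
# Sample status lines copied from the message above.
STATUS = """\
13014.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13015.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13016.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
13017.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0
"""


def parse_status_line(line):
    """Split a status line into (timestamp, tuple of remaining fields)."""
    fields = line.split()
    return float(fields[0]), tuple(fields[1:])


def looks_idle(lines, min_samples=3):
    """Return True if the last `min_samples` samples differ only in timestamp.

    Assumption: unchanged counters over several samples mean no progress.
    This cannot tell "finished and idle" apart from "stuck".
    """
    samples = [parse_status_line(l) for l in lines if l.strip()]
    if len(samples) < min_samples:
        return False
    tail = samples[-min_samples:]
    first_fields = tail[0][1]
    return all(fields == first_fields for _, fields in tail)


idle = looks_idle(STATUS.splitlines())
```

Here `idle` is true for the samples above, since every column after the timestamp is identical from one second to the next.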

Veronika Nefedova wrote:
> anyway - I fixed the log4j.properties file and started the run
>
> Nika
>
> On Aug 8, 2007, at 2:20 PM, Ioan Raicu wrote:
>
>> All my work was related to the deef-provider... I did not touch 
>> anything else!
>>
>> in the folder
>> nefedova at viper:~/cogl/modules/provider-deef
>>
>> I did:
>>
>> cp yongs_source_files src/org/globus/cog/abstraction/impl/execution/deef/
>> svn update
>> ant distclean
>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/
>>
>> Now why would this screw up your logging or anything else in Swift?  
>> Unless it screwed something up in the deef-provider (which was 
>> already screwed up prior).  Now, the message "booting deef" comes 
>> from Boot.java.  This file was from SVN, as Mihael modified it a few 
>> days ago, so Yong's Boot.java was not carried over.  Should I have 
>> used the older Boot.java (Yong's version from July 26th)?  If this is 
>> not the issue, and it's something else related to the deef-provider, 
>> you can find the old deef-provider that you had before at:
>> viper:/home/nefedova/cogl/modules/provider-deef_8-8-07_svn
>>
>> Ioan
>> PS: I don't have rights to commit changes to SVN, so if you don't 
>> want me to make any more changes to your Swift install, we can wait 
>> until I get the right to commit my changes so you can see them and 
>> pull them in yourself through SVN.
>>
>> Veronika Nefedova wrote:
>>> the current changes screwed up my logging again...
>>> Please, do not touch my install --- I'd rather get everything from SVN,
>>>
>>> nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug MolDyn-244-loops.swift&
>>> [1] 10562
>>> nefedova at viper:~/alamines> WARN   - Failed to configure log file name
>>> DEBUG  - Booting deef
>>>
>>>
>>> Nika
>>>
>>> On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote:
>>>
>>>> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote:
>>>>> Shouldn't we be certain that things work before we commit the 
>>>>> changes?
>>>>
>>>> No.
>>>>
>>>>>   I thought the commit would take place after we try MolDyn out and we
>>>>> see things are back to normal.
>>>>
>>>> The whole problem we've seen the past few days was due to the fact that
>>>> Nika had no clear place to get the code from, so she repeatedly ended up
>>>> with broken versions. S o  p u t  t h e  c h a n g e s  i n  S V N !
>>>>
>>>>>
>>>>> Ioan
>>>>>
>>>>> Mihael Hategan wrote:
>>>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote:
>>>>>>
>>>>>>> OK everyone, I found Yong's version of the provider dated July 26th,
>>>>>>> much more recent than what was in SVN on June 27th.  I updated Nika's
>>>>>>> version of the provider (which has been checked out of SVN),
>>>>>>>
>>>>>>
>>>>>> No. P u t  t h e  c h a n g e s  i n  S V N !
>>>>>>
>>>>>>
>>>>>>> and recompiled and deployed!
>>>>>>>
>>>>>>>   ant distclean
>>>>>>>   ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist
>>>>>>>
>>>>>>> I even updated some of the logging info to use the logger
>>>>>>> (some were not using the logger).
>>>>>>>
>>>>>>> Nika, Falkon is freshly restarted and ready for another test run!
>>>>>>>
>>>>>>> Falkon Factory Service:
>>>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService 
>>>>>>>
>>>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm
>>>>>>>
>>>>>>> Ioan
>>>>>>>
>>>>>>> Veronika Nefedova wrote:
>>>>>>>
>>>>>>>> Ioan,
>>>>>>>>
>>>>>>>>
>>>>>>>> It looks like Falkon (including provider-deef) was put in SVN on
>>>>>>>> June 27th. You really were supposed to use the SVN code from that
>>>>>>>> point. Sigh. Did you make any changes to the viper install after
>>>>>>>> June 27th?
>>>>>>>>
>>>>>>>>
>>>>>>>> Nika
>>>>>>>>
>>>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Could it be that the fixes were done before the original SVN
>>>>>>>>> checkin?   If not, then at least we know why things aren't
>>>>>>>>> working.  I bet the latest provider source was in Nika's Swift
>>>>>>>>> install on viper.  Nika, I take it you don't have this 
>>>>>>>>> anymore, as
>>>>>>>>> SVN updates overwrote this.  Yong, is there any other place you
>>>>>>>>> might have the latest provider source?  If not, I guess we 
>>>>>>>>> need to
>>>>>>>>> take another look through the provider source to fix the issues
>>>>>>>>> that we knew of...
>>>>>>>>>
>>>>>>>>> Ioan
>>>>>>>>>
>>>>>>>>> Mihael Hategan wrote:
>>>>>>>>>
>>>>>>>>>> Well, it doesn't look like the falkon provider in SVN has 
>>>>>>>>>> been updated
>>>>>>>>>> at all in terms of fixing synchronization issues. All commits on
>>>>>>>>>> provider-deef come from either ben or me:
>>>>>>>>>>
>>>>>>>>>> bash-3.1$ svn log
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>>>>>>
>>>>>>>>>> removed gt4 stuff and added them as a dependency
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>>>>>>
>>>>>>>>>> removed gt4 stuff and added them as a dependency
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>>>>>>
>>>>>>>>>> a very small readme for provider-deef
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun 2007) | 1 line
>>>>>>>>>>
>>>>>>>>>> remove dist directory form svn
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun 2007) | 20 lines
>>>>>>>>>>
>>>>>>>>>> provider-deef, the Falkon/cog provider
>>>>>>>>>>
>>>>>>>>>> based on source in below message, with .class files deleted
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500
>>>>>>>>>> From: Veronika Nefedova <nefedova at mcs.anl.gov>
>>>>>>>>>> To: Yong Zhao <yongzh at cs.uchicago.edu>
>>>>>>>>>> Cc: Ben Clifford <benc at hawaga.org.uk>, Mihael Hategan
>>>>>>>>>> <hategan at mcs.anl.gov>,
>>>>>>>>>>     iraicu at cs.uchicago.edu, Ian Foster <foster at mcs.anl.gov>,
>>>>>>>>>>     Mike Wilde <wilde at mcs.anl.gov>,
>>>>>>>>>>     Tiberiu Stef-Praun <tiberius at ci.uchicago.edu>
>>>>>>>>>> Subject: Re: 244 molecule MolDyn run...
>>>>>>>>>>
>>>>>>>>>> its on viper.uchicago.edu
>>>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/
>>>>>>>>>> I also tared it up and put in my home on terminable: 
>>>>>>>>>> ~nefedova/cogl.tgz
>>>>>>>>>>
>>>>>>>>>> Nika
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Mihael, do you have any clues on why this run has failed? 
>>>>>>>>>>> Ioan - my
>>>>>>>>>>> answers to your questions are below...
>>>>>>>>>>>
>>>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> It looks like viper (where Swift is running) is idle, and 
>>>>>>>>>>>> so is tg-
>>>>>>>>>>>> viz-login2 (where Falkon is running).
>>>>>>>>>>>> What seems evident to me is the normal list of events for a
>>>>>>>>>>>> successful task:
>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn:
>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" 
>>>>>>>>>>>> MolDyn-244-loops-zhgo6be8tjhi1.log
>>>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, 
>>>>>>>>>>>> identity=urn:
>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted
>>>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread 
>>>>>>>>>>>> notification: urn:
>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0
>>>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, 
>>>>>>>>>>>> identity=urn:
>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed
>>>>>>>>>>>>
>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to
>>>>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>>>>>>  17566  175660 2179412
>>>>>>>>>>>>
>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread
>>>>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>>>>>>   7959   55713  785035
>>>>>>>>>>>>
>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to
>>>>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>>>>>> 190968 1909680 24003796
>>>>>>>>>>>>
>>>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifications were received
>>>>>>>>>>>> from Falkon, and 190968 tasks were set to completed...
>>>>>>>>>>>>
>>>>>>>>>>>> Obviously this isn't right.  Falkon only saw 7959 tasks, so 
>>>>>>>>>>>> I would
>>>>>>>>>>>> argue that the # of notifications received is correct.  The
>>>>>>>>>>>> submitted # of tasks looks like the # I would have 
>>>>>>>>>>>> expected, but
>>>>>>>>>>>> all the tasks did not make it to Falkon.  The Falkon 
>>>>>>>>>>>> provider is
>>>>>>>>>>>> what sits between the change of status to submitted, and the
>>>>>>>>>>>> receipt of the notification, so I would say that is the 
>>>>>>>>>>>> first place
>>>>>>>>>>>> we need to look for more details... there used to be some extra debug
>>>>>>>>>>>> info in the Falkon provider that simply printed all the 
>>>>>>>>>>>> tasks that
>>>>>>>>>>>> were actually being submitted to Falkon (as opposed to just 
>>>>>>>>>>>> the
>>>>>>>>>>>> change of status within Karajan).  I don't see those debug
>>>>>>>>>>>> statements, I bet they got overwritten in the SVN update.
>>>>>>>>>>>> What about the completed tasks, why are there so many (190K)
>>>>>>>>>>>> completed tasks?  Where did they come from?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> "Task" doesn't mean job. It could be just data being staged in, etc.
>>>>>>>>>>> The first 2 are important -- (Submitted vs Completed). Since they
>>>>>>>>>>> differ, this is the problem...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Yong, are you keeping up with these emails?  Do you still 
>>>>>>>>>>>> have a
>>>>>>>>>>>> copy of the latest Falkon provider that you edited just 
>>>>>>>>>>>> before you
>>>>>>>>>>>> left?  Can you just take a look through there to make sure 
>>>>>>>>>>>> nothing
>>>>>>>>>>>> has been broken with the SVN updates?  If you don't have 
>>>>>>>>>>>> time for
>>>>>>>>>>>> this now (considering today was your first day on the new 
>>>>>>>>>>>> job),
>>>>>>>>>>>> I'll dig through there and see if I can make some sense of 
>>>>>>>>>>>> what is
>>>>>>>>>>>> happening!
>>>>>>>>>>>>
>>>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider you 
>>>>>>>>>>>> saw in
>>>>>>>>>>>> Nika's account was different than what was in SVN.  Ben, 
>>>>>>>>>>>> did you at
>>>>>>>>>>>> least look at modification dates?  How old was one as 
>>>>>>>>>>>> opposed to
>>>>>>>>>>>> the other?  I hope we did not revert back to an older 
>>>>>>>>>>>> version that
>>>>>>>>>>>> might have had some bug in it....
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> I had to update to the latest version of provider-deef from 
>>>>>>>>>>> SVN since
>>>>>>>>>>> without the update nothing worked. The version I am at now 
>>>>>>>>>>> is 1050.
>>>>>>>>>>> But this is exactly the same version of swift/deef I used 
>>>>>>>>>>> for our
>>>>>>>>>>> Friday run (which 'worked' from the Falkon/Swift point of view)
>>>>>>>>>>>
>>>>>>>>>>> Nika
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Ioan
>>>>>>>>>>>>
>>>>>>>>>>>> Veronika Nefedova wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Well, there are some discrepancies:
>>>>>>>>>>>>>
>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" 
>>>>>>>>>>>>> MolDyn-244-loops-
>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc
>>>>>>>>>>>>>    7959  244749 3241072
>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" 
>>>>>>>>>>>>> MolDyn-244-loops-
>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc
>>>>>>>>>>>>>   17207  564648 7949388
>>>>>>>>>>>>> nefedova at viper:~/alamines>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I.e. almost half of the jobs haven't finished (according 
>>>>>>>>>>>>> to swift)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also have some exceptions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, 
>>>>>>>>>>>>> identity=urn:
>>>>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed 
>>>>>>>>>>>>> Exception
>>>>>>>>>>>>> in getFile
>>>>>>>>>>>>> (80 of those):
>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops-
>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc
>>>>>>>>>>>>>      80     880    9705
>>>>>>>>>>>>> nefedova at viper:~/alamines>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nika
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>
>
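
The grep counts quoted in the thread (Submitted, notification, Completed) show that the totals disagree, but not which tasks went missing. A minimal sketch of a per-task check follows. It is a hypothetical helper, not part of Swift or the Falkon provider, and it assumes that each event is on a single line in the real log (the wrapping seen in the quoted mail is taken to be a mail artifact) and that the task identity has the form `urn:` followed by digits and dashes. The fourth sample line is invented for illustration.

```python
import re

# Patterns modelled on the log lines quoted in the thread; \s* tolerates
# an optional space after "urn:", since the mail wrapping hides the exact form.
SUBMITTED = re.compile(r"identity=urn:\s*([0-9-]+)\) setting status to Submitted")
NOTIFIED = re.compile(r"NotificationThread notification: urn:\s*([0-9-]+)")


def unnotified_tasks(lines):
    """Return the sorted IDs of tasks that were Submitted but never notified."""
    submitted, notified = set(), set()
    for line in lines:
        match = SUBMITTED.search(line)
        if match:
            submitted.add(match.group(1))
            continue
        match = NOTIFIED.search(line)
        if match:
            notified.add(match.group(1))
    return sorted(submitted - notified)


LOG = [
    # First three lines: the successful task quoted in the thread, unwrapped.
    "2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, "
    "identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Submitted",
    "2007-08-06 20:58:17,685 DEBUG NotificationThread notification: "
    "urn:0-1-73-2-31-0-0-1186444341989 0",
    "2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, "
    "identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Completed",
    # Hypothetical line (not from the real log): submitted, never notified.
    "2007-08-06 19:08:26,000 DEBUG TaskImpl Task(type=1, "
    "identity=urn:0-1-74-2-31-0-0-1186444341990) setting status to Submitted",
]

missing = unnotified_tasks(LOG)
```

On the real log one would pass the open file object instead of `LOG`. The resulting list could then be compared against whatever the provider reports as actually sent to Falkon.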


