[Swift-devel] Q about MolDyn
Ioan Raicu
iraicu at cs.uchicago.edu
Wed Aug 8 17:36:05 CDT 2007
On viper, in Yong's account. He ran some tests with this version just before
he left, and it worked just fine!
I saved the copy of Nika's provider that I replaced, so we can always go back
to it if we need to.
Ioan
Mihael Hategan wrote:
> Where exactly is this version?
>
> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote:
>
>> OK everyone, I found Yong's version of the provider dated July 26th,
>> much more recent than what was in SVN on June 27th. I updated Nika's
>> version of the provider (which had been checked out of SVN) with it,
>> then recompiled and deployed:
>>
>> ant distclean
>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist
>>
>> I also updated some of the logging statements to use the logger
>> (some were not using it).
>>
>> Nika, Falkon is freshly restarted and ready for another test run!
>>
>> Falkon Factory Service:
>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService
>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm
>>
>> Ioan
>>
>> Veronika Nefedova wrote:
>>
>>> Ioan,
>>>
>>>
>>> It looks like Falkon (including provider-deef) was put in SVN on
>>> June 27th. You really were supposed to use the SVN code from that
>>> point. Sigh. Did you make any changes to the viper install after June
>>> 27th?
>>>
>>>
>>> Nika
>>>
>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote:
>>>
>>>
>>>> Could it be that the fixes were done before the original SVN
>>>> checkin? If not, then at least we know why things aren't
>>>> working. I bet the latest provider source was in Nika's Swift
>>>> install on viper. Nika, I take it you don't have this anymore, as
>>>> SVN updates overwrote this. Yong, is there any other place you
>>>> might have the latest provider source? If not, I guess we need to
>>>> take another look through the provider source to fix the issues
>>>> that we knew of...
>>>>
>>>> Ioan
>>>>
>>>> Mihael Hategan wrote:
>>>>
>>>>> Well, it doesn't look like the falkon provider in SVN has been updated
>>>>> at all in terms of fixing synchronization issues. All commits on
>>>>> provider-deef come from either ben or me:
>>>>>
>>>>> bash-3.1$ svn log
>>>>> ------------------------------------------------------------------------
>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>
>>>>> removed gt4 stuff and added them as a dependency
>>>>> ------------------------------------------------------------------------
>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>
>>>>> removed gt4 stuff and added them as a dependency
>>>>> ------------------------------------------------------------------------
>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug 2007) | 1 line
>>>>>
>>>>> a very small readme for provider-deef
>>>>> ------------------------------------------------------------------------
>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun 2007) | 1 line
>>>>>
>>>>> remove dist directory form svn
>>>>> ------------------------------------------------------------------------
>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun 2007) | 20 lines
>>>>>
>>>>> provider-deef, the Falkon/cog provider
>>>>>
>>>>> based on source in below message, with .class files deleted
>>>>>
>>>>>
>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500
>>>>> From: Veronika Nefedova <nefedova at mcs.anl.gov>
>>>>> To: Yong Zhao <yongzh at cs.uchicago.edu>
>>>>> Cc: Ben Clifford <benc at hawaga.org.uk>, Mihael Hategan
>>>>> <hategan at mcs.anl.gov>,
>>>>> iraicu at cs.uchicago.edu, Ian Foster <foster at mcs.anl.gov>,
>>>>> Mike Wilde <wilde at mcs.anl.gov>,
>>>>> Tiberiu Stef-Praun <tiberius at ci.uchicago.edu>
>>>>> Subject: Re: 244 molecule MolDyn run...
>>>>>
>>>>> its on viper.uchicago.edu
>>>>> in : /home/nefedova/cogl/modules/provider-deef/
>>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz
>>>>>
>>>>> Nika
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote:
>>>>>
>>>>>
>>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my
>>>>>> answers to your questions are below...
>>>>>>
>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> It looks like viper (where Swift is running) is idle, and so is
>>>>>>> tg-viz-login2 (where Falkon is running).
>>>>>>> The normal sequence of log events for a successful task appears to
>>>>>>> be:
>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn:0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log
>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Submitted
>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn:0-1-73-2-31-0-0-1186444341989 0
>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Completed
>>>>>>>
>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>> 17566 175660 2179412
>>>>>>>
>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>> 7959 55713 785035
>>>>>>>
>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>> 190968 1909680 24003796
>>>>>>>
>>>>>>> Now, 17566 tasks were submitted, 7959 notifications were received
>>>>>>> from Falkon, and 190968 tasks were set to completed...
>>>>>>>
>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would
>>>>>>> argue that the # of notifications received is correct. The
>>>>>>> submitted # of tasks looks like the # I would have expected, but
>>>>>>> not all of the tasks made it to Falkon. The Falkon provider is
>>>>>>> what sits between the change of status to submitted and the
>>>>>>> receipt of the notification, so I would say that is the first place
>>>>>>> we need to look for more details... there used to be some extra
>>>>>>> debug info in the Falkon provider that simply printed all the tasks
>>>>>>> that were actually being submitted to Falkon (as opposed to just
>>>>>>> the change of status within Karajan). I don't see those debug
>>>>>>> statements; I bet they got overwritten in the SVN update.
>>>>>>> What about the completed tasks? Why are there so many (190K)
>>>>>>> completed tasks? Where did they come from?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> "Task" doesn't mean job. It could be just data being staged in, etc.
>>>>>> The first 2 are important -- (Submitted vs Completed). Since they
>>>>>> differ, this is the problem...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Yong, are you keeping up with these emails? Do you still have a
>>>>>>> copy of the latest Falkon provider that you edited just before you
>>>>>>> left? Can you just take a look through there to make sure nothing
>>>>>>> has been broken with the SVN updates? If you don't have time for
>>>>>>> this now (considering today was your first day on the new job),
>>>>>>> I'll dig through there and see if I can make some sense of what is
>>>>>>> happening!
>>>>>>>
>>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in
>>>>>>> Nika's account was different than what was in SVN. Ben, did you at
>>>>>>> least look at modification dates? How old was one as opposed to
>>>>>>> the other? I hope we did not revert back to an older version that
>>>>>>> might have had some bug in it....
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I had to update to the latest version of provider-deef from SVN, since
>>>>>> without the update nothing worked. The version I am at now is 1050.
>>>>>> But this is exactly the same version of swift/deef I used for our
>>>>>> Friday run (which 'worked' from the Falkon/Swift point of view).
>>>>>>
>>>>>> Nika
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Ioan
>>>>>>>
>>>>>>> Veronika Nefedova wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Well, there are some discrepancies:
>>>>>>>>
>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>> 7959 244749 3241072
>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>> 17207 564648 7949388
>>>>>>>> nefedova at viper:~/alamines>
>>>>>>>>
>>>>>>>> I.e. more than half of the jobs haven't finished (according to Swift).
>>>>>>>>
>>>>>>>> I also have some exceptions:
>>>>>>>>
>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn:0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception in getFile
>>>>>>>> (80 of those):
>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>>>>>>> 80 880 9705
>>>>>>>> nefedova at viper:~/alamines>
>>>>>>>>
>>>>>>>>
>>>>>>>> Nika
>>>>>>>>
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> Swift-devel mailing list
>>>>>> Swift-devel at ci.uchicago.edu
>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>
>>>>>>
>>>>>>
>>>
>
>
>
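
A minimal sketch of the tally discussed in the quoted messages above, written
in Python instead of grep | wc. It counts status changes per task type and
lists the type=1 tasks that were set to Submitted but never got a notification
back. The regular expressions are modelled only on the log excerpts quoted in
this thread, so the exact log format is an assumption. The function names
(tally, missing_notifications) are made up for illustration and are not part
of Swift, Karajan or Falkon. Treating type=1 as "job" is also an assumption,
based on the successful-task example above.

```python
import re
from collections import Counter

# Assumed line formats, taken from the excerpts quoted in this thread.
# "\s*" after "urn:" tolerates the mail client having wrapped the identity.
STATUS_RE = re.compile(
    r"Task\(type=(\d+), identity=(urn:\s*[\w-]+)\) setting status to (\w+)"
)
NOTIFY_RE = re.compile(r"NotificationThread notification: (urn:\s*[\w-]+)")


def _normalize(identity):
    """Strip any whitespace that crept into a task identity."""
    return re.sub(r"\s+", "", identity)


def tally(lines):
    """Count status changes per (type, status) and collect task identities.

    Returns (counts, submitted, notified), where submitted holds the
    identities of type=1 tasks set to Submitted and notified holds the
    identities seen in NotificationThread lines.
    """
    counts = Counter()
    submitted = set()
    notified = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            task_type, identity, status = match.groups()
            counts[(task_type, status)] += 1
            if task_type == "1" and status == "Submitted":
                submitted.add(_normalize(identity))
            continue
        match = NOTIFY_RE.search(line)
        if match:
            notified.add(_normalize(match.group(1)))
    return counts, submitted, notified


def missing_notifications(lines):
    """Identities of type=1 tasks that were submitted but never notified."""
    _, submitted, notified = tally(lines)
    return sorted(submitted - notified)
```

Passing an open log file (for example the MolDyn-244-loops-zhgo6be8tjhi1.log
file mentioned above) to missing_notifications() would give the list of task
identities to look for on the Falkon side. Splitting the counts by type also
keeps the "Completed" total for jobs apart from staging and other tasks.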