[Swift-devel] Q about MolDyn

Veronika Nefedova nikan at wideopenwest.com
Tue Aug 7 10:01:02 CDT 2007


Mihael, do you have any clues on why this run has failed? Ioan - my  
answers to your questions are below...

On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote:

> It looks like viper (where Swift is running) is idle, and so is
> tg-viz-login2 (where Falkon is running).
> The normal sequence of log events for a successful task looks
> like this:
> iraicu at viper:/home/nefedova/alamines> grep "urn:0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log
> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Submitted
> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn:0-1-73-2-31-0-0-1186444341989 0
> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Completed
>
> iraicu at viper:/home/nefedova/alamines> grep "setting status to Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>  17566  175660 2179412
>
> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>   7959   55713  785035
>
> iraicu at viper:/home/nefedova/alamines> grep "setting status to Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
> 190968 1909680 24003796
>
> So 17566 tasks were submitted, 7959 notifications were received
> from Falkon, and 190968 tasks were set to Completed...
>
> Obviously this isn't right.  Falkon only saw 7959 tasks, so I would
> argue that the number of notifications received is correct.  The
> number of submitted tasks looks like what I would have expected, but
> not all of the tasks made it to Falkon.  The Falkon provider is
> what sits between the change of status to Submitted and the
> receipt of the notification, so I would say that is the first place
> we need to look for more details... There used to be some extra debug
> info in the Falkon provider that simply printed all the tasks that
> were actually being submitted to Falkon (as opposed to just the
> change of status within Karajan).  I don't see those debug
> statements; I bet they got overwritten in the SVN update.
> What about the completed tasks: why are there so many (190K)?
> Where did they come from?
>


"Task" doesn't mean job. It could be just data being staged in , etc.  
The first 2 are important (Submitted vs Completed). Since they
differ, this is the problem...
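
The comparison done with grep above can also be made per task, which shows which submitted tasks never produced a Falkon notification and how the status counts split by task type. What follows is only a rough sketch, not anything that exists in Swift or the Falkon provider: the function names are made up, the regular expressions assume log lines shaped exactly like the excerpts quoted in this thread, and treating type=1 as the job-submission tasks is an assumption based on those sample lines.

```python
import re
from collections import Counter

# Assumed line shapes, copied from the excerpts quoted in this thread.
STATUS_RE = re.compile(
    r"Task\(type=(\d+), identity=(urn:\S+?)\) setting status to (\w+)"
)
NOTIFY_RE = re.compile(r"NotificationThread notification: (urn:\S+)")


def status_counts(lines):
    """Count 'setting status to X' lines, keyed by (task type, status)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            kind, _ident, state = match.groups()
            counts[(int(kind), state)] += 1
    return counts


def unnotified_tasks(lines, task_type=1):
    """Return IDs of `task_type` tasks that were Submitted but never notified.

    task_type=1 is assumed (not confirmed) to mean job submission.
    """
    submitted = set()
    notified = set()
    for line in lines:
        status = STATUS_RE.search(line)
        if status:
            kind, ident, state = status.groups()
            if int(kind) == task_type and state == "Submitted":
                submitted.add(ident)
            continue
        note = NOTIFY_RE.search(line)
        if note:
            notified.add(note.group(1))
    return sorted(submitted - notified)


sample = [
    "2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Submitted",
    "2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn:0-1-73-2-31-0-0-1186444341989 0",
    "2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Completed",
    # Made-up ID standing in for a task that never reached Falkon.
    "2007-08-06 19:08:26,000 DEBUG TaskImpl Task(type=1, identity=urn:hypothetical-task) setting status to Submitted",
]

print(status_counts(sample))
print(unnotified_tasks(sample))
```

To try it on the real log, the list can be replaced by the open file object, e.g. passing open("MolDyn-244-loops-zhgo6be8tjhi1.log") to either function. The file would have to be opened twice (or read into a list) to run both.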


> Yong, are you keeping up with these emails?  Do you still have a  
> copy of the latest Falkon provider that you edited just before you  
> left?  Can you just take a look through there to make sure nothing  
> has been broken with the SVN updates?  If you don't have time for  
> this now (considering today was your first day on the new job),  
> I'll dig through there and see if I can make some sense of what is  
> happening!
>
> One last thing, Ben mentioned that the Falkon provider you saw in  
> Nika's account was different than what was in SVN.  Ben, did you at  
> least look at modification dates?  How old was one as opposed to  
> the other?  I hope we did not revert back to an older version that  
> might have had some bug in it....
>

I had to update to the latest version of provider-deef from SVN, since
without the update nothing worked. The version I am at now is 1050.
But this is exactly the same version of swift/deef I used for our
Friday run (which 'worked' from the Falkon/Swift point of view).

Nika


> Ioan
>
> Veronika Nefedova wrote:
>> Well, there are some discrepancies:
>>
>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>    7959  244749 3241072
>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>   17207  564648 7949388
>> nefedova at viper:~/alamines>
>>
>> I.e. more than half of the jobs (9248 of 17207) haven't finished
>> (according to Swift)
>>
>> I also have some exceptions:
>>
>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn:0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception in getFile
>> (80 of those):
>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc
>>      80     880    9705
>> nefedova at viper:~/alamines>
>>
>>
>> Nika



