[Swift-devel] Q about MolDyn

Veronika Nefedova nefedova at mcs.anl.gov
Mon Aug 6 16:18:21 CDT 2007


I got 40 such entries:

2007-08-06 14:47:03,571 DEBUG TaskImpl Task(type=2, identity=urn: 
0-1-66-0-1186429258767) setting status to Failed Exception in getFile

and 20 such entries:
2007-08-06 14:46:58,559 DEBUG vdl:execute2 Application exception:  
Task failed

The workflow just exited with no more new errors/entries in the log.  
The last few lines of the log:

2007-08-06 14:47:03,596 DEBUG TaskImpl Task(type=4, identity=urn: 
0-1-55-0-1186429258834) setting status to Active
2007-08-06 14:47:03,596 DEBUG TaskImpl Task(type=4, identity=urn: 
0-1-55-0-1186429258834) setting status to Completed
2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn: 
0-1-62-0-1186429258791) setting status to Failed Exception in getFile
2007-08-06 14:47:03,705 DEBUG TaskImpl Task(type=4, identity=urn: 
0-1-62-0-1186429258838) setting status to Active
2007-08-06 14:47:03,705 DEBUG TaskImpl Task(type=4, identity=urn: 
0-1-62-0-1186429258838) setting status to Completed
nefedova at viper:~/alamines>
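
A minimal sketch of how the tallies above (40 of one entry, 20 of another) could be produced from a large log in one pass. It assumes each entry sits on a single line in the actual log file (the wrapping after "urn:" above comes from the mail client) and that lines follow the "date time,ms LEVEL component message" layout seen in the excerpts. The function name summarize_exceptions and both regexes are illustrative, not part of Swift. The pattern [Ee]xception is used so that both "Exception" and "exception" match.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the excerpts in this thread:
#   2007-08-06 14:47:03,571 DEBUG TaskImpl <message>
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+) (.*)$"
)
# Task identities differ per task; collapse them so identical failures group.
URN_RE = re.compile(r"identity=urn:\s*[\d-]+")


def summarize_exceptions(lines):
    """Count log messages mentioning an exception, grouped by component."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # continuation lines such as stack frames are skipped
        _timestamp, _level, component, message = m.groups()
        if not re.search(r"[Ee]xception", message):
            continue
        key = URN_RE.sub("identity=urn:*", message)
        counts[(component, key)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <path-to-swift-log>
    with open(sys.argv[1], errors="replace") as fh:
        for (component, msg), n in summarize_exceptions(fh).most_common():
            print(f"{n:6d}  {component}  {msg}")
```

Grouping by message text with the task identity masked keeps the output short even when the log is huge, and makes it easy to see which failure appears first and how often.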

On Aug 6, 2007, at 4:04 PM, Mihael Hategan wrote:

> Try "[E|e]xception".
>
> On Mon, 2007-08-06 at 15:57 -0500, Veronika Nefedova wrote:
>> Nope, nothing more really...
>> Several of these:
>>
>> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn:
>> 0-1-67-0-1186429255847) setting status to Failed
>> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn:
>> 0-1-69-0-1186429255851) setting status to Failed
>> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn:
>> 0-1-68-0-1186429255859) setting status to Failed
>> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn:
>> 0-1-70-0-1186429255863) setting status to Failed
>>
>> Nothing more specific...
>>
>> The log is huge. If you tell me what string to grep for - I might be
>> able to find something relevant...
>>
>> Nika
>>
>> On Aug 6, 2007, at 3:52 PM, Mihael Hategan wrote:
>>
>>> On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote:
>>>> OK. There is something weird happening. I've got several such  
>>>> entries
>>>> in my swift log:
>>>>
>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception:
>>>> Task failed
>>>>          task:execute @ vdl-int.k, line: 332
>>>>          vdl:execute2 @ execute-default.k, line: 22
>>>>          vdl:execute @ MolDyn-244-loops.kml, line: 20
>>>>          antchmbr @ MolDyn-244-loops.kml, line: 2845
>>>>          vdl:mains @ MolDyn-244-loops.kml, line: 2267
>>>
>>> That doesn't say much. Any more details in the logs?
>>>
>>>>
>>>>
>>>> Looks like antechamber has failed (?). And the failure is only on the
>>>> Swift side; it never made it across to Falkon (there are no remote
>>>> directories created). But I see some of the antechamber jobs have
>>>> finished (in shared).
>>>>
>>>> Yuqing -- could the changes you've made be responsible for these
>>>> failures (I do not see how it could though) ?
>>>>
>>>> Ioan, what do you see in your logs for these tasks:
>>>>
>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-56-0-1186429255786) setting status to Failed
>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-57-0-1186429255798) setting status to Failed
>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-59-0-1186429255800) setting status to Failed
>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-60-0-1186429255805) setting status to Failed
>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-61-0-1186429255811) setting status to Failed
>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
>>>> 0-1-58-0-1186429255814) setting status to Failed
>>>>
>>>> Nika
>>>>
>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote:
>>>>
>>>>> OK!
>>>>> Why don't we do one last run from my allocation, as everything is
>>>>> set up already and ready to go!  Make sure to enable all debug
>>>>> logging.  Falkon is up and running with all debug enabled!
>>>>>
>>>>> Falkon location is unchanged from the last experiment.
>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/
>>>>> GenericPortal/core/WS/GPFactoryService
>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/
>>>>> index.htm
>>>>>
>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we could
>>>>> still get 150~200 processors!
>>>>>
>>>>> Ioan
>>>>>
>>>>> Veronika Nefedova wrote:
>>>>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing
>>>>>> who fixed the m179 just in time!). We could start again the 244-
>>>>>> molecule run to verify that nothing is wrong with the whole  
>>>>>> system.
>>>>>>
>>>>>> Nika
>>>>>>
>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote:
>>>>>>>
>>>>>>>
>>>>>>> I started those 2 molecules via GRAM. I have no trust in m179
>>>>>>> finishing completely since I didn't change anything. I hope for
>>>>>>> m050 to finish though...
>>>>>>> You can watch the swift log on viper in ~nefedova/alamines/
>>>>>>> MolDyn-2-loops-be9484k93kk21.log
>>>>>>>
>>>>>>> Nika
>>>>>>>
>>>>>>>> Then, let's try another run with 244 molecules soon, as most of
>>>>>>>> ANL/UC is free!
>>>>>>>>
>>>>>>>> Ioan
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>
>>
>



