[Swift-devel] Q about MolDyn

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 6 16:04:46 CDT 2007


Try "[Ee]xception".
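
In case plain grep is awkward on a log that size, here is a minimal sketch of the same search in Python. It assumes the pattern only needs to match "Exception" or "exception"; inside a bracket expression `|` is a literal character, so `[Ee]` is enough. The function name and the third sample line are hypothetical; the first two sample lines are re-joined from the excerpts quoted below.

```python
import re
from typing import Iterable, Iterator, Tuple

# [Ee] is a character class: it matches an upper- or lower-case first letter.
EXCEPTION_RE = re.compile(r"[Ee]xception")


def grep_exceptions(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, line text) for every line that matches."""
    for number, line in enumerate(lines, start=1):
        if EXCEPTION_RE.search(line):
            yield number, line.rstrip("\n")


sample = [
    # Re-joined from the wrapped log excerpt in the thread.
    "2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: Task failed",
    "         task:execute @ vdl-int.k, line: 332",
    # Hypothetical line, not from the thread, to show the upper-case match.
    "Exception in thread (hypothetical example line)",
]

hits = list(grep_exceptions(sample))
for number, text in hits:
    print(f"{number}: {text}")
```

To scan a real log, pass an open file object instead of the list, for example `grep_exceptions(open(path))`, where `path` is wherever your swift log lives.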

On Mon, 2007-08-06 at 15:57 -0500, Veronika Nefedova wrote:
> Nope, nothing more really...
> Several of these:
> 
> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 
> 0-1-67-0-1186429255847) setting status to Failed
> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 
> 0-1-69-0-1186429255851) setting status to Failed
> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 
> 0-1-68-0-1186429255859) setting status to Failed
> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 
> 0-1-70-0-1186429255863) setting status to Failed
> 
> Nothing more specific...
> 
> The log is huge. If you tell me what string to grep for, I might be
> able to find something relevant...
> 
> Nika
> 
> On Aug 6, 2007, at 3:52 PM, Mihael Hategan wrote:
> 
> > On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote:
> >> OK. There is something weird happening. I've got several such entries
> >> in my swift log:
> >>
> >> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception:
> >> Task failed
> >>          task:execute @ vdl-int.k, line: 332
> >>          vdl:execute2 @ execute-default.k, line: 22
> >>          vdl:execute @ MolDyn-244-loops.kml, line: 20
> >>          antchmbr @ MolDyn-244-loops.kml, line: 2845
> >>          vdl:mains @ MolDyn-244-loops.kml, line: 2267
> >
> > That doesn't say much. Any more details in the logs?
> >
> >>
> >>
> >> Looks like antechamber has failed (?). And the failure is only on the
> >> Swift side; it never made it across to Falkon (there are no remote
> >> directories created). But I see some of the antechamber jobs have
> >> finished (in shared).
> >>
> >> Yuqing -- could the changes you've made be responsible for these
> >> failures (I do not see how it could though) ?
> >>
> >> Ioan, what do you see in your logs for these tasks:
> >>
> >> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-56-0-1186429255786) setting status to Failed
> >> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-57-0-1186429255798) setting status to Failed
> >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-59-0-1186429255800) setting status to Failed
> >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-60-0-1186429255805) setting status to Failed
> >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-61-0-1186429255811) setting status to Failed
> >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn:
> >> 0-1-58-0-1186429255814) setting status to Failed
> >>
> >> Nika
> >>
> >> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote:
> >>
> >>> OK!
> >>> Why don't we do one last run from my allocation, as everything is
> >>> set up already and ready to go!  Make sure to enable all debug
> >>> logging.  Falkon is up and running with all debug enabled!
> >>>
> >>> Falkon location is unchanged from the last experiment.
> >>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService
> >>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/index.htm
> >>>
> >>> ANL/UC is not quite so idle as it was earlier, but I bet we could
> >>> still get 150~200 processors!
> >>>
> >>> Ioan
> >>>
> >>> Veronika Nefedova wrote:
> >>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing,
> >>>> who fixed m179 just in time!). We could start the 244-molecule
> >>>> run again to verify that nothing is wrong with the whole system.
> >>>>
> >>>> Nika
> >>>>
> >>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote:
> >>>>
> >>>>>
> >>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote:
> >>>>>
> >>>>>
> >>>>> I started those 2 molecules via GRAM. I have no trust in m179
> >>>>> finishing completely since I didn't change anything. I hope for
> >>>>> m050 to finish though...
> >>>>> You can watch the swift log on viper in
> >>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log
> >>>>>
> >>>>> Nika
> >>>>>
> >>>>>> Then, let's try another run with 244 molecules soon, as most of
> >>>>>> ANL/UC is free!
> >>>>>>
> >>>>>> Ioan
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>
> >
> 



