[Swift-devel] swift problem?

Veronika V. Nefedova nefedova at mcs.anl.gov
Wed Mar 21 17:32:47 CDT 2007


  I am not sure what I should look for. I have several hundred GRAM
logs. I checked a few of them and they looked normal (all approximately
the same size). I also didn't see any stderr in my outputs (usually when
the job is killed you get some kind of GRAM and/or PBS error in the
stderr.txt file)...

The number of jobs in the queue is decreasing, i.e. the jobs are
finishing and nothing new is being submitted...

Nika

At 05:16 PM 3/21/2007, Mihael Hategan wrote:
>I've never seen this error before, but it's coming from the GRAM
>service. It's not the reason why more jobs were not submitted properly,
>but it may be related to it. My guess is that something happened on the
>server side that caused most jobs to not send notifications and some (or
>one) to fail in that way, and Swift thinks most of these jobs are still
>running.
>
>Did the jobs get killed? Do the GRAM logs give any details?
>
>Mihael
>
>On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote:
> > I've submitted a big job to TG NCSA today. At some point it filled up the
> > PBS queue completely: I had 384 jobs queued/running (that's the limit).
> > And I know that I had many more jobs waiting on my local machine to be
> > submitted to TG. Once the jobs started to leave the queue (i.e. were
> > finished), no more jobs were submitted. So I now have only 372 jobs in
> > the queue while I should have 384. Any ideas why this is happening?
> >
> > I checked my log on wiggum:
> > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log
> >
> > and found this error:
> >
> > 2007-03-21 15:51:35,963 INFO  vdl:execute2 Running job chrm_long-8qmvzv8i
> > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3,
> > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf,
> > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE,
> > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz,
> > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in
> > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA
> > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is
> > unknown if the job was submitted
> >          task:execute @ vdl-int.k, line: 352
> >          vdl:execute2 @ execute-default.k, line: 22
> >          vdl:execute @ swift-MolDyn-free-final.kml, line: 142
> >          charmm2 @ swift-MolDyn-free-final.kml, line: 155790
> >          vdl:mains @ swift-MolDyn-free-final.kml, line: 122678
> > Caused by: org.globus.gram.GramException: It is unknown if the job was
> > submitted
> >
> > I am not sure whether it's causing the job submission problems.
> > I am using this Swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with some
> > options tweaked in scheduler.xml and swift exec)
> > Thanks!
> >
> > Nika
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >


