[Swift-devel] swift problem?

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 21 17:37:52 CDT 2007


On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote:
> I am not sure what I should look for. I have several hundred GRAM
> logs -- I checked a few of them and they looked normal (all
> approximately the same size). I also didn't see any stderr in my
> outputs (usually when a job is killed you get some kind of GRAM
> and/or PBS error in the stderr.txt file)...
> 
> The number of jobs in the queue is decreasing

The fact that the number of jobs in the queue is decreasing doesn't mean
that Swift knows about it.
Can you add
"log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG"
to log4j.properties and try it again?

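For reference, a rough sketch of how that might look in the file, assuming
the standard log4j 1.x properties syntax. Only the logger line itself is
from this message; the comments are illustrative, and the rest of the
existing file is assumed to stay as it is:

```properties
# Assumption: appended to the log4j.properties that Swift already uses;
# the existing root logger and appender settings are left unchanged.
log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG
```

With that set, task status changes should show up at DEBUG level in the
Swift log, which should help show whether notifications for the finished
jobs are actually arriving.
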
Mihael

> -- i.e. the jobs are finishing and nothing new is submitted...
> 
> Nika
> 
> At 05:16 PM 3/21/2007, Mihael Hategan wrote:
> > I've never seen this error before, but it's coming from the GRAM
> > service. It's not the reason why more jobs were not submitted
> > properly, but it may be related to it. My guess is that something
> > happened on the server side that caused most jobs to not send
> > notifications and some (or one) to fail in that way, and Swift thinks
> > most of these jobs are still running.
> > 
> > Did the jobs get killed? Do the GRAM logs give any details?
> > 
> > Mihael 
> > 
> > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote:
> > > I've submitted a big job to TG NCSA today. At some point it filled up
> > > the PBS queue completely - I had 384 jobs queued/running (that's the
> > > limit). And I know that I had many more jobs waiting on my local
> > > machine to be submitted to TG. Once the jobs started to leave the
> > > queue (i.e. were finished) - no more jobs were submitted. So I now
> > > have only 372 jobs in the queue while I should have 384. Any ideas
> > > why this is happening?
> > > 
> > > I checked my log on wiggum: 
> > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log
> > > 
> > > and found this error:
> > > 
> > > 2007-03-21 15:51:35,963 INFO  vdl:execute2 Running job chrm_long-8qmvzv8i
> > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3,
> > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf,
> > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE,
> > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz,
> > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in
> > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA
> > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is
> > > unknown if the job was submitted
> > >          task:execute @ vdl-int.k, line: 352
> > >          vdl:execute2 @ execute-default.k, line: 22
> > >          vdl:execute @ swift-MolDyn-free-final.kml, line: 142
> > >          charmm2 @ swift-MolDyn-free-final.kml, line: 155790
> > >          vdl:mains @ swift-MolDyn-free-final.kml, line: 122678
> > > Caused by: org.globus.gram.GramException: It is unknown if the job was
> > > submitted
> > > 
> > > I am not sure if it's causing the job submission problems.
> > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with
> > > some options tweaked in scheduler.xml and swift exec)
> > > Thanks!
> > > 
> > > Nika
> > > 
> > > 
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > 
> 



