[Swift-devel] swift problem?

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 21 17:16:38 CDT 2007


I've never seen this error before, but it's coming from the GRAM
service. It's not the reason why no more jobs were submitted, but it
may be related. My guess is that something happened on the server side
that caused most jobs not to send notifications and some (or one) to
fail in that way, so Swift thinks most of these jobs are still
running.

Did the jobs get killed? Do the GRAM logs give any details?
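
One rough way to see how often this happened is to count the job
submissions and the application exceptions recorded in the Swift log. The
following is only a minimal sketch in Python. It assumes that each log
entry sits on a single line in the real file (the wrapping in the quoted
excerpt comes from the mail client) and that the "Running job" and
"Application exception" messages look like the ones quoted below. The
function name summarize and the two regular expressions are illustrative,
not part of Swift.

```python
import re
from collections import Counter

# Patterns modeled on the two log lines quoted in this thread (assumed format).
RUNNING = re.compile(r"vdl:execute2 Running job (\S+)")
EXCEPTION = re.compile(r"vdl:execute2 Application exception: (.*)")


def summarize(lines):
    """Return (set of job ids that were started, Counter of exception messages)."""
    started = set()
    errors = Counter()
    for line in lines:
        match = RUNNING.search(line)
        if match:
            started.add(match.group(1))
            continue
        match = EXCEPTION.search(line)
        if match:
            errors[match.group(1).strip()] += 1
    return started, errors


# Two sample entries, unwrapped from the excerpt in the original message
# (the argument list is shortened here for readability).
sample = [
    "2007-03-21 15:51:35,963 INFO  vdl:execute2 Running job "
    "chrm_long-8qmvzv8i chrm_long with arguments [pstep:40000] in "
    "swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA",
    "2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: "
    "It is unknown if the job was submitted",
]

started, errors = summarize(sample)
print(len(started), "jobs started")
for message, count in errors.items():
    print(count, "x", message)
```

To run it against the real log, pass an open file object instead of the
sample list, for example summarize(open(path_to_log)). If the exception
count is small compared with the number of jobs that never came back, that
would fit the guess that lost notifications, rather than this error, are
what is holding up new submissions.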

Mihael 

On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote:
> I've submitted a big job to TG NCSA today. At some point it filled up the 
> PBS queue completely - I had 384 jobs queued/running (that's the limit). And 
> I know that I had many more jobs waiting on my local machine to be 
> submitted to TG. Once the jobs started to leave the queue (i.e. were 
> finished) - no more jobs were submitted. So I now have only 372 jobs in the 
> queue while I should have 384. Any ideas why this is happening?
> 
> I checked my log on wiggum: 
> /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log
> 
> and found this error:
> 
> 2007-03-21 15:51:35,963 INFO  vdl:execute2 Running job chrm_long-8qmvzv8i 
> chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, 
> system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, 
> paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE, 
> faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz, 
> stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in 
> swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA
> 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is 
> unknown if the job was submitted
>          task:execute @ vdl-int.k, line: 352
>          vdl:execute2 @ execute-default.k, line: 22
>          vdl:execute @ swift-MolDyn-free-final.kml, line: 142
>          charmm2 @ swift-MolDyn-free-final.kml, line: 155790
>          vdl:mains @ swift-MolDyn-free-final.kml, line: 122678
> Caused by: org.globus.gram.GramException: It is unknown if the job was 
> submitted
> 
> I am not sure whether it's causing the job submission problems.
> I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with some 
> options tweaked in scheduler.xml and swift exec)
> Thanks!
> 
> Nika
> 
> 
> 



