[Swift-devel] Jobs being aborted by PBS server on tg-grid.uc.teragrid.org

Michael Wilde wilde at mcs.anl.gov
Sun Nov 4 21:38:44 CST 2007


I've reported this to TG and Ti on the chance that it's on the server
side. If nothing else, a PBS log may pinpoint what we're doing
wrong, if the problem is on our side or mine.

The two runs below are in ~benc/swift-logs/wilde/
   7:46 PM - run142
   8:57 PM - run142

I've started to add a 'comment' file to my log dirs there to note the
reason, and on occasion I copy output placed in cwd to _output.
I'm also adding find or ls output to each dir when it's relevant and I
remember. I'm trying to automate more of this as I go.

- Mike


On 11/4/07 9:20 PM, Michael Wilde wrote:
> I'm starting to see more frequent problems like this.
> It happened once last night to 3 consecutive jobs, and twice tonight,
> to 6 jobs.
> 
> Ti, could you look in the PBS logs, possibly on the related node(s), and
> see whether it looks like a problem on tg-uc or on our side?
> 
> Thanks,
> 
> Mike
> 
> 
> 11/3 8:05 PM - 3 failures
>  Job IDs 1571647, 48, & 49
> 11/4 7:46 PM - 3 failures
>  Job IDs 1572031, 33, & 34
> 11/4 8:56 - 8:57 PM - 3 failures
>  Job IDs 1572040, 42, & 43
> 
> All errors have the format below.
> 
> Swift retries failing jobs 3 times, hence the groups of 3 above.
> 
> 
> -------- Original Message --------
> Subject: PBS JOB 1572043.tg-master.uc.teragrid.org
> Date: Sun,  4 Nov 2007 20:57:11 -0600 (CST)
> From: adm at tg-master.uc.teragrid.org (root)
> To: wilde at tg-grid1.uc.teragrid.org
> 
> PBS Job Id: 1572043.tg-master.uc.teragrid.org
> Job Name:   STDIN
> Aborted by PBS Server
> Job cannot be executed
> See Administrator for help
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
> 
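The quoted report abbreviates job IDs (for example "1571647, 48, & 49"), so they need expanding to full IDs before they can be searched for in PBS logs. Below is a minimal Python sketch of that expansion. It assumes each short suffix replaces the trailing digits of the first ID in its group; the name `expand_job_ids` is hypothetical and is not part of Swift or PBS.

```python
import re
from typing import List


def expand_job_ids(shorthand: str) -> List[str]:
    """Expand a shorthand group such as '1571647, 48, & 49' into full IDs.

    Assumption: every short token replaces the same number of trailing
    digits in the first (full) ID of the group.
    """
    tokens = re.findall(r"\d+", shorthand)
    if not tokens:
        return []
    base = tokens[0]
    full_ids = [base]
    for tok in tokens[1:]:
        if len(tok) >= len(base):
            # Already a full-length ID; keep it unchanged.
            full_ids.append(tok)
        else:
            # Keep the leading digits of the base, swap in the new suffix.
            full_ids.append(base[: len(base) - len(tok)] + tok)
    return full_ids


if __name__ == "__main__":
    # The three groups listed in the quoted report.
    for group in ("1571647, 48, & 49", "1572031, 33, & 34", "1572040, 42, 43"):
        print(expand_job_ids(group))
```

The last group expands to an ID list ending in 1572043, which matches the job ID in the forwarded PBS notice.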


