[Swift-devel] Returning GRAM errors to swift user

Mihael Hategan hategan at mcs.anl.gov
Sun Feb 15 23:21:58 CST 2009


On Sun, 2009-02-15 at 17:50 -0600, Michael Wilde wrote:
> I'm assuming that Swift and the CoG provider return as much about GRAM 
> errors back to the user as they know. But, for jobs that fail to start, 
> e.g., due to an invalid project code, that error never makes it back to 
> the user (but *is* present in the gram log).
> 
> In this case, can the message below, from the GRAM log, 
> "GRAM_SCRIPT_GT3_FAILURE_MESSAGE:qsub: Invalid Account MSG=invalid 
> account\n" be made available in the GRAM API so it can be sent to the user?

There are two possibilities:
1. The message does not make it to the ws-gram client. This needs to be
fixed in ws-gram.
2. The message does reach the ws-gram client, but the ws-gram CoG
provider does not propagate it in the failure event. This I should fix.

There's a third, but unlikely, possibility: that the Karajan or Swift
portion is broken.

> 
> I'm assuming this particular issue is well known to users experienced 
> with TeraGrid sites, like Sarah, but is perhaps worth pointing out in a 
> troubleshooting section. If there's a chance that some of this GRAM 
> error info can be returned but is not currently, I can file this in 
> bugzilla.
> 
> It seems like a few errors, such as account/project errors, or other 
> invalid job specs (like time/queue mismatches?) are similarly not passed 
> back. Is that the case?

In my experience, yes.

> 
> Relevant snips from the logs are below.
> 
> Also interesting to note: On the UC TeraGrid site, a project specified 
> in sites.xml via the globus profile does *not* override a default 
> project set by the tgprojects command.

If this is correct (i.e. we're not talking about some obscure issue
where having a bogus default project causes all your jobs to fail), I
would think of it as a bug that should be submitted to TeraGrid.




