[Swift-devel] Returning GRAM errors to swift user
Mihael Hategan
hategan at mcs.anl.gov
Sun Feb 15 23:21:58 CST 2009
On Sun, 2009-02-15 at 17:50 -0600, Michael Wilde wrote:
> I'm assuming that Swift and the CoG provider return as much about GRAM
> errors back to the user as they know. But, for jobs that fail to start,
> e.g., due to an invalid project code, that error never makes it back to
> the user (but *is* present in the gram log).
>
> In this case, can the message below, from the GRAM log,
> "GRAM_SCRIPT_GT3_FAILURE_MESSAGE:qsub: Invalid Account MSG=invalid
> account\n" be made available in the GRAM API so it can be sent to the user?
There are two possibilities:
1. The message does not make it to the ws-gram client. This needs to be
fixed in ws-gram.
2. (1) is false, and the ws-gram cog provider does not propagate that
message in the failure event. This I should fix.
There's a third, but unlikely, possibility: that the karajan or swift
portion is broken.
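To tell these cases apart, it helps to pull the failure text out of the
GRAM log and compare it with whatever the client side reports. A minimal
sketch follows, assuming the log is plain text with the message on one line
after the GRAM_SCRIPT_GT3_FAILURE_MESSAGE: key, as in the excerpt quoted
above. The function name and the cleanup of the trailing literal "\n" are
illustrative assumptions, not part of GRAM or the CoG provider.

```python
# Key copied from the quoted log excerpt; the exact log line layout is an assumption.
FAILURE_KEY = "GRAM_SCRIPT_GT3_FAILURE_MESSAGE:"


def extract_failure_messages(log_text):
    """Return the text that follows FAILURE_KEY on each log line that contains it."""
    messages = []
    for line in log_text.splitlines():
        idx = line.find(FAILURE_KEY)
        if idx == -1:
            continue  # not a failure-message line
        msg = line[idx + len(FAILURE_KEY):]
        # The log appears to store a literal backslash-n; drop it along with
        # surrounding whitespace and any enclosing double quotes.
        msg = msg.replace("\\n", "").strip().strip('"')
        messages.append(msg)
    return messages
```

If this finds a message in the server-side log but nothing comparable shows
up in the client's failure event, the message is being lost somewhere in
between, which narrows it down to (1) or (2).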
>
> I'm assuming this particular issue is well known to users experienced
> with TeraGrid sites, like Sarah, but is perhaps worth pointing out in a
> troubleshooting section. If there's a chance that some of this GRAM
> error info can be returned but is not currently, I can file this in
> bugzilla.
>
> It seems like a few errors, such as account/project errors, or other
> invalid job specs (like time/queue mismatches?) are similarly not passed
> back. Is that the case?
In my experience, yes.
>
> Relevant snips from the logs are below.
>
> Also interesting to note: On the UC teragrid site, a project specified
> in sites.xml via the globus profile does *not* override a default
> project set by the tgprojects command.
If this is correct (i.e. we're not talking about some obscure issue
where having a bogus default project causes all your jobs to fail), I
would think of it as a bug that should be submitted to teragrid.