[Swift-user] Re: Error with condor provider
Arjun Comar
mandaya at rose-hulman.edu
Thu Jun 24 11:24:53 CDT 2010
I don't think it's just the host, because the swift job was submitted to
red.unl.edu, which I know is a valid host. Justin suggested running
without the gridresource line, but then the job just fails outright. So I
switched the gridresource to gt2 red.unl.edu/jobmanager-fork and ran again,
and this time the log file gave an actual reason for the hold. Here's
the exact entry:
012 (077.000.000) 06/24 11:21:05 Job was held.
Failed to get expiration time of proxy
Code 0 Subcode 0
...
I have a valid proxy, so I don't know what the problem is. Does this
mean I need to switch from grid-proxy to voms-proxy?
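
For reference, a minimal sketch (plain Python, not part of Condor or Swift) of pulling hold reasons out of a Condor user log such as results.log. It assumes the layout seen in the entry above: a header line with a three-digit event number and a job id in parentheses, indented detail lines, and a line of "..." closing the event. The function name held_events and the dictionary keys are made up for illustration, and the sample text is just the entry quoted above.

```python
import re

# Header of a user-log event, e.g.
# "012 (077.000.000) 06/24 11:21:05 Job was held."
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<sub>\d+)\) "
    r"(?P<date>\S+) (?P<time>\S+) (?P<text>.*)$"
)

HELD_EVENT_CODE = "012"  # event number seen in the log entry above


def held_events(log_text):
    """Return one dict per 'Job was held' event found in log_text."""
    events = []
    current = None

    def flush():
        # Keep the event being collected only if it is a hold event.
        if current is not None and current["code"] == HELD_EVENT_CODE:
            events.append(current)

    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "...":  # assumed end-of-event separator
            flush()
            current = None
            continue
        match = EVENT_HEADER.match(line)
        if match:
            flush()  # tolerate a missing "..." before a new header
            current = {
                "code": match.group("code"),
                # "077.000.000" becomes "77.0", the form condor_q prints
                "job": "%d.%d" % (int(match.group("cluster")),
                                  int(match.group("proc"))),
                "when": match.group("date") + " " + match.group("time"),
                "details": [],
            }
        elif current is not None and line:
            current["details"].append(line)
    flush()  # the last event may lack a trailing "..."
    return events


SAMPLE = """012 (077.000.000) 06/24 11:21:05 Job was held.
\tFailed to get expiration time of proxy
\tCode 0 Subcode 0
...
"""

if __name__ == "__main__":
    for event in held_events(SAMPLE):
        print(event["job"], "held at", event["when"], ":",
              "; ".join(event["details"]))
```

To run it against a real log, pass the file contents instead of SAMPLE, for example held_events(open("results.log").read()). Events other than 012 are skipped, so the output lists only the holds and their reasons.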
Arjun
On Wed, Jun 23, 2010 at 5:40 PM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
>
> ----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:
>
> > Alright, so I just tried again with trunk over Mike's stable
> > repository and it worked, so it looks like whatever the problem was
> > got fixed between stable and trunk.
>
> Or uncommitted in my working dirs and needs testing and checkin :(
>
> > However submitting jobs to condor
> > results in the job simply getting held. Do I need any permissions over
> > grid-proxy-init? Same thing happens when I run condor_submit on the
> > command line.
>
> There is a good little guide on debugging Condor problems:
>
> http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc
> (likely superseded now; talks about what causes jobs to go on hold, and how
> to look at log files)
>
> Things like bad paths or args can cause jobs to fail, and get held and/or
> retried.
>
> Below, the host name gsu1.uchicago.edu may be some grid school host that
> is down or non-existent???
>
> - Mike
>
> >
> > Condor submit file:
> > executable=/bin/echo
> > arguments=Hello World!
> > output=results.output
> > error=results.error
> > log=results.log
> > notification=never
> > universe=grid
> > grid_resource=gt2 gsu1.uchicago.edu/jobmanager-fork
> > queue
> >
> > and I submitted with:
> > condor_submit myjob.submit
> >
> > results in:
> > -- Submitter: bridled.ci.uchicago.edu : < 128.135.125.18:49572 > :
> > bridled.ci.uchicago.edu
> > ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
> > 57.0 aespinosa 6/15 11:54 0+00:00:00 I 0 7.3 condor_dagman
> > 71.0 arjun 6/23 11:24 0+00:00:00 H 0 0.0 echo Hello World!
> > 72.0 arjun 6/23 17:05 0+00:00:00 H 0 1.0 bash /opt/osg/data
> >
> > 3 jobs; 1 idle, 0 running, 2 held
> >
> > (My first job is the condor-submit job, the second is the job I
> > attempted to submit via swift.)
> >
> > Any thoughts?
> >
> > Arjun
> >
> >
> > On Wed, Jun 23, 2010 at 11:26 AM, Arjun Comar <
> > mandaya at rose-hulman.edu > wrote:
> >
> >
> > Hey all,
> > I've been trying to get jobs submitted over Condor via swift, and
> > running into a few problems. I think I've finally hit a point where
> > it's the Condor provider itself that's failing over any of my
> > configurations.
> > Here's the sites entry (though any sites entry over Condor will do,
> > and I've tried several):
> >
> > <config>
> > <pool handle="Nebraska_red.unl.edu">
> > <gridftp url="gsiftp://red.unl.edu/"/>
> > <execution provider="condor"/>
> > <profile key="jobType" namespace="globus">grid</profile>
> > <profile key="gridResource" namespace="globus">gt2 red.unl.edu/jobmanager-condor</profile>
> > <workdirectory>/opt/osg/data/engage/tmp/red.unl.edu</workdirectory>
> > </pool>
> > </config>
> >
> > And any swift script at all fails, even a simple helloworld:
> >
> > type messagefile;
> > app (messagefile t) greeting () {
> > echo "Hello, world!" stdout=@filename(t);
> > }
> > messagefile outfile <"hello.txt">;
> > outfile = greeting();
> >
> > With the following error:
> > The following errors have occurred:
> > 1. Application "echo" failed (Cannot submit job: Could not submit job
> > (condor_submit reported an exit code of 1). Submitting job(s)
> > Found illegal unescaped double-quote: "" -e /bin/echo -out hello.txt
> > -err stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider
> > " -a "Hello, world!"
> > The full arguments you specified were:
> > /opt/osg/data/engage/tmp/
> > red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap
> > echo-dqt6jttj -jobdir d -scratch "" -e /bin/echo -out hello.txt -err
> > stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider " -a
> > "Hello, world!")
> >
> > The same script runs just fine through any other submission mechanism,
> > even to the same site (ssh, coasters+ssh:pbs, etc).
> >
> > Anyone have any thoughts on fixing the problem?
> >
> > Thanks!
> >
> > --
> > Arjun Comar, Rose-Hulman '12
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
--
Arjun Comar, Rose-Hulman '12