[Swift-user] Re: Error with condor provider

Michael Wilde wilde at mcs.anl.gov
Thu Jun 24 12:52:33 CDT 2010


----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:

> I don't think it's just the host, because the swift job was submitted
> to red.unl.edu, which I know is a valid host. And Justin suggested
> running without that gridresource line, but then it just fails
> outright. So I switched the gridresource to gt2
> red.unl.edu/jobmanager-fork, and ran again, and this time I got an
> actual reason for being held in the log file. Here's the exact line:
> 012 (077.000.000) 06/24 11:21:05 Job was held.
> Failed to get expiration time of proxy
> Code 0 Subcode 0
> ...
> 
> I have a valid proxy, so I don't know what the problem is. Does this
> indicate that I need to switch to voms-proxy over grid-proxy?

You should test this step by step, layer by layer. Use whatever proxy type you would use when testing with the standard grid commands. 
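
To compare the hold reason after each layer you test, something like the following could pull it out of the Condor user log. This is only a sketch: it assumes the log looks like the excerpt quoted above (a "012 ... Job was held." header line, the reason on the next line, and "..." closing each event), and the function name held_reasons is made up for illustration.

```python
import re

# Event header as it appears in the quoted excerpt, e.g.
# "012 (077.000.000) 06/24 11:21:05 Job was held."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")


def _collect(event, reasons):
    """Append (job_id, reason) if `event` is a 'Job was held' event (code 012)."""
    if not event:
        return
    match = EVENT_HEADER.match(event[0])
    if match and match.group(1) == "012":
        # Report cluster.proc the way condor_q does (77.0 rather than 077.000).
        job_id = f"{int(match.group(2))}.{int(match.group(3))}"
        reason = event[1] if len(event) > 1 else ""
        reasons.append((job_id, reason))


def held_reasons(log_text):
    """Return a list of (job_id, reason) for every hold event in the log text."""
    reasons = []
    event = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "...":        # assumed event terminator
            _collect(event, reasons)
            event = []
        elif line:
            event.append(line)
    _collect(event, reasons)     # tolerate a missing final terminator
    return reasons


if __name__ == "__main__":
    sample = (
        "012 (077.000.000) 06/24 11:21:05 Job was held.\n"
        "Failed to get expiration time of proxy\n"
        "Code 0 Subcode 0\n"
        "...\n"
    )
    print(held_reasons(sample))
```

Run against results.log (or whatever log= names in the submit file) after each attempt; if the reason changes from one layer to the next, that tells you which layer is at fault.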

Try a short, simple job using the gt2 provider with the fork jobmanager (I think that is jobmanager=fork in the sites entry).

Are you testing standalone condor-g?
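
For the standalone test, a small sketch like this could generate a submit description modeled on the one quoted further down in this thread, pointed at the fork jobmanager of whichever host you want to check. The function name fork_test_submit and the tag parameter are made up for illustration; the submit commands themselves are the ones from the quoted file.

```python
def fork_test_submit(host, tag="forktest"):
    """Build a Condor-G submit description that runs /bin/echo through the
    gt2 fork jobmanager on `host`. `tag` (hypothetical) names the output files."""
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {host}/jobmanager-fork",
        "executable = /bin/echo",
        "arguments = Hello World!",
        f"output = {tag}.output",
        f"error = {tag}.error",
        f"log = {tag}.log",
        "notification = never",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the text to a file and pass that file to condor_submit.
    print(fork_test_submit("red.unl.edu"), end="")
```

If this job is also held with the same proxy message, the problem is below Swift, in the Condor-G or proxy layer.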

Are you doing this from engage-submit (where there is a known-working OSG client stack)?

- Mike


> 
> Arjun
> 
> 
> On Wed, Jun 23, 2010 at 5:40 PM, wilde at mcs.anl.gov <wilde at mcs.anl.gov> wrote:
> 
> 
> 
> 
> ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote:
> 
> > Alright, so I just tried again with trunk over Mike's stable
> > repository and it worked, so it looks like whatever the problem was
> > got fixed between stable and trunk.
> 
> Or uncommitted in my working dirs and needs testing and checkin :(
> 
> 
> > However submitting jobs to condor
> > results in the job simply getting held. Do I need any permissions over
> > grid-proxy-init? Same thing happens when I run condor_submit on the
> > command line.
> 
> There is a good little guide on debugging Condor problems:
> http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc
> (likely superseded now; talks about what causes jobs to go on hold,
> and how to look at log files)
> 
> Things like bad paths or args can cause jobs to fail, and get held
> and/or retried.
> 
> Below, the host name gsu1.uchicago.edu may be some grid school host
> that is down or non-existent???
> 
> - Mike
> 
> 
> 
> 
> >
> > Condor submit file:
> > executable=/bin/echo
> > arguments=Hello World!
> > output=results.output
> > error=results.error
> > log=results.log
> > notification=never
> > universe=grid
> > grid_resource=gt2 gsu1.uchicago.edu/jobmanager-fork
> > queue
> >
> > and I submitted with:
> > condor_submit myjob.submit
> >
> > results in:
> > -- Submitter: bridled.ci.uchicago.edu : < 128.135.125.18:49572 > :
> > bridled.ci.uchicago.edu
> > ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
> > 57.0 aespinosa 6/15 11:54 0+00:00:00 I 0 7.3 condor_dagman
> > 71.0 arjun 6/23 11:24 0+00:00:00 H 0 0.0 echo Hello World!
> > 72.0 arjun 6/23 17:05 0+00:00:00 H 0 1.0 bash /opt/osg/data
> >
> > 3 jobs; 1 idle, 0 running, 2 held
> >
> > (My first job is the condor-submit job, the second is the job I
> > attempted to submit via swift.)
> >
> > Any thoughts?
> >
> > Arjun
> >
> >
> > On Wed, Jun 23, 2010 at 11:26 AM, Arjun Comar <
> > mandaya at rose-hulman.edu > wrote:
> >
> >
> > Hey all,
> > I've been trying to get jobs submitted over Condor via swift, and
> > running into a few problems. I think I've finally hit a point where
> > it's the Condor provider itself that's failing over any of my
> > configurations.
> > Here's the sites entry (though any sites entry over Condor will do,
> > and I've tried several):
> >
> > <config>
> > <pool handle="Nebraska_red.unl.edu">
> > <gridftp url="gsiftp://red.unl.edu/"/>
> > <execution provider="condor"/>
> > <profile key="jobType" namespace="globus">grid</profile>
> > <profile key="gridResource" namespace="globus">gt2 red.unl.edu/jobmanager-condor</profile>
> > <workdirectory>/opt/osg/data/engage/tmp/red.unl.edu</workdirectory>
> > </pool>
> > </config>
> >
> > And any swift script at all fails, even a simple helloworld:
> >
> > type messagefile;
> > app (messagefile t) greeting () {
> > echo "Hello, world!" stdout=@filename(t);
> > }
> > messagefile outfile <"hello.txt">;
> > outfile = greeting();
> >
> > With the following error:
> > The following errors have occurred:
> > 1. Application "echo" failed (Cannot submit job: Could not submit job
> > (condor_submit reported an exit code of 1). Submitting job(s)
> > Found illegal unescaped double-quote: "" -e /bin/echo -out hello.txt
> > -err stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider
> > " -a "Hello, world!"The full arguments you specified were:
> > /opt/osg/data/engage/tmp/red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap
> > echo-dqt6jttj -jobdir d -scratch "" -e /bin/echo -out hello.txt -err
> > stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider " -a
> > "Hello, world!")
> >
> > The same script runs just fine through any other submission mechanism,
> > even to the same site (ssh, coasters+ssh:pbs, etc).
> >
> > Anyone have any thoughts on fixing the problem?
> >
> > Thanks!
> >
> > --
> > Arjun Comar, Rose-Hulman '12
> >
> >
> >
> > --
> > Arjun Comar, Rose-Hulman '12
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
> 
> 
> 
> --
> Arjun Comar, Rose-Hulman '12

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



