[Swift-user] Re: Error with condor provider

wilde at mcs.anl.gov
Wed Jun 23 17:40:28 CDT 2010


----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:

> Alright, so I just tried again with trunk over Mike's stable
> repository and it worked, so it looks like whatever the problem was
> got fixed between stable and trunk.

Or the fix is still uncommitted in my working directories and needs testing and check-in :(
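
For reference, the "Found illegal unescaped double-quote" error in the original report (quoted further down) is condor_submit rejecting bare double quotes on the arguments line. condor_submit also accepts a "new" arguments syntax in which the whole value is wrapped in double quotes, a literal double quote is written twice, and an argument containing whitespace is wrapped in single quotes. Here is a minimal Python sketch of that quoting. The helper name condor_arguments is hypothetical, writing an empty argument as '' is an assumption on my part, and this is not necessarily how the provider in trunk handles it.

```python
def condor_arguments(args):
    """Build a condor_submit "new syntax" arguments value from a list of strings."""
    quoted = []
    for arg in args:
        # Inside the outer double quotes, a literal " is written as "".
        arg = arg.replace('"', '""')
        if arg == "" or any(ch in arg for ch in " \t'"):
            # Whitespace needs single quotes; a literal ' inside them is
            # written as ''. Assumption: an empty argument is written as ''.
            arg = "'" + arg.replace("'", "''") + "'"
        quoted.append(arg)
    # The whole value is wrapped in one pair of double quotes.
    return '"' + " ".join(quoted) + '"'
```

With this sketch, condor_arguments(["-a", "Hello, world!"]) gives "-a 'Hello, world!'" (outer double quotes included), which could be written after "arguments=" in a submit file.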

> However, submitting jobs to Condor
> results in the job simply getting held. Do I need any permissions beyond
> grid-proxy-init? The same thing happens when I run condor_submit on the
> command line.

There is a good little guide on debugging Condor problems:
  http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc
(It is likely superseded by now, but it covers what causes jobs to go on hold and how to look at the log files.)

Bad paths or arguments, for example, can cause jobs to fail and then be held and/or retried.
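
As a rough illustration of spotting held jobs, here is a minimal Python sketch that picks the held entries out of a plain condor_q listing like the one quoted below, by looking for H in the ST column. The helper name held_jobs is made up for this example, and the sketch assumes the default column layout shown in that listing. For the actual hold reason, condor_q -hold is the place to look.

```python
def held_jobs(listing):
    """Return (job_id, owner) pairs for rows whose ST column is 'H' (held)."""
    held = []
    for line in listing.splitlines():
        # Drop any leading "> " quote markers, then split on whitespace.
        fields = line.lstrip("> ").split()
        # Job rows look like: ID OWNER DATE TIME RUN_TIME ST PRI SIZE CMD...
        if len(fields) < 6:
            continue
        is_job_id = fields[0].replace(".", "", 1).isdigit()
        if is_job_id and fields[5] == "H":
            held.append((fields[0], fields[1]))
    return held
```

Run over the listing quoted below, this returns the two held jobs, 71.0 and 72.0, both owned by arjun.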

In the submit file below, the host name gsu1.uchicago.edu may be a grid school host that is down or no longer exists?

- Mike

> 
> Condor submit file:
> executable=/bin/echo
> arguments=Hello World!
> output=results.output
> error=results.error
> log=results.log
> notification=never
> universe=grid
> grid_resource=gt2 gsu1.uchicago.edu/jobmanager-fork
> queue
> 
> and I submitted with:
> condor_submit myjob.submit
> 
> results in:
> -- Submitter: bridled.ci.uchicago.edu : < 128.135.125.18:49572 > :
> bridled.ci.uchicago.edu
> ID    OWNER      SUBMITTED   RUN_TIME   ST PRI SIZE CMD
> 57.0  aespinosa  6/15 11:54  0+00:00:00 I  0   7.3  condor_dagman
> 71.0  arjun      6/23 11:24  0+00:00:00 H  0   0.0  echo Hello World!
> 72.0  arjun      6/23 17:05  0+00:00:00 H  0   1.0  bash /opt/osg/data
> 
> 3 jobs; 1 idle, 0 running, 2 held
> 
> (My first job is the one from condor_submit; the second is the job I
> attempted to submit via Swift.)
> 
> Any thoughts?
> 
> Arjun
> 
> 
> On Wed, Jun 23, 2010 at 11:26 AM, Arjun Comar <mandaya at rose-hulman.edu> wrote:
> 
> 
> Hey all,
> I've been trying to get jobs submitted over Condor via swift, and
> running into a few problems. I think I've finally hit a point where
> it's the Condor provider itself that's failing over any of my
> configurations.
> Here's the sites entry (though any sites entry over Condor will do,
> and I've tried several):
> 
> <config>
>   <pool handle="Nebraska_red.unl.edu">
>     <gridftp url="gsiftp://red.unl.edu/"/>
>     <execution provider="condor"/>
>     <profile key="jobType" namespace="globus">grid</profile>
>     <profile key="gridResource" namespace="globus">gt2 red.unl.edu/jobmanager-condor</profile>
>     <workdirectory>/opt/osg/data/engage/tmp/red.unl.edu</workdirectory>
>   </pool>
> </config>
> 
> And any swift script at all fails, even a simple helloworld:
> 
> type messagefile;
> app (messagefile t) greeting () {
>     echo "Hello, world!" stdout=@filename(t);
> }
> messagefile outfile <"hello.txt">;
> outfile = greeting();
> 
> With the following error:
> The following errors have occurred:
> 1. Application "echo" failed (Cannot submit job: Could not submit job
> (condor_submit reported an exit code of 1). Submitting job(s)
> Found illegal unescaped double-quote: "" -e /bin/echo -out hello.txt
> -err stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider
> " -a "Hello, world!"
> The full arguments you specified were:
> /opt/osg/data/engage/tmp/red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap
> echo-dqt6jttj -jobdir d -scratch "" -e /bin/echo -out hello.txt -err
> stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider " -a
> "Hello, world!")
> 
> The same script runs just fine through any other submission mechanism,
> even to the same site (ssh, coasters+ssh:pbs, etc).
> 
> Anyone have any thoughts on fixing the problem?
> 
> Thanks!
> 
> --
> Arjun Comar, Rose-Hulman '12

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



