I don't think the host is the only problem: the swift job was submitted to <a href="http://red.unl.edu">red.unl.edu</a>, which I know is a valid host. Justin suggested running without the gridResource line, but then the job fails outright. So I switched the gridResource to gt2 <a href="http://red.unl.edu/jobmanager-fork">red.unl.edu/jobmanager-fork</a> and ran again, and this time the log file gave an actual reason for the hold. Here's the exact entry:<br>
012 (077.000.000) 06/24 11:21:05 Job was held.<br> Failed to get expiration time of proxy<br> Code 0 Subcode 0<br>...<br><br>I have a valid proxy, so I don't know what the problem is. Does this mean I need to switch from grid-proxy to voms-proxy?<br>
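<br>For reference, here is a minimal Python sketch (not part of Swift or Condor) of pulling the hold reason out of a user log such as results.log. The function name parse_hold_events and the SAMPLE text are made up for illustration, and the sketch assumes every hold event looks like the entry above: an 012 header line, one or more reason lines, a Code/Subcode line, and a closing "..." line.<br>

```python
import re

# Header of a "Job was held" event, e.g.
# 012 (077.000.000) 06/24 11:21:05 Job was held.
HOLD_HEADER = re.compile(r"^012 \((\d+)\.(\d+)\.(\d+)\) (\S+ \S+) Job was held\.")
CODES = re.compile(r"Code (\d+) Subcode (\d+)")

# Sample text copied from the log entry quoted in this message.
SAMPLE = (
    "012 (077.000.000) 06/24 11:21:05 Job was held.\n"
    "\tFailed to get expiration time of proxy\n"
    "\tCode 0 Subcode 0\n"
    "...\n"
)


def parse_hold_events(log_text):
    """Return one dict per hold event (event number 012) found in log_text."""
    events = []
    current = None  # the hold event being filled in, if any
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HOLD_HEADER.match(line)
        if header:
            cluster, proc, _subproc, when = header.groups()
            current = {
                "job": f"{int(cluster)}.{int(proc)}",  # same form as condor_q IDs
                "time": when,
                "reason": "",
                "code": None,
                "subcode": None,
            }
            events.append(current)
        elif line == "...":
            current = None  # "..." closes the current event
        elif current is not None:
            codes = CODES.fullmatch(line)
            if codes:
                current["code"] = int(codes.group(1))
                current["subcode"] = int(codes.group(2))
            elif line:
                # Anything else inside the event is treated as reason text.
                current["reason"] = (current["reason"] + " " + line).strip()
    return events


if __name__ == "__main__":
    for event in parse_hold_events(SAMPLE):
        print(event["job"], event["time"], event["reason"])
```

<br>To run it against a real log, pass the file contents instead of SAMPLE, for example parse_hold_events(open("results.log").read()).<br>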
<br>Arjun<br><br><div class="gmail_quote">On Wed, Jun 23, 2010 at 5:40 PM, <a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a> <span dir="ltr"><<a href="mailto:wilde@mcs.anl.gov">wilde@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im"><br>
----- "Arjun Comar" <<a href="mailto:mandaya@rose-hulman.edu">mandaya@rose-hulman.edu</a>> wrote:<br>
<br>
> Alright, so I just tried again with trunk over Mike's stable<br>
> repository and it worked, so it looks like whatever the problem was<br>
> got fixed between stable and trunk.<br>
<br>
</div>Or uncommitted in my working dirs and needs testing and checkin :(<br>
<div class="im"><br>
> However submitting jobs to condor<br>
> results in the job simply getting held. Do I need any permissions over<br>
> grid-proxy-init? Same thing happens when I run condor_submit on the<br>
> command line.<br>
<br>
</div>There is a good little guide on debugging Condor problems:<br>
<a href="http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc" target="_blank">http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc</a><br>
(likely superseded now; talks about what causes jobs to go on hold, and how to look at log files)<br>
<br>
Things like bad paths or args can cause jobs to fail, and get held and/or retried.<br>
<br>
Below, the host name <a href="http://gsu1.uchicago.edu" target="_blank">gsu1.uchicago.edu</a> may be some grid school host that is down or non-existent???<br>
<br>
- Mike<br>
<div><div></div><div class="h5"><br>
><br>
> Condor submit file:<br>
> executable=/bin/echo<br>
> arguments=Hello World!<br>
> output=results.output<br>
> error=results.error<br>
> log=results.log<br>
> notification=never<br>
> universe=grid<br>
> grid_resource=gt2 <a href="http://gsu1.uchicago.edu/jobmanager-fork" target="_blank">gsu1.uchicago.edu/jobmanager-fork</a><br>
> queue<br>
><br>
> and I submitted with:<br>
> condor_submit myjob.submit<br>
><br>
> results in:<br>
> -- Submitter: <a href="http://bridled.ci.uchicago.edu" target="_blank">bridled.ci.uchicago.edu</a> : < <a href="http://128.135.125.18:49572" target="_blank">128.135.125.18:49572</a> > :<br>
> <a href="http://bridled.ci.uchicago.edu" target="_blank">bridled.ci.uchicago.edu</a><br>
> ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD<br>
> 57.0 aespinosa 6/15 11:54 0+00:00:00 I 0 7.3 condor_dagman<br>
> 71.0 arjun 6/23 11:24 0+00:00:00 H 0 0.0 echo Hello World!<br>
> 72.0 arjun 6/23 17:05 0+00:00:00 H 0 1.0 bash /opt/osg/data<br>
><br>
> 3 jobs; 1 idle, 0 running, 2 held<br>
><br>
> (My first job is the condor-submit job, the second is the job I<br>
> attempted to submit via swift.)<br>
><br>
> Any thoughts?<br>
><br>
> Arjun<br>
><br>
><br>
> On Wed, Jun 23, 2010 at 11:26 AM, Arjun Comar <<br>
> <a href="mailto:mandaya@rose-hulman.edu">mandaya@rose-hulman.edu</a> > wrote:<br>
><br>
><br>
> Hey all,<br>
> I've been trying to get jobs submitted over Condor via swift, and<br>
> running into a few problems. I think I've finally hit a point where<br>
> it's the Condor provider itself that's failing over any of my<br>
> configurations.<br>
> Here's the sites entry (though any sites entry over Condor will do,<br>
> and I've tried several):<br>
><br>
> <config><br>
> <pool handle=" <a href="http://Nebraska_red.unl.edu" target="_blank">Nebraska_red.unl.edu</a> "><br>
> <gridftp url="gsiftp:// <a href="http://red.unl.edu/" target="_blank">red.unl.edu/</a> "/><br>
> <execution provider="condor"/><br>
> <profile key="jobType" namespace="globus">grid</profile><br>
> <profile key="gridResource" namespace="globus">gt2<br>
> <a href="http://red.unl.edu/jobmanager-condor" target="_blank">red.unl.edu/jobmanager-condor</a> </profile><br>
> <workdirectory>/opt/osg/data/engage/tmp/ <a href="http://red.unl.edu" target="_blank">red.unl.edu</a> </workdirectory><br>
> </pool><br>
> </config><br>
><br>
> And any swift script at all fails, even a simple helloworld:<br>
><br>
> type messagefile;<br>
> app (messagefile t) greeting () {<br>
> echo "Hello, world!" stdout=@filename(t);<br>
> }<br>
> messagefile outfile <"hello.txt">;<br>
> outfile = greeting();<br>
><br>
> With the following error:<br>
> The following errors have occurred:<br>
> 1. Application "echo" failed (Cannot submit job: Could not submit job<br>
> (condor_submit reported an exit code of 1). Submitting job(s)<br>
> Found illegal unescaped double-quote: "" -e /bin/echo -out hello.txt<br>
> -err stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider<br>
> " -a "Hello, world!"<br>
> The full arguments you specified were:<br>
> /opt/osg/data/engage/tmp/<br>
> <a href="http://red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap" target="_blank">red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap</a><br>
> echo-dqt6jttj -jobdir d -scratch "" -e /bin/echo -out hello.txt -err<br>
> stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider " -a<br>
> "Hello, world!")<br>
><br>
> The same script runs just fine through any other submission mechanism,<br>
> even to the same site (ssh, coasters+ssh:pbs, etc).<br>
><br>
> Anyone have any thoughts on fixing the problem?<br>
><br>
> Thanks!<br>
><br>
> --<br>
> Arjun Comar, Rose-Hulman '12<br>
><br>
><br>
><br>
> --<br>
> Arjun Comar, Rose-Hulman '12<br>
<br>
</div></div><font color="#888888">--<br>
Michael Wilde<br>
Computation Institute, University of Chicago<br>
Mathematics and Computer Science Division<br>
Argonne National Laboratory<br>
<br>
</font></blockquote></div><br><br clear="all"><br>-- <br>Arjun Comar, Rose-Hulman '12<br>