[Swift-user] Re: Error with condor provider
Allan Espinosa
aespinosa at cs.uchicago.edu
Thu Jun 24 12:57:08 CDT 2010
Hi Arjun,
I see that you are connecting to my Condor instance on bridled.ci. It
may not grant access to users other than me, as I don't know whether
it's configured to be a general Condor service. I had proxy access
problems when I was using the Condor service from a default Debian
installation on my workstation.
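
To check quickly why a job went on hold, the hold reason can be pulled out of the Condor user log (the log= file in the submit description). This is only a minimal sketch: it assumes the log looks like the excerpt Arjun quotes below (an 012 event header, the reason lines, then a "..." terminator), and find_hold_events is a hypothetical helper name, not part of Condor or Swift.

```python
import re

# Header line of a "Job was held" event (event code 012), as in the quoted excerpt.
# Assumed layout: 012 (cluster.proc.subproc) MM/DD HH:MM:SS Job was held.
HOLD_HEADER = re.compile(r"^012 \((\d+)\.(\d+)\.(\d+)\) (\S+ \S+) Job was held\.")


def find_hold_events(log_text):
    """Return a list of (cluster, proc, timestamp, reason) for each hold event."""
    events = []
    lines = log_text.splitlines()
    i = 0
    while i < len(lines):
        match = HOLD_HEADER.match(lines[i].strip())
        if match:
            cluster, proc = int(match.group(1)), int(match.group(2))
            reason_lines = []
            i += 1
            # Collect the reason text up to the "..." line that ends the event.
            while i < len(lines) and lines[i].strip() != "...":
                text = lines[i].strip()
                # Skip blank lines and the "Code N Subcode M" line.
                if text and not text.startswith("Code "):
                    reason_lines.append(text)
                i += 1
            events.append((cluster, proc, match.group(4), " ".join(reason_lines)))
        i += 1
    return events


# Sample taken from the hold event quoted in this thread.
SAMPLE_LOG = """012 (077.000.000) 06/24 11:21:05 Job was held.
\tFailed to get expiration time of proxy
\tCode 0 Subcode 0
...
"""

if __name__ == "__main__":
    for cluster, proc, when, reason in find_hold_events(SAMPLE_LOG):
        print(f"job {cluster}.{proc} held at {when}: {reason}")
```

In practice you would pass it the contents of results.log. A reason that mentions the proxy, as here, points at the credential setup rather than at the remote host.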
-Allan
2010/6/24 Michael Wilde <wilde at mcs.anl.gov>:
>
> ----- "Arjun Comar" <mandaya at rose-hulman.edu> wrote:
>
>> I don't think it's just the host, because the swift job was submitted
>> to red.unl.edu, which I know is a valid host. And Justin suggested
>> running without that gridresource line, but then it just fails
>> outright. So I switched the gridresource to gt2
>> red.unl.edu/jobmanager-fork, and ran again, and this time I got an
>> actual reason for being held in the log file. Here's the exact line:
>> 012 (077.000.000) 06/24 11:21:05 Job was held.
>> Failed to get expiration time of proxy
>> Code 0 Subcode 0
>> ...
>>
>> I have a valid proxy, so I don't know what the problem is. Does this
>> indicate that I need to switch to voms-proxy over grid-proxy?
>
> You should test this step by step, layer by layer. Use whatever proxy type you would use when testing with the standard grid commands.
>
> Try a short simple job using the gt2 fork provider (jobmanager=fork in the sites entry I think it is).
>
> Are you testing standalone condor-g?
>
> Are you doing this from engage-submit (where there is a known-working OSG client stack)?
>
> - Mike
>
>
>>
>> Arjun
>>
>>
>> On Wed, Jun 23, 2010 at 5:40 PM, wilde at mcs.anl.gov < wilde at mcs.anl.gov
>> > wrote:
>>
>>
>>
>>
>> ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote:
>>
>> > Alright, so I just tried again with trunk over Mike's stable
>> > repository and it worked, so it looks like whatever the problem was
>> > got fixed between stable and trunk.
>>
>> Or uncommitted in my working dirs and needs testing and checkin :(
>>
>>
>> > However submitting jobs to condor
>> > results in the job simply getting held. Do I need any permissions
>> > over grid-proxy-init? Same thing happens when I run condor_submit
>> > on the command line.
>>
>> There is a good little guide on debugging Condor problems:
>> http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/effective_condorg_v4.doc
>> (likely superseded now; talks about what causes jobs to go on hold,
>> and how to look at log files)
>>
>> Things like bad paths or args can cause jobs to fail, and get held
>> and/or retried.
>>
>> Below, the host name gsu1.uchicago.edu may be some grid school host
>> that is down or non-existent???
>>
>> - Mike
>>
>>
>>
>>
>> >
>> > Condor submit file:
>> > executable=/bin/echo
>> > arguments=Hello World!
>> > output=results.output
>> > error=results.error
>> > log=results.log
>> > notification=never
>> > universe=grid
>> > grid_resource=gt2 gsu1.uchicago.edu/jobmanager-fork
>> > queue
>> >
>> > and I submitted with:
>> > condor_submit myjob.submit
>> >
>> > results in:
>> > -- Submitter: bridled.ci.uchicago.edu : < 128.135.125.18:49572 > :
>> > bridled.ci.uchicago.edu
>> >  ID     OWNER      SUBMITTED    RUN_TIME   ST PRI SIZE CMD
>> >  57.0   aespinosa  6/15 11:54   0+00:00:00 I  0   7.3  condor_dagman
>> >  71.0   arjun      6/23 11:24   0+00:00:00 H  0   0.0  echo Hello World!
>> >  72.0   arjun      6/23 17:05   0+00:00:00 H  0   1.0  bash /opt/osg/data
>> >
>> > 3 jobs; 1 idle, 0 running, 2 held
>> >
>> > (My first job is the condor-submit job, the second is the job I
>> > attempted to submit via swift.)
>> >
>> > Any thoughts?
>> >
>> > Arjun
>> >
>> >
>> > On Wed, Jun 23, 2010 at 11:26 AM, Arjun Comar <
>> > mandaya at rose-hulman.edu > wrote:
>> >
>> >
>> > Hey all,
>> > I've been trying to get jobs submitted over Condor via swift, and
>> > running into a few problems. I think I've finally hit a point where
>> > it's the Condor provider itself that's failing over any of my
>> > configurations.
>> > Here's the sites entry (though any sites entry over Condor will do,
>> > and I've tried several):
>> >
>> > <config>
>> > <pool handle="Nebraska_red.unl.edu">
>> > <gridftp url="gsiftp://red.unl.edu/"/>
>> > <execution provider="condor"/>
>> > <profile key="jobType" namespace="globus">grid</profile>
>> > <profile key="gridResource" namespace="globus">gt2 red.unl.edu/jobmanager-condor</profile>
>> > <workdirectory>/opt/osg/data/engage/tmp/red.unl.edu</workdirectory>
>> > </pool>
>> > </config>
>> >
>> > And any swift script at all fails, even a simple helloworld:
>> >
>> > type messagefile;
>> > app (messagefile t) greeting () {
>> > echo "Hello, world!" stdout=@filename(t);
>> > }
>> > messagefile outfile <"hello.txt">;
>> > outfile = greeting();
>> >
>> > With the following error:
>> > The following errors have occurred:
>> > 1. Application "echo" failed (Cannot submit job: Could not submit job
>> > (condor_submit reported an exit code of 1). Submitting job(s)
>> > Found illegal unescaped double-quote: "" -e /bin/echo -out hello.txt
>> > -err stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider "
>> > -a "Hello, world!"
>> > The full arguments you specified were:
>> > /opt/osg/data/engage/tmp/red.unl.edu/helloworld-20100623-1051-dr5v5apa/shared/_swiftwrap
>> > echo-dqt6jttj -jobdir d -scratch "" -e /bin/echo -out hello.txt -err
>> > stderr.txt -i -d "" -if "" -of hello.txt -k "" -status "provider "
>> > -a "Hello, world!")
>> >
>> > The same script runs just fine through any other submission
>> > mechanism, even to the same site (ssh, coasters+ssh:pbs, etc).
>> >
>> > Anyone have any thoughts on fixing the problem?
>> >
>> > Thanks!
>> >
>> > --
>> > Arjun Comar, Rose-Hulman '12
>> >
>> >
>> >
>> > --
>> > Arjun Comar, Rose-Hulman '12
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>>
>