[Swift-devel] Condor-G test on UJ-OSG site

Mihael Hategan hategan at mcs.anl.gov
Thu May 7 12:03:39 CDT 2009


On Thu, 2009-05-07 at 11:52 -0500, Zhao Zhang wrote:
> Hi,
> 
> I got Condor-G test hanging on UJ-OSG site.
> Does the following message mean that there is no condor running on the 
> UJ-OSG site?
> 
> zhao
> 
> sites.xml definition
> [zzhang@tp-grid1 sites]$ cat condor-g/UJ-OSG.xml
> <config>
>   <!-- UJ-OSG -->
>   <pool handle="UJ-OSG" >
>     <gridftp  url="gsiftp://osg-ce.grid.uj.ac.za/" />
>     <execution  provider="condor" />
>     <workdirectory >/nfs/data/osg_store/osg/tmp/UJ-OSG</workdirectory>
>     <profile namespace="globus" key="jobType">grid</profile>
>     <profile namespace="globus" key="gridResource">gt2 osg-ce.grid.uj.ac.za/jobmanager-condor</profile>
>   </pool>
> </config>
> 
> Globus-Job-Run:
> [zzhang@tp-grid1 sites]$ globus-job-run osg-ce.grid.uj.ac.za /usr/bin/id
> uid=640(osgedu) gid=2000(osg) groups=2000(osg)
> 
> Condor-Hold-Reason
> [zzhang@tp-grid1 ~]$ condor_q 166 -long | grep Hold
> PeriodicHold = FALSE
> OnExitHold = FALSE
> HoldReasonCode = 2

That's Condor's way of saying "GRAM reported an error"
(http://www.cs.wisc.edu/condor/manual/v6.8/2_5Submitting_Job.html#2162).

> HoldReasonSubCode = 93

And that's the GRAM error which says "the gatekeeper failed to find the
requested service" (http://homepages.nesc.ac.uk/~gcw/NGS/GRAM_error_codes.html).
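
For reading these two attributes together, here is a minimal sketch in
Python. It assumes the "Name = Value" layout shown in the condor_q -long
output above. The two lookup tables hold only the codes discussed in this
message, and the function names (parse_classad, explain_hold) are
hypothetical helpers, not part of Condor or Globus.

```python
# Only the two codes discussed in this thread are filled in (assumption:
# extend the tables from the Condor manual and the GRAM error code list).
HOLD_REASONS = {2: "GRAM reported an error"}
GRAM_ERRORS = {93: "the gatekeeper failed to find the requested service"}


def parse_classad(text):
    """Parse 'Name = Value' lines, as printed by condor_q -long, into a dict."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '=' at all
            attrs[name.strip()] = value.strip()
    return attrs


def explain_hold(attrs):
    """Turn HoldReasonCode / HoldReasonSubCode into a readable sentence."""
    code = int(attrs.get("HoldReasonCode", 0))
    sub = int(attrs.get("HoldReasonSubCode", 0))
    reason = HOLD_REASONS.get(code, f"unknown hold reason code {code}")
    if code == 2:
        # For code 2 the subcode carries the GRAM error number.
        detail = GRAM_ERRORS.get(sub, f"unknown GRAM error {sub}")
        return f"{reason}: {detail} (GRAM error {sub})"
    return reason


sample = """PeriodicHold = FALSE
OnExitHold = FALSE
HoldReasonCode = 2
HoldReasonSubCode = 93"""

print(explain_hold(parse_classad(sample)))
```

Feeding it the full output of "condor_q 166 -long" instead of the sample
string should work the same way, since lines without "=" are skipped.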

So I think the problem is the "jobmanager-condor" in gridResource, which
should probably be "jobmanager-pbs". Otherwise, something is broken on
the site.
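
One way to check which jobmanager the gatekeeper actually offers is to
repeat the globus-job-run test with the service named explicitly in the
contact string (host/jobmanager-name). The globus-job-run above named no
service, so it went through the site's default jobmanager and says nothing
about jobmanager-condor. A rough sketch in Python follows; the helper
names are hypothetical, and it assumes globus-job-run is on the PATH, a
valid proxy exists, and the command exits non-zero when GRAM reports an
error.

```python
import subprocess


def build_probe_command(host, jobmanager, executable="/usr/bin/id"):
    """Return the argv for running `executable` through one named jobmanager."""
    # GRAM contact strings take the form host/service.
    return ["globus-job-run", f"{host}/{jobmanager}", executable]


def probe(host, jobmanager, timeout=60):
    """Return True if the job ran through the given jobmanager.

    Assumption: a failed GRAM request shows up as a non-zero exit status.
    """
    try:
        result = subprocess.run(
            build_probe_command(host, jobmanager),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not installed, not executable, or the site never answered.
        return False
    return result.returncode == 0
```

Calling probe("osg-ce.grid.uj.ac.za", "jobmanager-condor") and then the
same with "jobmanager-pbs" would show which of the two the site accepts;
whichever succeeds is the name to put in the gridResource profile.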



