[Swift-devel] Condor-G test on UJ-OSG site
Mihael Hategan
hategan at mcs.anl.gov
Thu May 7 12:03:39 CDT 2009
On Thu, 2009-05-07 at 11:52 -0500, Zhao Zhang wrote:
> Hi,
>
> My Condor-G test is hanging on the UJ-OSG site.
> Does the following message mean that Condor is not running on the
> UJ-OSG site?
>
> zhao
>
> sites.xml definition
> [zzhang@tp-grid1 sites]$ cat condor-g/UJ-OSG.xml
> <config>
> <!-- UJ-OSG -->
> <pool handle="UJ-OSG" >
> <gridftp url="gsiftp://osg-ce.grid.uj.ac.za/" />
> <execution provider="condor" />
> <workdirectory >/nfs/data/osg_store/osg/tmp/UJ-OSG</workdirectory>
> <profile namespace="globus" key="jobType">grid</profile>
> <profile namespace="globus" key="gridResource">gt2 osg-ce.grid.uj.ac.za/jobmanager-condor</profile>
> </pool>
> </config>
>
> Globus-Job-Run:
> [zzhang@tp-grid1 sites]$ globus-job-run osg-ce.grid.uj.ac.za /usr/bin/id
> uid=640(osgedu) gid=2000(osg) groups=2000(osg)
>
> Condor-Hold-Reason
> [zzhang@tp-grid1 ~]$ condor_q 166 -long | grep Hold
> PeriodicHold = FALSE
> OnExitHold = FALSE
> HoldReasonCode = 2
That's Condor's way of saying "GRAM reported an error"
(http://www.cs.wisc.edu/condor/manual/v6.8/2_5Submitting_Job.html#2162).
> HoldReasonSubCode = 93
And that's the GRAM error that says "the gatekeeper failed to find the
requested service" (http://homepages.nesc.ac.uk/~gcw/NGS/GRAM_error_codes.html).
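A minimal Python sketch of the decoding above, assuming the `Key = Value` lines printed by `condor_q -long`. The helper names `parse_classad` and `explain_hold` are hypothetical, and only the two codes discussed in this thread are mapped; any other code is reported as unknown rather than guessed.

```python
import re

# Only the codes explained in this thread are mapped (hypothetical tables).
HOLD_REASON_CODES = {2: "GRAM reported an error"}
GRAM_ERROR_CODES = {93: "the gatekeeper failed to find the requested service"}


def parse_classad(text: str) -> dict[str, str]:
    """Parse 'Key = Value' lines, as printed by `condor_q -long`."""
    attrs = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def explain_hold(attrs: dict[str, str]) -> str:
    """Turn HoldReasonCode/HoldReasonSubCode into a readable message."""
    code = int(attrs.get("HoldReasonCode", "0"))
    sub = int(attrs.get("HoldReasonSubCode", "0"))
    reason = HOLD_REASON_CODES.get(code, f"unknown hold reason {code}")
    if code == 2:
        # For code 2 the subcode carries the GRAM error number.
        detail = GRAM_ERROR_CODES.get(sub, f"unknown GRAM error {sub}")
        return f"{reason}: {detail}"
    return reason
```

Feeding it the `condor_q 166 -long` output quoted above should yield `GRAM reported an error: the gatekeeper failed to find the requested service`.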
So I think the problem might be "jobmanager-condor", which should
probably be "jobmanager-pbs"; otherwise, something is broken on the
site.
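If the site's local scheduler really is PBS (an assumption; the thread only guesses this), the change amounts to swapping the jobmanager suffix in the `gridResource` profile. A minimal Python sketch using the standard `xml.etree.ElementTree` module; `set_jobmanager` is a hypothetical helper, and ElementTree drops XML comments such as `<!-- UJ-OSG -->` when it re-serializes the file.

```python
import xml.etree.ElementTree as ET


def set_jobmanager(sites_xml: str, pool_handle: str, jobmanager: str) -> str:
    """Return sites_xml with one pool's gridResource jobmanager replaced."""
    root = ET.fromstring(sites_xml)
    for pool in root.iter("pool"):
        if pool.get("handle") != pool_handle:
            continue
        for profile in pool.iter("profile"):
            if (profile.get("namespace") == "globus"
                    and profile.get("key") == "gridResource"
                    and profile.text):
                # Value looks like "gt2 <host>/jobmanager-<scheduler>".
                grid_type, contact = profile.text.split(None, 1)
                host = contact.strip().split("/", 1)[0]
                profile.text = f"{grid_type} {host}/{jobmanager}"
    return ET.tostring(root, encoding="unicode")
```

For example, `set_jobmanager(xml_text, "UJ-OSG", "jobmanager-pbs")` would rewrite the UJ-OSG entry to `gt2 osg-ce.grid.uj.ac.za/jobmanager-pbs` and leave other pools untouched.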