[Swift-devel] Re: [CI Ticketing System #1074] How to set .soft and env to run condor on TeraPort?

Glen Hocky hockyg at uchicago.edu
Fri Jun 19 11:06:24 CDT 2009


(and ben)

On Fri, Jun 19, 2009 at 11:05 AM, Glen Hocky <hockyg at uchicago.edu> wrote:

> That did it for me! Thanks Zhao
>
>
> On Fri, Jun 19, 2009 at 11:00 AM, Zhao Zhang <zhaozhang at uchicago.edu> wrote:
>
>> Here is my .soft
>>
>> [zzhang at tp-grid1 ~]$ cat .soft
>> #
>> # This is your SoftEnv configuration run control file.
>> #
>> #   It is used to tell SoftEnv how to customize your environment by
>> #   setting up variables such as PATH and MANPATH.  To learn more
>> #   about this file, do a "man softenv".
>> #
>> +java-sun
>> +osg-client
>> +maui
>> +torque
>> @python-2.5
>> @osg
>> @default
>> @globus-4
>>
>> And the source file is
>> source /opt/osg/setup.sh
>>
>> zhao
>>
>> Glen Hocky wrote:
>>
>>> This did update my condor and globus locations, but did not fix the
>>> problem. Hopefully Zhao can tell me what to do next
>>>
>>> [hockyg at tp-grid1 swift]$ which condor_q
>>> /soft/condor-7.0.5-r1/bin/condor_q
>>> [hockyg at tp-grid1 swift]$ condor_q
>>>
>>> Neither the environment variable CONDOR_CONFIG,
>>> /etc/condor/, nor ~condor/ contain a condor_config source.
>>> Either set CONDOR_CONFIG to point to a valid config source,
>>> or put a "condor_config" file in /etc/condor or ~condor/
>>> Exiting.
>>>
>>>
>>> On Fri, Jun 19, 2009 at 8:49 AM, Ti Leggett <support at ci.uchicago.edu> wrote:
>>>
>>>    There were some misconfigurations in the @globus-4 macro for
>>>    rhel-5 and condor that I've just fixed. Can you set your ~/.soft
>>>    to look like below and then run resoft:
>>>
>>>    @globus-4
>>>    @default
>>>
>>>    You should be using /soft/condor-7.0.5-r1 and
>>>    /soft/globus-4.2.1-r2 after that.
>>>    Let me know if that works for you, or if anything changes.
>>>
>>>    On Thu Jun 18 16:44:15 2009, wilde at mcs.anl.gov wrote:
>>>    > Hi,
>>>    >
>>>    > Swift users need to run the condor-g client in order to send jobs to
>>>    > OSG sites from a Swift script.
>>>    >
>>>    > Can you tell us how to set .soft and env so that condor_submit to
>>>    > the "grid" universe works?
>>>    >
>>>    > We've had all sorts of problems in getting this to work well:
>>>    >
>>>    > - The version of the condor client code on communicado is too new
>>>    > to run with Swift.
>>>    >
>>>    > - On teraport, it seems difficult to get the right settings of .soft
>>>    > entries and setup.sh scripts to work correctly together.
>>>    >
>>>    > - I still don't know if what worked for Zhao on tp-osg a month ago
>>>    > still works. It seems not to, and I can't tell if it's because of a
>>>    > change in .soft or env settings, or some other software issue.
>>>    >
>>>    > - We would like to run from Teraport compute nodes with qsub -I, and
>>>    > hope that whatever we determine to be the right settings for login
>>>    > nodes work on interactive compute nodes as well.
>>>    >
>>>    > - It would be good *not* to run on tp-osg.
>>>    >
>>>    > Suchandra, Ti, or Greg, can you help us sort out how to set things
>>>    > correctly?
>>>    >
>>>    > Thanks,
>>>    >
>>>    > Mike
>>>    >
>>>    >
>>>    > -------- Original Message --------
>>>    > Subject: Re: [Swift-devel] Cant run condor-g on TeraPort
>>>    > Date: Thu, 18 Jun 2009 19:31:26 +0000 (GMT)
>>>    > From: Ben Clifford <benc at hawaga.org.uk>
>>>    > To: Michael Wilde <wilde at mcs.anl.gov>
>>>    > CC: swift-devel <swift-devel at ci.uchicago.edu>
>>>    > References: <4A3A93E2.2080805 at mcs.anl.gov>
>>>    >
>>>    >
>>>    > condor_q works for me on tp-osg if I source /opt/osg/setenv.sh
>>>    > rather than use softenv. It doesn't work for me if I use @osg in
>>>    > softenv; it fails with the error you report.
>>>    >
>>>    > On Thu, 18 Jun 2009, Michael Wilde wrote:
>>>    >
>>>    > > As far as I can tell, the condor client code is broken on TeraPort.
>>>    > >
>>>    > > I've tried this on tp-login and tp-osg; I am using +osg-client and
>>>    > > @osg in my .soft. I source $VDT_LOCATION/setup.sh
>>>    > >
>>>    > > Zhao, Glen, can you cross-check and see if you are now seeing the
>>>    > > same thing?
>>>    > >
>>>    > > My suspicion is that the condor client config broke in the last
>>>    > > month, through OSG changes, CI Support work, etc.
>>>    > >
>>>    > > - Mike
>>>    > >
>>>    > >
>>>    > > I get this from condor_q:
>>>    > >
>>>    > > tp$ condor_q
>>>    > > Error:
>>>    > >
>>>    > > Extra Info: You probably saw this error because the condor_schedd is
>>>    > > not running on the machine you are trying to query. If the
>>>    > > condor_schedd is not running, the Condor system will not be able to
>>>    > > find an address and port to connect to and satisfy this request.
>>>    > > Please make sure the Condor daemons are running and try again.
>>>    > >
>>>    > > Extra Info: If the condor_schedd is running on the machine you are
>>>    > > trying to query and you still see the error, the most likely cause is
>>>    > > that you have setup a personal Condor, you have not defined
>>>    > > SCHEDD_NAME in your condor_config file, and something is wrong with
>>>    > > your SCHEDD_ADDRESS_FILE setting. You must define either or both of
>>>    > > those settings in your config file, or you must use the -name option
>>>    > > to condor_q. Please see the Condor manual for details on SCHEDD_NAME
>>>    > > and SCHEDD_ADDRESS_FILE.
>>>    > > tp$
>>>    > >
>>>    > > and this from swift:
>>>    > >
>>>    > > tp-grid1$ swift -tc.file tc.data -sites.file sites.condorg.xml cat.swift
>>>    > > Swift svn swift-r2890 cog-r2392
>>>    > >
>>>    > > RunID: 20090618-1404-mo0thjj4
>>>    > > Progress:
>>>    > > Progress: Stage in:1
>>>    > > Progress: Submitted:1
>>>    > > Failed to transfer wrapper log from cat-20090618-1404-mo0thjj4/info/h on firefly
>>>    > > Progress: Failed:1
>>>    > > Execution failed:
>>>    > > Exception in cat:
>>>    > > Arguments: [data.txt]
>>>    > > Host: firefly
>>>    > > Directory: cat-20090618-1404-mo0thjj4/jobs/h/cat-hv5s3gcj
>>>    > > stderr.txt:
>>>    > >
>>>    > > stdout.txt:
>>>    > >
>>>    > > ----
>>>    > >
>>>    > > Caused by:
>>>    > > Cannot submit job: Could not submit job (condor_submit reported an
>>>    > > exit code of 1). no error output
>>>    > > tp-grid1$ ls
>>>    > >
>>>    > > --
>>>    > >
>>>    > > Using this sites file:
>>>    > >
>>>    > > <config>
>>>    > > <pool handle="firefly" >
>>>    > > <gridftp url="gsiftp://ff-grid.unl.edu" />
>>>    > > <execution provider="condor" />
>>>    > > <profile namespace="globus" key="jobType">grid</profile>
>>>    > > <profile namespace="globus" key="gridResource">gt2 ff-grid.unl.edu/jobmanager-pbs</profile>
>>>    > > <workdirectory>/panfs/panasas/CMS/data/oops/wilde/swiftwork</workdirectory>
>>>    > > </pool>
>>>    > > </config>
>>>    > > _______________________________________________
>>>    > > Swift-devel mailing list
>>>    > > Swift-devel at ci.uchicago.edu
>>>    > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>    > >
>>>    > >
>>>
>>>
>>>
>
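
The condor_q error quoted in this thread names three places where a
condor_config source is looked for: the CONDOR_CONFIG environment
variable, /etc/condor, and ~condor/. A minimal sketch of a check for
those three locations follows. The function name find_condor_config and
its optional directory arguments are hypothetical, introduced only for
illustration; they are not part of Condor, SoftEnv, or the OSG setup
scripts, and the sketch covers only the locations named in that error
message.

```shell
# Hypothetical helper (not part of Condor): print the first condor_config
# found in the locations named by the condor_q error message.
# Optional arguments override the default directories to search.
find_condor_config() {
    # 1. An explicit CONDOR_CONFIG setting is used if it points at a file.
    if [ -n "${CONDOR_CONFIG:-}" ] && [ -f "$CONDOR_CONFIG" ]; then
        printf '%s\n' "$CONDOR_CONFIG"
        return 0
    fi

    # 2. Otherwise fall back to the default directories from the message.
    #    ~condor stays a literal string if no "condor" user exists, in
    #    which case the file test below simply fails for it.
    if [ "$#" -eq 0 ]; then
        set -- /etc/condor ~condor
    fi

    for dir in "$@"; do
        if [ -f "$dir/condor_config" ]; then
            printf '%s\n' "$dir/condor_config"
            return 0
        fi
    done

    # 3. Nothing found: this is the situation condor_q complains about.
    return 1
}
```

Running find_condor_config after resoft, and again after
"source /opt/osg/setup.sh", should show whether the setup script is what
supplies the configuration; a non-zero exit status means none of the
three locations has one.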