[Swift-devel] Re: [CI Ticketing System #1074] How to set .soft and env to run condor on TeraPort?

Zhao Zhang zhaozhang at uchicago.edu
Fri Jun 19 11:00:11 CDT 2009


Here is my .soft

[zzhang at tp-grid1 ~]$ cat .soft
#
# This is your SoftEnv configuration run control file.
#
#   It is used to tell SoftEnv how to customize your environment by
#   setting up variables such as PATH and MANPATH.  To learn more
#   about this file, do a "man softenv".
#
+java-sun
+osg-client
+maui
+torque
@python-2.5
@osg
@default
@globus-4

And the setup script I source is:
source /opt/osg/setup.sh
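
To see which config source condor_q would pick up after resoft and after sourcing the setup script, something like the following could help. This is only a minimal sketch: find_condor_config is a hypothetical helper name (not a Condor or SoftEnv command), and it assumes the only places worth checking are the three named in the condor_q error message quoted below (the CONDOR_CONFIG variable, /etc/condor/, and ~condor/).

```shell
#!/bin/sh
# find_condor_config is a hypothetical helper, not part of Condor or SoftEnv.
# It checks the three places named in the condor_q error message, in the
# order the message lists them, and prints the first readable source.
find_condor_config() {
    # 1. The CONDOR_CONFIG environment variable, if it names a readable file.
    if [ -n "${CONDOR_CONFIG:-}" ] && [ -r "$CONDOR_CONFIG" ]; then
        printf '%s\n' "$CONDOR_CONFIG"
        return 0
    fi
    # 2. A condor_config file in /etc/condor/.
    if [ -r /etc/condor/condor_config ]; then
        printf '%s\n' /etc/condor/condor_config
        return 0
    fi
    # 3. A condor_config file in ~condor/ (home of the "condor" user).
    #    If no such user exists the tilde is left unexpanded and the
    #    readability test simply fails.
    if [ -r ~condor/condor_config ]; then
        printf '%s\n' ~condor/condor_config
        return 0
    fi
    return 1
}

# Run this after "resoft" and after sourcing the setup script.
if cfg=$(find_condor_config); then
    echo "condor_config source: $cfg"
else
    echo "no condor_config source found"
fi
```

If this reports no source, condor_q will most likely exit with the CONDOR_CONFIG error regardless of which condor_q binary is first in PATH.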

zhao

Glen Hocky wrote:
> This did update my condor and globus locations, but did not fix the 
> problem. Hopefully Zhao can tell me what to do next
>
> [hockyg at tp-grid1 swift]$ which condor_q
> /soft/condor-7.0.5-r1/bin/condor_q
> [hockyg at tp-grid1 swift]$ condor_q
>
> Neither the environment variable CONDOR_CONFIG,
> /etc/condor/, nor ~condor/ contain a condor_config source.
> Either set CONDOR_CONFIG to point to a valid config source,
> or put a "condor_config" file in /etc/condor or ~condor/
> Exiting.
>
>
> On Fri, Jun 19, 2009 at 8:49 AM, Ti Leggett <support at ci.uchicago.edu> wrote:
>
>     There were some misconfigurations in the @globus-4 macro for rhel-5
>     and condor that I've just fixed. Can you set your ~/.soft to look
>     like below and then run resoft:
>
>     @globus-4
>
>     @default
>
>     You should be using /soft/condor-7.0.5-r1 and
>     /soft/globus-4.2.1-r2 after that.
>     Let me know if that works for you, or if anything changes.
>
>     On Thu Jun 18 16:44:15 2009, wilde at mcs.anl.gov wrote:
>     > Hi,
>     >
>     > Swift users need to run the condor-g client in order to send jobs to
>     > OSG sites from a Swift script.
>     >
>     > Can you tell us how to set .soft and env so that condor_submit to the
>     > "grid" universe works?
>     >
>     > We've had all sorts of problems in getting this to work well:
>     >
>     > - the version of condor client code on communicado is too new to run
>     > with Swift.
>     >
>     > - On teraport, it seems difficult to get the right settings of .soft
>     > entries and setup.sh scripts to work correctly together
>     >
>     > - I still don't know if what worked for Zhao on tp-osg a month ago
>     > still works. It seems not to, and I can't tell if it's because of a
>     > change in .soft or env settings, or some other software issue
>     >
>     > - We would like to run from Teraport compute nodes with qsub -I, and
>     > hope that whatever we determine to be the right settings for login
>     > nodes works on interactive compute nodes as well.
>     >
>     > - It would be good *not* to run on tp-osg.
>     >
>     > Suchandra, Ti, or Greg, can you help us sort out how to set things
>     > correctly?
>     >
>     > Thanks,
>     >
>     > Mike
>     >
>     >
>     > -------- Original Message --------
>     > Subject: Re: [Swift-devel] Cant run condor-g on TeraPort
>     > Date: Thu, 18 Jun 2009 19:31:26 +0000 (GMT)
>     > From: Ben Clifford <benc at hawaga.org.uk>
>     > To: Michael Wilde <wilde at mcs.anl.gov>
>     > CC: swift-devel <swift-devel at ci.uchicago.edu>
>     > References: <4A3A93E2.2080805 at mcs.anl.gov>
>     >
>     >
>     > condor_q works for me on tp-osg if I source /opt/osg/setenv.sh rather
>     > than use softenv. It doesn't work for me if I use @osg in softenv; it
>     > fails with the error you report.
>     >
>     > On Thu, 18 Jun 2009, Michael Wilde wrote:
>     >
>     > > As far as I can tell, the condor client code is broken on TeraPort.
>     > >
>     > > I've tried this on tp-login and tp-osg; I am using +osg-client and
>     > > @osg in my .soft. I source $VDT_LOCATION/setup.sh
>     > >
>     > > Zhao, Glen, can you cross-check and see if you are now seeing the
>     > > same thing?
>     > >
>     > > My suspicion is that the condor client config broke in the last
>     > > month, through OSG changes, CI Support work, etc.
>     > >
>     > > - Mike
>     > >
>     > >
>     > > I get this from condor_q:
>     > >
>     > > tp$ condor_q
>     > > Error:
>     > >
>     > > Extra Info: You probably saw this error because the condor_schedd
>     > > is not running on the machine you are trying to query. If the
>     > > condor_schedd is not running, the Condor system will not be able to
>     > > find an address and port to connect to and satisfy this request.
>     > > Please make sure the Condor daemons are running and try again.
>     > >
>     > > Extra Info: If the condor_schedd is running on the machine you are
>     > > trying to query and you still see the error, the most likely cause
>     > > is that you have setup a personal Condor, you have not defined
>     > > SCHEDD_NAME in your condor_config file, and something is wrong with
>     > > your SCHEDD_ADDRESS_FILE setting. You must define either or both of
>     > > those settings in your config file, or you must use the -name
>     > > option to condor_q. Please see the Condor manual for details on
>     > > SCHEDD_NAME and SCHEDD_ADDRESS_FILE.
>     > > tp$
>     > >
>     > > and this from swift:
>     > >
>     > > tp-grid1$ swift -tc.file tc.data -sites.file sites.condorg.xml cat.swift
>     > > Swift svn swift-r2890 cog-r2392
>     > >
>     > > RunID: 20090618-1404-mo0thjj4
>     > > Progress:
>     > > Progress: Stage in:1
>     > > Progress: Submitted:1
>     > > Failed to transfer wrapper log from cat-20090618-1404-mo0thjj4/info/h on firefly
>     > > Progress: Failed:1
>     > > Execution failed:
>     > > Exception in cat:
>     > > Arguments: [data.txt]
>     > > Host: firefly
>     > > Directory: cat-20090618-1404-mo0thjj4/jobs/h/cat-hv5s3gcj
>     > > stderr.txt:
>     > >
>     > > stdout.txt:
>     > >
>     > > ----
>     > >
>     > > Caused by:
>     > > Cannot submit job: Could not submit job (condor_submit reported an
>     > > exit code of 1). no error output
>     > > tp-grid1$ ls
>     > >
>     > > --
>     > >
>     > > Using this sites file:
>     > >
>     > > <config>
>     > > <pool handle="firefly" >
>     > > <gridftp url="gsiftp://ff-grid.unl.edu" />
>     > > <execution provider="condor" />
>     > > <profile namespace="globus" key="jobType">grid</profile>
>     > > <profile namespace="globus" key="gridResource">gt2 ff-grid.unl.edu/jobmanager-pbs</profile>
>     > > <workdirectory>/panfs/panasas/CMS/data/oops/wilde/swiftwork</workdirectory>
>     > > </pool>
>     > > </config>
>     > > _______________________________________________
>     > > Swift-devel mailing list
>     > > Swift-devel at ci.uchicago.edu
>     > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>     > >
>     > >
>
>