[Swift-user] using swift on a cluster
Michael Wilde
wilde at mcs.anl.gov
Wed Oct 21 12:36:43 CDT 2009
Erin,
The first line of your sites.xml file seems to be left there in error:
> [hodgess at grid bin]$ cat sites.xml
> <execution provider="condor" url="none" />
Can you remove that and try again? I'm not sure how that got parsed.
- Mike
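
A quick way to catch a stray line like this before running Swift is to check that the file is well-formed XML with a single <config> root. What follows is only a minimal sketch in Python, not part of Swift: check_sites_xml is a hypothetical helper, the sample pool is trimmed from the file quoted below, and it assumes the file uses no XML namespace. It does not validate anything beyond the root element and the pool handles.

```python
import xml.etree.ElementTree as ET


def check_sites_xml(text):
    """Hypothetical helper: report whether text is well-formed XML rooted at <config>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return "not well-formed XML: %s" % err
    if root.tag != "config":
        return "unexpected root element <%s>" % root.tag
    # Collect the handle attribute of every <pool> directly under <config>.
    handles = [pool.get("handle", "?") for pool in root.findall("pool")]
    return "ok, pools: %s" % ", ".join(handles)


# Trimmed copy of the condor pool from the posted sites.xml.
GOOD = """<config>
  <pool handle="condor">
    <execution provider="condor" url="none"/>
    <gridftp url="local://localhost"/>
    <workdirectory>/home/hodgess/swiftwork</workdirectory>
  </pool>
</config>"""

# The same file with the stray line left in front of <config>.
BAD = '<execution provider="condor" url="none" />\n' + GOOD

print(check_sites_xml(GOOD))
print(check_sites_xml(BAD))
```

A strict XML parser rejects the second input because it has two top-level elements. How Swift itself handled the file is a separate question.
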
On 10/21/09 12:10 PM, Hodgess, Erin wrote:
> Hi again!
>
> Here are the sites.xml and tc.data files.
>
> Thanks,
> Erin
>
>
> [hodgess at grid bin]$ cat sites.xml
> <execution provider="condor" url="none" />
>
> <config>
>
> <pool handle="localhost">
> <gridftp url="local://localhost" />
> <execution provider="local" url="none" />
> <workdirectory>/home/hodgess/swiftwork</workdirectory>
> <profile namespace="karajan" key="jobThrottle">.03</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> </pool>
>
> <pool handle="condor">
> <execution provider="condor" url="none"/>
> <gridftp url="local://localhost"/>
> <workdirectory>/home/hodgess/swiftwork</workdirectory>
> <profile namespace="karajan" key="jobThrottle">.19</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> </pool>
>
> </config>
> [hodgess at grid bin]$ cat tc.data
> localhost convert /usr/bin/convert INSTALLED INTEL32::LINUX null
> localhost RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null
> condor RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null
> [hodgess at grid bin]$ cat firstR.R
> cat: firstR.R: No such file or directory
> [hodgess at grid bin]$ cat firstR.swift
> type file{}
> app (file output) firstone (file scriptFile) {
> RInvoke @filename(scriptFile) @filename(output);
> }
>
>
> file scriptFile <"a1.in" >;
> file output <"a1.out" >;
> output=firstone(scriptFile);
> [hodgess at grid bin]$
>
>
> Erin M. Hodgess, PhD
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: hodgesse at uhd.edu
>
>
>
> -----Original Message-----
> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> Sent: Wed 10/21/2009 9:22 AM
> To: Hodgess, Erin
> Cc: swift-user at ci.uchicago.edu
> Subject: Re: [Swift-user] using swift on a cluster
>
> Erin, we need to look into this further.
>
> Please make sure that you are running either Swift 0.9 or the latest
> source from svn. And tell us what revision you are running.
>
> Also please post your tc.data and sites.xml (and the log file if it's small
> enough); see if there are any messages in the .log file that would
> clarify the error.
>
> Make sure that your app is cataloged in tc.data as being on pool
> "condor". But I think if it were not, you'd see a different error.
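
The tc.data check can be done by eye, but a rough sketch of it in Python may help. It assumes that each tc.data entry is one whitespace-separated line whose first two fields are the pool name and the application name. The helper sites_for_app and the myapp entries are hypothetical and are not taken from any real catalog.

```python
def sites_for_app(tc_text, app):
    """Hypothetical helper: list the pools on which app is cataloged."""
    sites = []
    for line in tc_text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        # Field 0 is the pool, field 1 the application name, field 2 its path.
        if len(fields) >= 3 and fields[1] == app:
            sites.append(fields[0])
    return sites


# Made-up catalog entries, for illustration only.
TC_DATA = """\
# pool      app    path                  status    platform        profile
localhost   myapp  /usr/local/bin/myapp  INSTALLED INTEL32::LINUX  null
condor      myapp  /usr/local/bin/myapp  INSTALLED INTEL32::LINUX  null
"""

print(sites_for_app(TC_DATA, "myapp"))
print("condor" in sites_for_app(TC_DATA, "myapp"))
```
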
>
> It almost looks to me like Swift is looking for the GRAM service contact
> string, as if it thinks you are asking for Condor-G instead of local
> Condor, e.g.:
>
> <profile namespace="globus" key="jobType">grid</profile>
> <profile namespace="globus"
> key="gridResource">gt2 belhaven-1.renci.org/jobmanager-fork</profile>
>
> Just as a test, try changing provider="condor" to "pbs" in sites.xml. If
> the error changes to something like "PBS not installed" or "qsub not
> found" then I would suspect this is the case.
>
> It's possible you can add just the jobType element with the value set to
> vanilla instead of grid, but I am purely *guessing*; we'll look deeper
> as soon as you send the info above and we have time.
>
> - Mike
>
>
> On 10/21/09 9:03 AM, Hodgess, Erin wrote:
> > Here is the output:
> >
> >
> > [hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml firstR.swift
> > Swift 0.9 swift-r2860 cog-r2388
> >
> > RunID: 20091021-0901-aku7y862
> > Progress:
> > Execution failed:
> > No service contacts available
> > [hodgess at grid bin]$
> >
> >
> >
> > Erin M. Hodgess, PhD
> > Associate Professor
> > Department of Computer and Mathematical Sciences
> > University of Houston - Downtown
> > mailto: hodgesse at uhd.edu
> >
> >
> >
> > -----Original Message-----
> > From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> > Sent: Wed 10/21/2009 7:02 AM
> > To: Hodgess, Erin
> > Cc: swift-user at ci.uchicago.edu
> > Subject: Re: [Swift-user] using swift on a cluster
> >
> > For running Swift locally on a Condor cluster, use a sites.xml based on
> > this example:
> >
> > <execution provider="condor" url="none" />
> >
> > <config>
> >
> > <pool handle="localhost">
> > <gridftp url="local://localhost" />
> > <execution provider="local" url="none" />
> > <workdirectory>/home/erin/swiftwork</workdirectory>
> > <profile namespace="karajan" key="jobThrottle">.03</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > </pool>
> >
> > <pool handle="condor">
> > <execution provider="condor" url="none"/>
> > <gridftp url="local://localhost"/>
> > <workdirectory>/home/erin/swiftwork</workdirectory>
> > <profile namespace="karajan" key="jobThrottle">.19</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > </pool>
> >
> > </config>
> >
> > The jobThrottle values above will enable Swift to run up to 4 jobs at a
> > time on localhost and 20 jobs at a time on the Condor cluster.
> >
> > Use tc.data to catalog applications on one pool or the other.
> >
> > Set jobThrottle as desired to control execution parallelism.
> >
> > The number of jobs run in parallel is (jobThrottle * 100) + 1.
> >
> > initialScore=10000 overrides Swift's "start slow" approach to sensing
> > the site's responsiveness.
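
The jobThrottle arithmetic quoted above can be worked through in a few lines of Python. This is only a sketch of the formula as stated in the message, not Swift's own code. The function names max_parallel_jobs and throttle_for are hypothetical.

```python
def max_parallel_jobs(job_throttle):
    """Jobs run in parallel for a jobThrottle value: (jobThrottle * 100) + 1."""
    # round() absorbs floating-point error, e.g. 0.07 * 100 is not exactly 7.
    return int(round(job_throttle * 100)) + 1


def throttle_for(jobs):
    """Inverse of the formula: the jobThrottle that yields the given job count."""
    return (jobs - 1) / 100.0


print(max_parallel_jobs(0.03))  # localhost pool in the example: 4
print(max_parallel_jobs(0.19))  # condor pool in the example: 20
print(throttle_for(50))         # throttle for 50 parallel jobs: 0.49
```
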
> >
> > - Mike
> >
> > On 10/21/09 3:17 AM, Hodgess, Erin wrote:
> > > Aha!
> > >
> > > I needed the universe=vanilla line.
> > >
> > >
> > >
> > > Erin M. Hodgess, PhD
> > > Associate Professor
> > > Department of Computer and Mathematical Sciences
> > > University of Houston - Downtown
> > > mailto: hodgesse at uhd.edu
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin
> > > Sent: Wed 10/21/2009 3:07 AM
> > > To: Michael Wilde
> > > Cc: swift-user at ci.uchicago.edu
> > > Subject: RE: [Swift-user] using swift on a cluster
> > >
> > > Hello!
> > >
> > > We are indeed using condor.
> > >
> > > I wanted to try a small test run, but am running into trouble:
> > >
> > > [hodgess at grid bin]$ cat myjob.submit
> > > executable=/usr/bin/id
> > > output=results.output
> > > error=results.error
> > > log=results.log
> > > queue
> > > [hodgess at grid bin]$ condor_submit myjob.submit
> > > Submitting job(s).
> > > Logging submit event(s).
> > > 1 job(s) submitted to cluster 15.
> > > [hodgess at grid bin]$ ls results*
> > > results.error results.log results.output
> > > You have new mail in /var/spool/mail/hodgess
> > > [hodgess at grid bin]$ cat results.log
> > > 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274>
> > > ...
> > > 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508>
> > > ...
> > > 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor.
> > > ...
> > > 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user.
> > > ...
> > > [hodgess at grid bin]$
> > >
> > > I'm not sure why the job is not linked.
> > >
> > > Any suggestions would be much appreciated.
> > >
> > > Thanks,
> > > Erin
> > >
> > >
> > > Erin M. Hodgess, PhD
> > > Associate Professor
> > > Department of Computer and Mathematical Sciences
> > > University of Houston - Downtown
> > > mailto: hodgesse at uhd.edu
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> > > Sent: Tue 10/20/2009 10:49 PM
> > > To: Hodgess, Erin
> > > Cc: swift-user at ci.uchicago.edu
> > > Subject: Re: [Swift-user] using swift on a cluster
> > >
> > > Hi Erin,
> > >
> > > I'm assuming you meant "use Swift to run jobs on the compute nodes of
> > > the cluster"?
> > >
> > > If so, you first need to find out what scheduler (also called "batch
> > > system" or "local resource manager") the cluster is running.
> > >
> > > That's typically one of these: PBS, Condor, or SGE.
> > >
> > > Either ask your system administrator, or see if the "man" command or
> > > similar probes give you a clue:
> > >
> > > Condor: condor_q -version
> > >
> > > condor_q -version
> > > $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $
> > > $CondorPlatform: I386-LINUX_RHEL5 $
> > >
> > > PBS: man qstat:
> > >
> > > qstat(1B) PBS
> > >
> > > SGE: man qstat:
> > >
> > > QSTAT(1) Sun Grid Engine User Commands
> > >
> > >
> > > If it's PBS or Condor, then the Swift user guide gives the sites.xml
> > > entries to use.
> > >
> > > Tell us what you find, then try following the instructions in the user
> > > guide, and follow up with questions as needed.
> > >
> > > - Mike
> > >
> > >
> > > On 10/20/09 9:41 PM, Hodgess, Erin wrote:
> > > > Hi Swift Users:
> > > >
> > > > I'm on a cluster and would like to use swift on the different sites on
> > > > the cluster.
> > > >
> > > > How would I do that, please?
> > > >
> > > > Thanks,
> > > > Erin
> > > >
> > > >
> > > > Erin M. Hodgess, PhD
> > > > Associate Professor
> > > > Department of Computer and Mathematical Sciences
> > > > University of Houston - Downtown
> > > > mailto: hodgesse at uhd.edu
> > > >
> > > >
> > > >
> > ------------------------------------------------------------------------
> > > >
> > > > _______________________________________________
> > > > Swift-user mailing list
> > > > Swift-user at ci.uchicago.edu
> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> > >
> > >
> >
>