[Swift-devel] Re: [Swift-user] using swift on a cluster
Michael Wilde
wilde at mcs.anl.gov
Wed Oct 21 15:12:00 CDT 2009
cool.
- Mike
On 10/21/09 2:54 PM, Hodgess, Erin wrote:
> Yay!
>
> That was it!
>
> Thanks,
> Erin
>
>
> -----Original Message-----
> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> Sent: Wednesday, October 21, 2009 12:37 PM
> To: Hodgess, Erin
> Cc: Swift User Discussion List
> Subject: Re: [Swift-user] using swift on a cluster
>
> Erin,
>
> The first line of your sites.xml file seems to be left there in error:
>
> > [hodgess at grid bin]$ cat sites.xml
> > <execution provider="condor" url="none" />
>
> Can you remove that and try again? I'm not sure how that got parsed.
>
> - Mike
>
> On 10/21/09 12:10 PM, Hodgess, Erin wrote:
>> Hi again!
>>
>> Here are the sites.xml and tc.data files.
>>
>> Thanks,
>> Erin
>>
>>
>> [hodgess at grid bin]$ cat sites.xml
>> <execution provider="condor" url="none" />
>>
>> <config>
>>
>> <pool handle="localhost">
>> <gridftp url="local://localhost" />
>> <execution provider="local" url="none" />
>> <workdirectory>/home/hodgess/swiftwork</workdirectory>
>> <profile namespace="karajan" key="jobThrottle">.03</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> </pool>
>>
>> <pool handle="condor">
>> <execution provider="condor" url="none"/>
>> <gridftp url="local://localhost"/>
>> <workdirectory>/home/hodgess/swiftwork</workdirectory>
>> <profile namespace="karajan" key="jobThrottle">.19</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>> </pool>
>>
>> </config>
>> [hodgess at grid bin]$ cat tc.data
>> localhost convert /usr/bin/convert INSTALLED INTEL32::LINUX null
>> localhost RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null
>> condor RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null
>> [hodgess at grid bin]$ cat firstR.R
>> cat: firstR.R: No such file or directory
>> [hodgess at grid bin]$ cat firstR.swift
>> type file{}
>> app (file output) firstone (file scriptFile) {
>> RInvoke @filename(scriptFile) @filename(output);
>> }
>>
>>
>> file scriptFile <"a1.in" >;
>> file output <"a1.out" >;
>> output=firstone(scriptFile);
>> [hodgess at grid bin]$
>>
>>
>> Erin M. Hodgess, PhD
>> Associate Professor
>> Department of Computer and Mathematical Sciences
>> University of Houston - Downtown
>> mailto: hodgesse at uhd.edu
>>
>>
>>
>> -----Original Message-----
>> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
>> Sent: Wed 10/21/2009 9:22 AM
>> To: Hodgess, Erin
>> Cc: swift-user at ci.uchicago.edu
>> Subject: Re: [Swift-user] using swift on a cluster
>>
>> Erin, we need to look into this further.
>>
>> Please make sure that you are running either Swift 0.9 or the latest
>> source from svn. And tell us what revision you are running.
>>
>> Also please post your tc.data and sites.xml (and the log file if it's
>> small enough); see if there are any messages in the .log file that would
>> clarify the error.
>>
>> Make sure that your app is cataloged in tc.data as being on pool
>> "condor". But I think if it were not, you'd see a different error.
>>
>> It almost looks to me like Swift is looking for the GRAM service contact
>> string, as if it thinks you are asking for Condor-G instead of local
>> Condor, e.g.:
>>
>> <profile namespace="globus" key="jobType">grid</profile>
>> <profile namespace="globus" key="gridResource">gt2 belhaven-1.renci.org/jobmanager-fork</profile>
>>
>> Just as a test, try changing provider="condor" to "pbs" in sites.xml. If
>> the error changes to something like "PBS not installed" or "qsub not
>> found" then I would suspect this is the case.
>>
>> It's possible you can add just the jobType element with the value set to
>> vanilla instead of grid, but I am purely *guessing*; we'll look deeper
>> as soon as you send the info above and we have time.
>>
>> - Mike
>>
>>
>> On 10/21/09 9:03 AM, Hodgess, Erin wrote:
>> > Here is the output:
>> >
>> >
>> > [hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml firstR.swift
>> > Swift 0.9 swift-r2860 cog-r2388
>> >
>> > RunID: 20091021-0901-aku7y862
>> > Progress:
>> > Execution failed:
>> > No service contacts available
>> > [hodgess at grid bin]$
>> >
>> >
>> >
>> > Erin M. Hodgess, PhD
>> > Associate Professor
>> > Department of Computer and Mathematical Sciences
>> > University of Houston - Downtown
>> > mailto: hodgesse at uhd.edu
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Michael Wilde [mailto:wilde at mcs.anl.gov]
>> > Sent: Wed 10/21/2009 7:02 AM
>> > To: Hodgess, Erin
>> > Cc: swift-user at ci.uchicago.edu
>> > Subject: Re: [Swift-user] using swift on a cluster
>> >
>> > For running Swift locally on a Condor cluster, use a sites.xml based on
>> > this example:
>> >
>> > <execution provider="condor" url="none" />
>> >
>> > <config>
>> >
>> > <pool handle="localhost">
>> > <gridftp url="local://localhost" />
>> > <execution provider="local" url="none" />
>> > <workdirectory>/home/erin/swiftwork</workdirectory>
>> > <profile namespace="karajan" key="jobThrottle">.03</profile>
>> > <profile namespace="karajan" key="initialScore">10000</profile>
>> > </pool>
>> >
>> > <pool handle="condor">
>> > <execution provider="condor" url="none"/>
>> > <gridftp url="local://localhost"/>
>> > <workdirectory>/home/erin/swiftwork</workdirectory>
>> > <profile namespace="karajan" key="jobThrottle">.19</profile>
>> > <profile namespace="karajan" key="initialScore">10000</profile>
>> > </pool>
>> >
>> > </config>
>> >
>> > The jobThrottle values above will enable Swift to run up to 4 jobs at a
>> > time on localhost and 20 jobs at a time on the Condor cluster.
>> >
>> > Use tc.data to catalog applications on one pool or the other.
>> >
>> > Set jobThrottle as desired to control execution parallelism.
>> >
>> > The number of jobs run in parallel is (jobThrottle * 100) + 1.
>> >
>> > initialScore=10000 overrides Swift's "start slow" approach to sensing
>> > the site's responsiveness.
>> >
>> > - Mike
>> >
>> > On 10/21/09 3:17 AM, Hodgess, Erin wrote:
>> > > Aha!
>> > >
>> > > I needed the universe=vanilla line.
>> > >
>> > >
>> > >
>> > > Erin M. Hodgess, PhD
>> > > Associate Professor
>> > > Department of Computer and Mathematical Sciences
>> > > University of Houston - Downtown
>> > > mailto: hodgesse at uhd.edu
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin
>> > > Sent: Wed 10/21/2009 3:07 AM
>> > > To: Michael Wilde
>> > > Cc: swift-user at ci.uchicago.edu
>> > > Subject: RE: [Swift-user] using swift on a cluster
>> > >
>> > > Hello!
>> > >
>> > > We are indeed using condor.
>> > >
>> > > I wanted to try a small test run, but am running into trouble:
>> > >
>> > > [hodgess at grid bin]$ cat myjob.submit
>> > > executable=/usr/bin/id
>> > > output=results.output
>> > > error=results.error
>> > > log=results.log
>> > > queue
>> > > [hodgess at grid bin]$ condor_submit myjob.submit
>> > > Submitting job(s).
>> > > Logging submit event(s).
>> > > 1 job(s) submitted to cluster 15.
>> > > [hodgess at grid bin]$ ls results*
>> > > results.error results.log results.output
>> > > You have new mail in /var/spool/mail/hodgess
>> > > [hodgess at grid bin]$ cat results.log
>> > > 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274>
>> > > ...
>> > > 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508>
>> > > ...
>> > > 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor.
>> > > ...
>> > > 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user.
>> > > ...
>> > > [hodgess at grid bin]$
>> > >
>> > > I'm not sure why the job is not linked.
>> > >
>> > > Any suggestions would be much appreciated.
>> > >
>> > > Thanks,
>> > > Erin
>> > >
>> > >
>> > > Erin M. Hodgess, PhD
>> > > Associate Professor
>> > > Department of Computer and Mathematical Sciences
>> > > University of Houston - Downtown
>> > > mailto: hodgesse at uhd.edu
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Michael Wilde [mailto:wilde at mcs.anl.gov]
>> > > Sent: Tue 10/20/2009 10:49 PM
>> > > To: Hodgess, Erin
>> > > Cc: swift-user at ci.uchicago.edu
>> > > Subject: Re: [Swift-user] using swift on a cluster
>> > >
>> > > Hi Erin,
>> > >
>> > > I'm assuming you meant "use Swift to run jobs on the compute nodes of
>> > > the cluster"?
>> > >
>> > > If so, you first need to find out what scheduler (also called "batch
>> > > system" or "local resource manager") the cluster is running.
>> > >
>> > > That's typically one of these: PBS, Condor, or SGE.
>> > >
>> > > Either ask your system administrator, or see if the "man" command or
>> > > similar probes give you a clue:
>> > >
>> > > Condor: condor_q -version
>> > >
>> > > condor_q -version
>> > > $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $
>> > > $CondorPlatform: I386-LINUX_RHEL5 $
>> > >
>> > > PBS: man qstat:
>> > >
>> > > qstat(1B) PBS
>> > >
>> > > SGE: man qstat:
>> > >
>> > > QSTAT(1) Sun Grid Engine User Commands
>> > >
>> > >
>> > > If it's PBS or Condor, then the Swift user guide gives the sites.xml
>> > > entries to use.
>> > >
>> > > Tell us what you find, then try following the instructions in the user
>> > > guide, and follow up with questions as needed.
>> > >
>> > > - Mike
>> > >
>> > >
>> > > On 10/20/09 9:41 PM, Hodgess, Erin wrote:
>> > > > Hi Swift Users:
>> > > >
>> > > > I'm on a cluster and would like to use swift on the different sites on
>> > > > the cluster.
>> > > >
>> > > > How would I do that, please?
>> > > >
>> > > > Thanks,
>> > > > Erin
>> > > >
>> > > >
>> > > > Erin M. Hodgess, PhD
>> > > > Associate Professor
>> > > > Department of Computer and Mathematical Sciences
>> > > > University of Houston - Downtown
>> > > > mailto: hodgesse at uhd.edu
>> > > >
>> > > >
>> > > >
>> > > > ------------------------------------------------------------------------
>> > > >
>> > > > _______________________________________________
>> > > > Swift-user mailing list
>> > > > Swift-user at ci.uchicago.edu
>> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>> > >
>> > >
>> >
>>
>
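
A small sketch of the jobThrottle rule quoted in the thread (jobs run in
parallel = jobThrottle * 100 + 1), applied to the two values from the
example sites.xml. The function name max_parallel_jobs is made up for
illustration and is not part of Swift; the rounding step is an added
assumption to avoid floating-point surprises.

```python
def max_parallel_jobs(job_throttle):
    """Apply the rule from the thread: (jobThrottle * 100) + 1.

    round() is an added assumption so that a product such as
    0.29 * 100 does not land just below the intended integer.
    """
    return int(round(float(job_throttle) * 100)) + 1


# Throttle values taken from the example sites.xml in the thread.
for pool, throttle in [("localhost", ".03"), ("condor", ".19")]:
    print(pool, max_parallel_jobs(throttle))
```

For .03 and .19 this gives 4 and 20, which agrees with the job counts
stated in the thread for localhost and the Condor pool.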