[Swift-user] using swift on a cluster

Hodgess, Erin HodgessE at uhd.edu
Wed Oct 21 09:03:24 CDT 2009


Here is the output:


[hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml firstR.swift
Swift 0.9 swift-r2860 cog-r2388

RunID: 20091021-0901-aku7y862
Progress:
Execution failed:
        No service contacts available
[hodgess at grid bin]$



Erin M. Hodgess, PhD
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: hodgesse at uhd.edu



-----Original Message-----
From: Michael Wilde [mailto:wilde at mcs.anl.gov]
Sent: Wed 10/21/2009 7:02 AM
To: Hodgess, Erin
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] using swift on a cluster
 
For running Swift locally on a Condor cluster, use a sites.xml based on 
this example:

<execution provider="condor" url="none" />

<config>

   <pool handle="localhost">
     <gridftp url="local://localhost" />
     <execution provider="local" url="none" />
     <workdirectory>/home/erin/swiftwork</workdirectory>
     <profile namespace="karajan" key="jobThrottle">.03</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>

   <pool handle="condor">
     <execution provider="condor" url="none"/>
     <gridftp url="local://localhost"/>
     <workdirectory>/home/erin/swiftwork</workdirectory>
     <profile namespace="karajan" key="jobThrottle">.19</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>

</config>

The jobThrottle values above will enable Swift to run up to 4 jobs at a 
time on localhost and 20 jobs at a time on the Condor cluster.

Use tc.data to catalog applications on one pool or the other.
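
As a rough sketch of what such a catalog might look like, the snippet below writes an example file with one entry per pool handle from the sites.xml above. The application name RInvoke, the path /usr/bin/Rscript, and the file name tc.data.example are hypothetical placeholders. The column layout (site, transformation, path, INSTALLED, platform, profiles) is assumed from the tc.data format of Swift releases of this era, so check it against the user guide for your version.

```shell
# Sketch only: writes an example catalog into the current directory.
# "RInvoke" and /usr/bin/Rscript are hypothetical; substitute your own app.
# Assumed columns: site  transformation  path  INSTALLED  platform  profiles
cat > tc.data.example <<'EOF'
localhost  RInvoke  /usr/bin/Rscript  INSTALLED  INTEL32::LINUX  null
condor     RInvoke  /usr/bin/Rscript  INSTALLED  INTEL32::LINUX  null
EOF

# List which pools the catalog mentions (first column, unique).
awk '{ print $1 }' tc.data.example | sort -u
```

The site names in the first column should match the pool handles in sites.xml; an application cataloged only under "condor" would then only be run on that pool.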

Set jobThrottle as desired to control execution parallelism.

The number of jobs run in parallel is (jobThrottle * 100) + 1.

initialScore=10000 overrides Swift's "start slow" approach to sensing 
the site's responsiveness.
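
To make the arithmetic concrete, here is a small sketch that applies the formula above to the two jobThrottle values from the example sites.xml. The helper name max_jobs is hypothetical and not part of Swift, and rounding to the nearest integer is an assumption made only to avoid floating-point surprises.

```shell
# Sketch: concurrent job limit for a given jobThrottle,
# computed as (jobThrottle * 100) + 1.
# max_jobs is a hypothetical helper name, not a Swift command.
max_jobs() {
    # Adding 0.5 before %d truncation rounds to the nearest integer,
    # guarding against results such as 19.999999.
    awk -v t="$1" 'BEGIN { printf "%d\n", t * 100 + 1 + 0.5 }'
}

max_jobs .03   # localhost pool: prints 4
max_jobs .19   # condor pool: prints 20
```

Working backwards, a target of N parallel jobs corresponds to a jobThrottle of (N - 1) / 100.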

- Mike

On 10/21/09 3:17 AM, Hodgess, Erin wrote:
> Aha!
> 
> I needed the universe=vanilla line.
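
For reference, a sketch of the test submit file quoted further down with that line added. The likely explanation, stated here as an assumption, is that without an explicit universe this Condor installation fell back to the standard universe, which expects executables relinked for Condor; that would account for the "Job not properly linked for Condor" event when running /usr/bin/id.

```shell
# Sketch: regenerate the test submit file with an explicit vanilla universe.
# The vanilla universe runs ordinary, unmodified executables.
cat > myjob.submit <<'EOF'
universe=vanilla
executable=/usr/bin/id
output=results.output
error=results.error
log=results.log
queue
EOF

# Submit only if Condor is actually installed on this host.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit myjob.submit
fi
```

After the job finishes, results.output should hold the output of id from the compute node and results.log should no longer contain the abort event.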
> 
> -----Original Message-----
> From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin
> Sent: Wed 10/21/2009 3:07 AM
> To: Michael Wilde
> Cc: swift-user at ci.uchicago.edu
> Subject: RE: [Swift-user] using swift on a cluster
> 
> Hello!
> 
> We are indeed using condor.
> 
> I wanted to try a small test run, but am running into trouble:
> 
> [hodgess at grid bin]$ cat myjob.submit
> executable=/usr/bin/id
> output=results.output
> error=results.error
> log=results.log
> queue
> [hodgess at grid bin]$ condor_submit myjob.submit
> Submitting job(s).
> Logging submit event(s).
> 1 job(s) submitted to cluster 15.
> [hodgess at grid bin]$ ls results*
> results.error  results.log  results.output
> You have new mail in /var/spool/mail/hodgess
> [hodgess at grid bin]$ cat results.log
> 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274>
> ...
> 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508>
> ...
> 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor.
> ...
> 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user.
> ...
> [hodgess at grid bin]$
> 
> I'm not sure why the job is not linked.
> 
> Any suggestions would be much appreciated.
> 
> Thanks,
> Erin
> 
> 
> -----Original Message-----
> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> Sent: Tue 10/20/2009 10:49 PM
> To: Hodgess, Erin
> Cc: swift-user at ci.uchicago.edu
> Subject: Re: [Swift-user] using swift on a cluster
> 
> Hi Erin,
> 
> I'm assuming you meant "use Swift to run jobs on the compute nodes of
> the cluster"?
> 
> If so, you first need to find out what scheduler (also called "batch
> system" or "local resource manager") the cluster is running.
> 
> That's typically one of these: PBS, Condor, or SGE.
> 
> Either ask your system administrator, or see if the "man" command or
> similar probes give you a clue:
> 
> Condor: condor_q -version
> 
> condor_q -version
> $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $
> $CondorPlatform: I386-LINUX_RHEL5 $
> 
> PBS: man qstat:
> 
>    qstat(1B)  PBS
> 
> SGE: man qstat:
> 
>    QSTAT(1)   Sun Grid Engine User Commands
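
The probes above can be rolled into a short script. This is only a heuristic sketch: the helper names classify_qstat and detect_scheduler are hypothetical, and the matching relies on the man page headers looking like the ones shown above, which may differ on other PBS or Grid Engine variants.

```shell
# Sketch: guess the cluster scheduler using the same probes as above.
# classify_qstat and detect_scheduler are hypothetical helper names.

# Classify the header text of the qstat man page.
classify_qstat() {
    case "$1" in
        *"Grid Engine"*) echo SGE ;;
        *PBS*)           echo PBS ;;
        *)               echo unknown ;;
    esac
}

detect_scheduler() {
    if command -v condor_q >/dev/null 2>&1; then
        echo Condor
    elif command -v qstat >/dev/null 2>&1; then
        # Only the first few lines of the man page are needed.
        classify_qstat "$(man qstat 2>/dev/null | head -n 5)"
    else
        echo unknown
    fi
}

detect_scheduler
```

A cluster can have more than one of these installed, so treat the result as a hint and confirm with the system administrator.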
> 
> 
> If it's PBS or Condor, then the Swift user guide gives the sites.xml
> entries to use.
> 
> Tell us what you find, then try following the instructions in the user
> guide, and follow up with questions as needed.
> 
> - Mike
> 
> 
> On 10/20/09 9:41 PM, Hodgess, Erin wrote:
>  > Hi Swift Users:
>  >
>  > I'm on a cluster and would like to use swift on the different sites on
>  > the cluster.
>  >
>  > How would I do that, please?
>  >
>  > Thanks,
>  > Erin
>  >
> 
> 
