[Swift-user] using swift on a cluster

Hodgess, Erin HodgessE at uhd.edu
Wed Oct 21 12:10:25 CDT 2009


Hi again!

Here are the sites.xml and tc.data files.

Thanks,
Erin


[hodgess at grid bin]$ cat sites.xml
<execution provider="condor" url="none" />

<config>

   <pool handle="localhost">
     <gridftp url="local://localhost" />
     <execution provider="local" url="none" />
     <workdirectory>/home/hodgess/swiftwork</workdirectory>
     <profile namespace="karajan" key="jobThrottle">.03</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>

   <pool handle="condor">
     <execution provider="condor" url="none"/>
     <gridftp url="local://localhost"/>
     <workdirectory>/home/hodgess/swiftwork</workdirectory>
     <profile namespace="karajan" key="jobThrottle">.19</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>

</config>
[hodgess at grid bin]$ cat tc.data
localhost       convert /usr/bin/convert        INSTALLED       INTEL32::LINUX null
localhost       RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh    INSTALLED      INTEL32::LINUX   null
condor  RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh    INSTALLED       INTEL32::LINUX  null
[hodgess at grid bin]$ cat firstR.R
cat: firstR.R: No such file or directory
[hodgess at grid bin]$ cat firstR.swift
type file{}
app (file output) firstone (file scriptFile) {
    RInvoke  @filename(scriptFile) @filename(output);
    }


        file scriptFile <"a1.in" >;
        file output <"a1.out" >;
            output=firstone(scriptFile);
[hodgess at grid bin]$
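
One thing worth checking in the sites.xml above: as posted, the file begins with an <execution> element outside <config>, so it has two top-level elements and is not well-formed XML. Whether Swift 0.9 tolerates this, or whether it is related to the error at all, is not established in this thread. The sketch below uses Python's standard xml.etree.ElementTree to show the check. SITES_AS_POSTED is an abbreviated copy of the file, and check_sites_xml is a hypothetical helper name, not part of Swift.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the posted sites.xml, including the stray first line.
SITES_AS_POSTED = """\
<execution provider="condor" url="none" />

<config>
   <pool handle="condor">
     <execution provider="condor" url="none"/>
     <gridftp url="local://localhost"/>
     <workdirectory>/home/hodgess/swiftwork</workdirectory>
   </pool>
</config>
"""


def check_sites_xml(text):
    """Return (ok, detail): whether text parses as a single-rooted XML document."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # A second top-level element is reported here as a parse error.
        return False, str(err)
    handles = [pool.get("handle") for pool in root.iter("pool")]
    return True, "root <%s>, pools: %s" % (root.tag, ", ".join(handles))


if __name__ == "__main__":
    print(check_sites_xml(SITES_AS_POSTED))
```

Dropping the first line, for example with SITES_AS_POSTED.split("\n", 1)[1], leaves <config> as the single root, and the same check then succeeds.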


Erin M. Hodgess, PhD
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: hodgesse at uhd.edu



-----Original Message-----
From: Michael Wilde [mailto:wilde at mcs.anl.gov]
Sent: Wed 10/21/2009 9:22 AM
To: Hodgess, Erin
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] using swift on a cluster
 
Erin, we need to look into this further.

Please make sure that you are running either Swift 0.9 or the latest 
source from svn. And tell us what revision you are running.

Also please post your tc.data and sites.xml (and the log file if it's small 
enough); see if there are any messages in the .log file that would 
clarify the error.

Make sure that your app is cataloged in tc.data as being on pool 
"condor". But I think if it were not, you'd see a different error.

It almost looks to me like Swift is looking for the GRAM service contact 
string, as if it thinks you are asking for Condor-G instead of local 
Condor, e.g.:

  <profile namespace="globus" key="jobType">grid</profile>
  <profile namespace="globus"
   key="gridResource">gt2 belhaven-1.renci.org/jobmanager-fork</profile>

Just as a test, try changing provider="condor" to "pbs" in sites.xml. If 
the error changes to something like "PBS not installed" or "qsub not 
found" then I would suspect this is the case.

It's possible you can add just the jobType element with the value set to 
vanilla instead of grid, but I am purely *guessing*; we'll look deeper 
as soon as you send the info above and we have time.

- Mike


On 10/21/09 9:03 AM, Hodgess, Erin wrote:
> Here is the output:
> 
> 
> [hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml firstR.swift
> Swift 0.9 swift-r2860 cog-r2388
> 
> RunID: 20091021-0901-aku7y862
> Progress:
> Execution failed:
>         No service contacts available
> [hodgess at grid bin]$
> 
> 
> 
> Erin M. Hodgess, PhD
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: hodgesse at uhd.edu
> 
> 
> 
> -----Original Message-----
> From: Michael Wilde [mailto:wilde at mcs.anl.gov]
> Sent: Wed 10/21/2009 7:02 AM
> To: Hodgess, Erin
> Cc: swift-user at ci.uchicago.edu
> Subject: Re: [Swift-user] using swift on a cluster
> 
> For running Swift locally on a Condor cluster, use a sites.xml based on
> this example:
> 
> <execution provider="condor" url="none" />
> 
> <config>
> 
>    <pool handle="localhost">
>      <gridftp url="local://localhost" />
>      <execution provider="local" url="none" />
>      <workdirectory>/home/erin/swiftwork</workdirectory>
>      <profile namespace="karajan" key="jobThrottle">.03</profile>
>      <profile namespace="karajan" key="initialScore">10000</profile>
>    </pool>
> 
>    <pool handle="condor">
>      <execution provider="condor" url="none"/>
>      <gridftp url="local://localhost"/>
>      <workdirectory>/home/erin/swiftwork</workdirectory>
>      <profile namespace="karajan" key="jobThrottle">.19</profile>
>      <profile namespace="karajan" key="initialScore">10000</profile>
>    </pool>
> 
> </config>
> 
> The jobThrottle values above will enable Swift to run up to 4 jobs at a
> time on localhost and 20 jobs at a time on the Condor cluster.
> 
> Use tc.data to catalog applications on one pool or the other.
> 
> Set jobThrottle as desired to control execution parallelism.
> 
> #jobs run in parallel is (jobThrottle * 100)+1
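
The rule just quoted, (jobThrottle * 100) + 1, can be checked against the two values in the example. This is a minimal sketch of the arithmetic only; the function names are hypothetical and nothing here calls Swift itself.

```python
def max_parallel_jobs(job_throttle):
    """Concurrent jobs allowed by the quoted rule: (jobThrottle * 100) + 1."""
    # round() guards against floating-point noise in job_throttle * 100.
    return round(job_throttle * 100) + 1


def throttle_for(jobs):
    """Inverse of the rule: the jobThrottle value that permits `jobs` jobs at once."""
    if jobs < 1:
        raise ValueError("need at least one job")
    return (jobs - 1) / 100


if __name__ == "__main__":
    # Values taken from the sites.xml example in this message.
    for handle, throttle in [("localhost", 0.03), ("condor", 0.19)]:
        print(handle, max_parallel_jobs(throttle))
```

With the values from the example, .03 gives 4 jobs on localhost and .19 gives 20 on the Condor pool, matching the figures stated above.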
> 
> initialScore=10000 overrides Swift's "start slow" approach to sensing
> the site's responsiveness.
> 
> - Mike
> 
> On 10/21/09 3:17 AM, Hodgess, Erin wrote:
>  > Aha!
>  >
>  > I needed the universe=vanilla line.
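
For reference, the fix amounts to one extra line in the submit description. The "Job not properly linked for Condor" abort in the quoted log is what one would expect if the job ran in Condor's standard universe, which requires executables relinked with condor_compile; that reading is an inference from the fix, not something the log states. The sketch below regenerates the myjob.submit from the quoted message with the universe line added. submit_description is a hypothetical helper name.

```python
def submit_description(executable, universe="vanilla", prefix="results"):
    """Build the text of a Condor submit description file."""
    lines = [
        "universe = %s" % universe,      # the line missing from the original file
        "executable = %s" % executable,
        "output = %s.output" % prefix,
        "error = %s.error" % prefix,
        "log = %s.log" % prefix,
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same executable as the test job in the quoted message.
    print(submit_description("/usr/bin/id"), end="")
```

Writing the returned text to myjob.submit and running condor_submit myjob.submit repeats the test from the quoted message with the corrected file.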
>  >
>  >
>  >
>  > Erin M. Hodgess, PhD
>  > Associate Professor
>  > Department of Computer and Mathematical Sciences
>  > University of Houston - Downtown
>  > mailto: hodgesse at uhd.edu
>  >
>  >
>  >
>  > -----Original Message-----
>  > From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin
>  > Sent: Wed 10/21/2009 3:07 AM
>  > To: Michael Wilde
>  > Cc: swift-user at ci.uchicago.edu
>  > Subject: RE: [Swift-user] using swift on a cluster
>  >
>  > Hello!
>  >
>  > We are indeed using condor.
>  >
>  > I wanted to try a small test run, but am running into trouble:
>  >
>  > [hodgess at grid bin]$ cat myjob.submit
>  > executable=/usr/bin/id
>  > output=results.output
>  > error=results.error
>  > log=results.log
>  > queue
>  > [hodgess at grid bin]$ condor_submit myjob.submit
>  > Submitting job(s).
>  > Logging submit event(s).
>  > 1 job(s) submitted to cluster 15.
>  > [hodgess at grid bin]$ ls results*
>  > results.error  results.log  results.output
>  > You have new mail in /var/spool/mail/hodgess
>  > [hodgess at grid bin]$ cat results.log
>  > 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274>
>  > ...
>  > 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508>
>  > ...
>  > 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor.
>  > ...
>  > 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user.
>  > ...
>  > [hodgess at grid bin]$
>  >
>  > I'm not sure why the job is not linked.
>  >
>  > Any suggestions would be much appreciated.
>  >
>  > Thanks,
>  > Erin
>  >
>  >
>  > Erin M. Hodgess, PhD
>  > Associate Professor
>  > Department of Computer and Mathematical Sciences
>  > University of Houston - Downtown
>  > mailto: hodgesse at uhd.edu
>  >
>  >
>  >
>  > -----Original Message-----
>  > From: Michael Wilde [mailto:wilde at mcs.anl.gov]
>  > Sent: Tue 10/20/2009 10:49 PM
>  > To: Hodgess, Erin
>  > Cc: swift-user at ci.uchicago.edu
>  > Subject: Re: [Swift-user] using swift on a cluster
>  >
>  > Hi Erin,
>  >
>  > I'm assuming you meant "use Swift to run jobs on the compute nodes of
>  > the cluster"?
>  >
>  > If so, you first need to find out what scheduler (also called "batch
>  > system" or "local resource manager") the cluster is running.
>  >
>  > That's typically one of these: PBS, Condor, or SGE.
>  >
>  > Either ask your system administrator, or see if the "man" command or
>  > similar probes give you a clue:
>  >
>  > Condor: condor_q -version
>  >
>  > condor_q -version
>  > $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $
>  > $CondorPlatform: I386-LINUX_RHEL5 $
>  >
>  > PBS: man qstat:
>  >
>  >    qstat(1B)  PBS
>  >
>  > SGE: man qstat:
>  >
>  >    QSTAT(1)   Sun Grid Engine User Commands
>  >
>  >
>  > If it's PBS or Condor, then the Swift user guide gives the sites.xml
>  > entries to use.
>  >
>  > Tell us what you find, then try following the instructions in the user
>  > guide, and follow up with questions as needed.
>  >
>  > - Mike
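
The probing suggested in this message can be partly automated by checking which of the named commands are on the PATH. This is a rough sketch, assuming only that condor_q indicates Condor and that qstat indicates PBS or SGE, as described above. It cannot tell PBS from SGE, since both ship a qstat; the man page check is still needed for that. guess_schedulers and PROBES are hypothetical names.

```python
import shutil

# Probe commands taken from the message above; qstat exists on both PBS and SGE.
PROBES = {
    "condor_q": ["Condor"],
    "qstat": ["PBS", "SGE"],
}


def guess_schedulers(which=shutil.which):
    """Return {scheduler_name: path_of_probe_command} for probes found on PATH."""
    found = {}
    for command, schedulers in PROBES.items():
        path = which(command)
        if path is not None:
            # Every scheduler that ships this command remains a candidate.
            for name in schedulers:
                found[name] = path
    return found


if __name__ == "__main__":
    print(guess_schedulers() or "no known scheduler commands on PATH")
```

The which parameter is there so the lookup can be replaced in a test; by default it is the standard library's shutil.which.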
>  >
>  >
>  > On 10/20/09 9:41 PM, Hodgess, Erin wrote:
>  >  > Hi Swift Users:
>  >  >
>  >  > I'm on a cluster and would like to use swift on the different sites on
>  >  > the cluster.
>  >  >
>  >  > How would I do that, please?
>  >  >
>  >  > Thanks,
>  >  > Erin
>  >  >
>  >  >
>  >  > Erin M. Hodgess, PhD
>  >  > Associate Professor
>  >  > Department of Computer and Mathematical Sciences
>  >  > University of Houston - Downtown
>  >  > mailto: hodgesse at uhd.edu
>  >  >
>  >  >
>  >  > ------------------------------------------------------------------------
>  >  >
>  >  > _______________________________________________
>  >  > Swift-user mailing list
>  >  > Swift-user at ci.uchicago.edu
>  >  > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>  >
>  >
> 
