[Swift-user] Pointer to Swift tutorials for computational science education and research

Andrew Stocker amstocker at dons.usfca.edu
Wed Sep 10 20:45:50 CDT 2014


Ketan,

I copied the catnap executable to the same directory on each of the
computers and now the swift script is working perfectly without error.
 Thanks for your help!  What are the next steps we can take to set up our
cluster to not require the script to be on all the computers?  Since we are
fairly new to parallel computing with a cluster, could you point us towards
any resources regarding the technical configuration for Swift?  I've looked
at the documentation for tc.data but I am still a bit confused by it.

Thanks,

Andrew

On Tue, Sep 9, 2014 at 7:49 AM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:

>
> On Mon, Sep 8, 2014 at 6:28 PM, Andrew Stocker <amstocker at dons.usfca.edu>
> wrote:
>
>>  Thanks for your response!
>>
>>  Since we're just in the stages of experimentation, our preliminary
>> cluster is just four iMacs connected to a switch.  I set up password-less
>> ssh communication between the four and I'm able to start the coaster
>> service (in the folder with coaster-service.conf) without any errors.  I am
>> running Swift from the computer which has catnap.sh installed at the
>> correct path, and I'm pretty sure it has the executable bit set ( #!/bin/sh
>> is the first line of the program).  None of the other three computers
>> have Swift installed, nor do they have catnap.sh at the location
>> specified in tc.data.  Is this a problem?
>>
>
> Yes, that seems to be the issue. The executable (catnap.sh in this case)
> must be available on all compute nodes in the location specified in the tc.
>
> An alternative in this case is to treat catnap.sh as data and move it along
> with the data to the target compute nodes. However, we can do that later.
> For now, could you put catnap.sh in the same location on each of the compute
> nodes and try again?
>
> No, Swift does not need to be installed on the compute nodes. It only needs
> to be on the submit node.
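
One way to follow the suggestion above is to copy catnap.sh to the same absolute path on every compute node over password-less ssh. The following is a minimal sketch in Python, assuming scp is available on the submit node. The host names imac2, imac3 and imac4 are hypothetical placeholders, and the path is the one from the tc.data line quoted in this thread. It only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess

# Path taken from the tc.data line quoted in this thread.
EXECUTABLE = "/usr/local/swift-0.94.1/oakland-demo/catnap.sh"

# Hypothetical host names; replace with the real compute nodes.
NODES = ["imac2", "imac3", "imac4"]


def build_copy_commands(executable, nodes):
    """Return one scp command (as an argument list) per node.

    The file is copied to the same absolute path on every node, so a
    single tc.data entry is valid everywhere. The target directory must
    already exist on each node and be writable by the ssh user.
    scp -p preserves the file mode, including the executable bit.
    """
    return [["scp", "-p", executable, f"{node}:{executable}"] for node in nodes]


def distribute(executable, nodes, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in build_copy_commands(executable, nodes):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    distribute(EXECUTABLE, NODES, dry_run=True)
```

Running the script as is lists the three scp commands without touching any machine, which makes it easy to check the paths before copying anything.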
>
>
>>
>>  Attached is the log file from the run when I got the error I
>> copy+pasted above.  Interestingly, when I run the catnap swift script with
>> only 3 concurrent instances, it seems to run fine since we allow 3 jobs per
>> node and so it is probably only running locally.
>>
>>  Regards,
>> Andrew
>>
>> On Fri, Sep 5, 2014 at 6:03 PM, Ketan Maheshwari <ketan at mcs.anl.gov>
>> wrote:
>>
>>> Hi Andrew,
>>>
>>>  Yes, I remember: thanks for getting back on this.
>>>
>>>  From the error message and tc.data, it does look like the
>>> executable is given as an absolute path, yet Swift is searching the
>>> system path and not finding it. One possibility is that the node on which
>>> catnap.sh is running does not have it installed at the path specified in
>>> tc.data. Can you also check whether catnap.sh has the executable bit set?
>>> That is less likely to be the cause, though.
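
The executable-bit check asked for above can be sketched in Python as follows. Note that the executable bit is a file permission and is separate from the #!/bin/sh line at the top of a script, so the sketch reports both. The function name describe_script is made up for this illustration.

```python
import os
import stat


def describe_script(path):
    """Report permission and shebang facts about the file at `path`."""
    mode = os.stat(path).st_mode
    with open(path, "rb") as f:
        has_shebang = f.read(2) == b"#!"
    return {
        # True if the owner's execute permission bit is set.
        "owner_executable": bool(mode & stat.S_IXUSR),
        # True if the current user is allowed to execute the file.
        "executable_by_me": os.access(path, os.X_OK),
        # True if the file starts with '#!'.
        "has_shebang": has_shebang,
    }
```

For example, describe_script("/usr/local/swift-0.94.1/oakland-demo/catnap.sh") could be run on each node. A script with a shebang line but without the execute permission still cannot be launched directly.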
>>>
>>>  Also, from the tc.data line it looks like you are using persistent
>>> coasters. Did you start the coaster service beforehand and make sure it
>>> started correctly, without any error messages? Could you tell us
>>> more about your cluster? Depending on the type of cluster, we may be
>>> able to run Swift in a non-persistent, implicit coasters mode.
>>>
>>>  Can you also send the Swift-generated log for this run?
>>>
>>>  Thanks,
>>> Ketan
>>>
>>>
>>> On Fri, Sep 5, 2014 at 7:16 PM, Andrew Stocker <amstocker at dons.usfca.edu>
>>> wrote:
>>>
>>>>  Hi Ketan,
>>>>
>>>>  I'm not sure if you remember, but my research advisor Xiaosheng and I
>>>> spoke to you at LBL in Oakland at the beginning of the summer
>>>> about starting to use Swift at our school.  We have been working hard on
>>>> setting it up, and I am trying to get your demo to run, but I'm having a
>>>> problem.  For some reason I keep getting the following error when I try to
>>>> run your catsnsleep demo:
>>>>
>>>>  Execution failed:
>>>> Exception in catnap:
>>>>     Arguments: [5, data.txt]
>>>>     Host: persistent-coasters
>>>>     Directory: catsnsleep-20140905-1702-mihcat06/jobs/u/catnap-utx080xl
>>>>
>>>>  Caused by:
>>>> Cannot find executable catnap.sh on site system path
>>>> catnap, catsnsleep.swift, line 13
>>>>
>>>>  However, I'm not sure why.  In our tc.data file we have the line:
>>>>
>>>>  persistent-coasters catnap /usr/local/swift-0.94.1/oakland-demo/catnap.sh
>>>>
>>>>  which I think should work, but obviously something is going wrong.  I
>>>> have been browsing the documentation articles but I can't find anything
>>>> about why this might be happening.  We would greatly appreciate your advice!
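
Since the error depends on whether the path in tc.data exists on the machine that runs the job, a small check run on each node can narrow things down. This is a rough sketch in Python, not part of Swift. It assumes each tc.data entry is a single line of whitespace-separated fields in the order site, application name, path, as in the line quoted above, and that lines starting with '#' are comments. Both function names are made up for this illustration.

```python
import os


def parse_tc_entries(text):
    """Yield (site, app, path) tuples from tc.data-style text.

    Assumption: fields are whitespace-separated in the order
    site, app, path. Lines with fewer than three fields and lines
    starting with '#' are skipped; extra trailing fields are ignored.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        yield tuple(fields[:3])


def missing_executables(text):
    """Return the entries whose path is not a regular file on this machine."""
    return [e for e in parse_tc_entries(text) if not os.path.isfile(e[2])]
```

Calling missing_executables(open("tc.data").read()) on a compute node returns an empty list when every listed path is present there, and otherwise returns the entries that need to be installed.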
>>>>
>>>>  Regards,
>>>>
>>>>  Andrew Stocker
>>>>
>>>>
>>>> On Fri, Jun 20, 2014 at 12:00 PM, Xiaosheng Huang <xhuang22 at usfca.edu>
>>>> wrote:
>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Ketan Maheshwari <ketan at mcs.anl.gov>
>>>>> Date: Fri, Jun 20, 2014 at 11:45 AM
>>>>> Subject: Re: Pointer to Swift tutorials for computational science
>>>>> education and research
>>>>> To: Xiaosheng Huang <xhuang22 at usfca.edu>
>>>>> Cc: Wilde <wilde at mcs.anl.gov>
>>>>>
>>>>>
>>>>> Hi Xiaosheng,
>>>>>
>>>>>  The tarball is: http://www.mcs.anl.gov/~ketan/oakland-demo.tgz
>>>>>
>>>>>  There is a small README in there which outlines the steps.
>>>>>
>>>>>  Best,
>>>>> Ketan
>>>>>
>>>>>   ************************************************************
>>>>> Xiaosheng Huang, Assistant Professor
>>>>> Department of Physics and Astronomy
>>>>> University of San Francisco
>>>>> 2130 Fulton Street, San Francisco, CA 94117-1080
>>>>>
>>>>> Phone: (415) 422-6281
>>>>> E-mail: xhuang22 at usfca.edu
>>>>> ************************************************************
>>>>>
>>>>
>>>>
>>>
>>
>