[Swift-user] Swift on local resources

Allan Espinosa aespinosa at cs.uchicago.edu
Fri Jun 12 15:41:41 CDT 2009


Can you please post your Swift logs? When I remove my passphrase
entry, I get the following error:

Swift svn swift-r2949 cog-r2406

RunID: remoterun
Progress:
Progress:  Initializing site shared directory:1
Progress:  Initializing site shared directory:1
Execution failed:
        Could not initialize shared directory on TERAPORT
Caused by:
        org.globus.cog.abstraction.impl.file.FileResourceException: Error while communicating with the SSH server on tp-login1.ci.uchicago.edu:22
Caused by:
        java.lang.NullPointerException
        at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.loadDefaultCredentials(SSHChannelManager.java:160)
        at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getDefaultCredentials(SSHChannelManager.java:120)
        at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getCredentials(SSHChannelManager.java:79)
        at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getChannel(SSHChannelManager.java:62)
        at org.globus.cog.abstraction.impl.ssh.file.FileResourceImpl.start(FileResourceImpl.java:81)
        at org.globus.cog.abstraction.impl.file.FileResourceCache.getResource(FileResourceCache.java:98)
        at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.getResource(CachingDelegatedFileOperationHandler.java:75)
        at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:40)
        at org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28)
        at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:86)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
        at java.lang.Thread.run(Thread.java:595)
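
Since the trace above fails inside loadDefaultCredentials, it may help to check which entries are actually present for the host before running Swift. Below is a minimal sketch in Python, not part of Swift or CoG. The list of required properties is an assumption taken from the key-based auth.defaults example quoted further down in this thread, and the function names are hypothetical. It also assumes lines starting with "#" are comments.

```python
# Sketch: report which auth.defaults entries are missing for a host.
# REQUIRED_KEYS is an assumption based on the key-based example in this
# thread; it is not taken from the CoG source code.
import os

REQUIRED_KEYS = ("type", "username", "key", "passphrase")


def parse_auth_defaults(text):
    """Parse 'host.property=value' lines into {host: {property: value}}."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, assumed '#' comments, and lines without a value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # The property is the part after the last dot; the rest is the host.
        host, _, prop = name.strip().rpartition(".")
        if not host:
            continue
        entries.setdefault(host, {})[prop] = value.strip()
    return entries


def missing_keys(entries, host, required=REQUIRED_KEYS):
    """Return the required properties that have no entry for this host."""
    present = entries.get(host, {})
    return [key for key in required if key not in present]


if __name__ == "__main__":
    path = os.path.expanduser("~/.ssh/auth.defaults")
    host = "tp-login1.ci.uchicago.edu"
    if os.path.exists(path):
        with open(path) as handle:
            entries = parse_auth_defaults(handle.read())
        print(host, "missing:", missing_keys(entries, host))
    else:
        print("no such file:", path)
```

The script only prints the names of missing properties, never their values, so the output should be safe to paste into a reply on the list.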


2009/6/12 Andriy Fedorov <fedorov at cs.wm.edu>:
> Allan,
>
> Thank you for the example.
>
> I have exactly the same setup, but it doesn't work for me. I suspect the
> reason is that I am unable to set up my environment to work with a
> passphrase. I can only log in with a password. I wonder if there is any
> workaround...
>
> AF
>
>
>
> On Fri, Jun 12, 2009 at 4:04 PM, Allan
> Espinosa<aespinosa at cs.uchicago.edu> wrote:
>> Hi Andriy and Mike,
>>
>> here is my example ~/.ssh/auth.defaults for executing jobs on
>> tp-login1.ci.uchicago.edu:
>>
>> [aespinosa at tp-login2 ~]$ cat .ssh/auth.defaults
>> tp-login1.ci.uchicago.edu.type=key
>> tp-login1.ci.uchicago.edu.username=aespinosa
>> tp-login1.ci.uchicago.edu.key=/home/aespinosa/.ssh/id_dsa
>> tp-login1.ci.uchicago.edu.passphrase=XXXXXXXX
>>
>> We have used Falkon and coasters before for multi-core configurations,
>> but for a single multi-core machine, I believe you can get away with
>> having multiple entries for the same host in the sites.xml file using
>> the ssh provider.
>>
>> i.e.:
>>
>> <config>
>> <pool handle="CORE0">
>>   <execution provider="ssh"... />
>>   ...
>> </pool>
>> <pool handle="CORE1">
>>   ...
>> </pool>
>> </config>
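
If the number of cores is large, writing these repeated entries by hand gets tedious. The sketch below generates them in Python. It uses only the element and attribute names visible in the example above (config, pool, handle, execution, provider). Whatever is elided there with "..." is deliberately left out and has to be filled in from the Swift documentation. The function name and its parameters are hypothetical.

```python
# Sketch: emit one <pool> entry per core for a single multi-core host.
# Only names shown in the quoted example are used; the attributes and
# child elements elided there ("...") are NOT generated here.
import xml.etree.ElementTree as ET


def build_sites(cores, provider="ssh", handle_prefix="CORE"):
    """Return a <config> element containing `cores` pool entries."""
    config = ET.Element("config")
    for index in range(cores):
        # Handles follow the CORE0, CORE1, ... pattern from the example.
        pool = ET.SubElement(config, "pool", handle=f"{handle_prefix}{index}")
        ET.SubElement(pool, "execution", provider=provider)
    return config


if __name__ == "__main__":
    # Print a two-core skeleton to stdout as a single line of XML.
    print(ET.tostring(build_sites(2), encoding="unicode"))
```

The result is a skeleton only. Whether Swift then schedules one job per entry on the same host is the untested assumption behind this approach.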
>>
>> 2009/6/12 Michael Wilde <wilde at mcs.anl.gov>:
>>> Ah, very cool. I'm eager to get more user experience feedback on multicore
>>> use.
>>>
>>> So I will try to hunt down my examples of .ssh configs.
>>>
>>> Also, Allan Espinosa used this recently. Allan, can you post details and
>>> examples?
>>>
>>> Thanks!
>>>
>>> Mike
>>>
>>> On 6/12/09 2:06 PM, Andriy Fedorov wrote:
>>>>
>>>> Michael,
>>>>
>>>> Thank you for the advice, I will look into this. This is very helpful.
>>>> I had the impression that Lava is not included in the list of schedulers
>>>> supported out of the box, but wanted to check.
>>>>
>>>> Just a clarification -- I need to access two different types of local
>>>> resources. Cluster (via Lava or Condor) is one, but for the multicore
>>>> nodes we have on the network, which are not part of the cluster, the only
>>>> option is to use ssh.
>>>>
>>>> AF
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 2:52 PM, Michael Wilde<wilde at mcs.anl.gov> wrote:
>>>>>
>>>>> Andriy,
>>>>>
>>>>> Ben or Mihael may have better ideas, but I offer my thoughts below.
>>>>>
>>>>> On 6/12/09 1:18 PM, Andriy Fedorov wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to set up Swift with the local cluster and non-cluster
>>>>>> resources in our lab. Here are some configuration details.
>>>>>>
>>>>>> Due to technical problems, passphrase login is not possible for the
>>>>>> nodes on the local network, and I need to enter a password each time.
>>>>>>
>>>>>> For the cluster, I was able to set up passphrase login for the head
>>>>>> node. The cluster is running Lava and Condor schedulers at the same
>>>>>> time, but Lava should be used if possible.
>>>>>>
>>>>>> Two questions:
>>>>>>
>>>>>> (1) Is it possible to configure Swift to talk to the Lava scheduler?
>>>>>
>>>>> Making Swift talk to a new scheduler means writing a new CoG provider (in
>>>>> Java). You can likely use an existing "data" provider like "local"; you
>>>>> could model the "execution" provider after the "PBS" provider. How hard
>>>>> this is depends on how close Lava is to PBS in nature (I don't know it).
>>>>> And the provider interface you need to code to is not well documented,
>>>>> afaik.
>>>>>
>>>>> I would try the Condor provider. While that provider is less mature and
>>>>> less tested than others, it should work, and if it doesn't, we should try
>>>>> to fix it.
>>>>>
>>>>> If possible, make sure a simple condor_submit hello-world works for you
>>>>> first.
>>>>>
>>>>> Run Swift on the head/login node; use the "local" data provider.
>>>>>
>>>>> Another route is to use Falkon, but that will be harder and it's less
>>>>> supported, so I advise against this until easier routes are exhausted.
>>>>>
>>>>> I don't think that ssh will get you far, as to leverage the cluster I
>>>>> think you'd need to describe each worker node with a separate sites.xml
>>>>> entry. That's fine in principle, but a bit awkward, and may have
>>>>> scheduling issues (i.e., if ssh hangs or dies when you don't own the node).
>>>>>
>>>>> Save ssh as another last resort; I suggest trying Condor first.
>>>>>
>>>>> If needed, people who used ssh recently can send you the info below.
>>>>>
>>>>> - Mike
>>>>>
>>>>>> (2) I am following the instructions on setting up the ssh site provider to
>>>>>> use nodes on the local network.
>>>>>>  (2.1) Do I need to set up auth.defaults even if I have ssh-agent
>>>>>> running, and can ssh to the remote node without being asked for a
>>>>>> password?
>>>>>>  (2.2) Can anybody give me more detailed instructions on how to set
>>>>>> up auth.defaults? I cannot make it work.
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Andriy Fedorov


