[Swift-user] Bag of Workstations connection timeout
Schenker, Thomas
thomas.schenker at student.kit.edu
Fri Nov 2 10:18:20 CDT 2012
Hi,
I'm trying to run Swift on some workstations, following these instructions: http://www.ci.uchicago.edu/swift/guides/release-0.93/siteguide/siteguide.html#_bag_of_workstations .
My coaster-service.conf looks like this:
# Location of SWIFT. If empty, PATH is searched
export SWIFT=/home/tschenker/swift-0.93/bin/swift
# Where to copy worker.pl on the remote machine for sites.xml
#export WORKER_LOCATION=$HOME
export WORKER_LOCATION=/home/ubuntu
# How to launch workers: local, ssh, cobalt, or futuregrid
export WORKER_MODE=ssh
# SSH hosts to start workers on (ssh mode only)
export WORKER_HOSTS="10.1.218.228 10.1.218.229"
# Do all the worker nodes you're using have a shared filesystem? (yes/no)
export SHARED_FILESYSTEM=no
# Username to use on worker nodes
export WORKER_USERNAME=ubuntu
# Enable SSH tunneling? (yes/no)
export SSH_TUNNELING=no
# Directory to keep log files, relative to working directory when launching start-coaster-service
export LOG_DIR=logs
# Manually define ports. If not specified, an available port will be used
export LOCAL_PORT=
export SERVICE_PORT=
# This is the IP address to which the workers will connect
# If not given, start-coaster-service tries to automatically detect
# the IP address of this system via ifconfig
# Specify this if you have multiple network interfaces
export IPADDR=
# Location of the swift-vm-boot scripts
export SWIFTVMBOOT_DIR=$HOME/swift-vm-boot
# Swift information for creating sites.xml
export WORK=/home/tschenker/.work_swift
export QUEUE=prod-devel
export MAXTIME=20
export NODE=64
export JOBS_PER_NODE=1
export JOB_THROTTLE=0.019
start-coaster-service starts worker.pl on the workstations, but after ~30 seconds these processes die.
This is what I get when I run one of the Swift examples:
~$ swift -sites.file ~/sites.xml -tc.file ~/tc.data -config cf swift-0.93/examples/swift/tutorial/if.swift
Swift 0.93 swift-r5483 cog-r3339
RunID: 20121029-2128-urhpc6eg
Failed to acquire exclusive lock on log file.
Progress: time: Mon, 29 Oct 2012 21:28:04 +0100
Find: http://10.1.167.72:57254
Find: keepalive(120), reconnect - http://10.1.167.72:57254
Passive queue processor initialized. Callback URI is http://127.0.1.1:43284
Progress: time: Mon, 29 Oct 2012 21:28:34 +0100 Submitted:1
Progress: time: Mon, 29 Oct 2012 21:29:04 +0100 Submitted:1
Failed to connect: Connection timed out at /home/ubuntu/worker.pl line 372.
Failed to connect: Connection timed out at /home/ubuntu/worker.pl line 372.
Progress: time: Mon, 29 Oct 2012 21:29:34 +0100 Submitted:1
Progress: time: Mon, 29 Oct 2012 21:30:04 +0100 Submitted:1
...
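Since worker.pl reports "Connection timed out", one way to narrow this down is to test plain TCP reachability from a worker host to the address the service advertises. The following is a minimal sketch, not part of Swift: the helper name `can_connect` and the 5-second timeout are assumptions made for illustration, and it only checks that a TCP connection can be opened, not that the coaster service itself is healthy.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Swift or worker.pl.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

Run on one of the workers (for example 10.1.218.228), `can_connect("10.1.167.72", 57254)` would test the address and port from the `Find:` line above. The port is chosen anew on each run unless SERVICE_PORT and LOCAL_PORT are set, so it has to be taken from the current log. If the call returns False, a firewall or a wrong service address is a likely suspect.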
Can anybody help?
Thanks,
Thomas