[Swift-devel] Localhost coasters not working on Beagle
Michael Wilde
wilde at anl.gov
Sun Jun 8 16:48:00 CDT 2014
Mihael - I'm not able to get a simple localhost coasters run working on
Beagle login1.
All: Is anyone seeing something similar? It looks to me like my coaster
worker is not able to connect to the Swift coaster service (using
standard automatic coasters).
I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find
the logs and configs), running Swift 0.95RC6.
I'm setting GLOBUS_HOSTNAME to 127.0.0.1 and have also tried the
internalHostname profile:
login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift
login1$ cat localcoast.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
  <pool handle="localhost">
    <execution provider="coaster" jobmanager="local:local"/>
    <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
    <profile namespace="globus" key="maxwalltime">00:01:00</profile>
    <profile namespace="globus" key="maxtime">3600</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="maxNodes">1</profile>
    <profile namespace="karajan" key="jobThrottle">12</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="karajan" key="lowOverAllocation">100</profile>
    <profile namespace="karajan" key="highOverAllocation">100</profile>
    <filesystem provider="local"/>
    <workdirectory>/tmp/swiftwork</workdirectory>
  </pool>
</config>
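A quick way to double-check which profiles the sites file actually declares (for example, that `internalHostname` really is 127.0.0.1) is to list them. This is a minimal sketch assuming Python 3 and the standard `xml.etree.ElementTree` module; `read_profiles` is a hypothetical helper, not part of Swift. It only reads the XML, so it says nothing about how Swift itself applies these profiles.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <config> root of the sites file.
NS = {"s": "http://www.ci.uchicago.edu/swift/SwiftSites"}

# Trimmed copy of localcoast.xml (only a few profiles kept for brevity).
SITES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
  <pool handle="localhost">
    <execution provider="coaster" jobmanager="local:local"/>
    <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
    <profile namespace="globus" key="maxwalltime">00:01:00</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="karajan" key="jobThrottle">12</profile>
  </pool>
</config>
"""


def read_profiles(xml_text):
    """Hypothetical helper: return {pool handle: {(namespace, key): value}}."""
    # Encode to bytes so the encoding declaration in the header is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    pools = {}
    for pool in root.findall("s:pool", NS):
        profiles = {}
        for prof in pool.findall("s:profile", NS):
            key = (prof.get("namespace"), prof.get("key"))
            profiles[key] = (prof.text or "").strip()
        pools[pool.get("handle")] = profiles
    return pools


if __name__ == "__main__":
    for handle, profiles in read_profiles(SITES_XML).items():
        for (ns, key), value in sorted(profiles.items()):
            print(f"{handle}: {ns}.{key} = {value}")
```

Pointing the same function at the full file contents (read with `open(...).read()`) would list every profile in the pool.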
The worker fails with exit code 110 and a "Connection timed out" error:
2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: 127.0.0.1:50000
2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is http://127.0.0.1:50001
2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: http://127.0.0.1:50003
2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for registration
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration
2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current channel
2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete
2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster service: http://127.0.0.1:50002
2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is http://10.128.2.244:50003
2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been overridden to http://127.0.0.1:50003
2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... id=0608-3704500
2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor
Settings {
slots = 1
jobsPerNode = 1
workersPerNode = 1
nodeGranularity = 1
allocationStepSize = 0.1
maxNodes = 1
lowOverallocation = 10.0
highOverallocation = 1.0
overallocationDecayFactor = 0.001
spread = 0.9
reserve = 60.000s
maxtime = 3600
remoteMonitorEnabled = false
internalHostname = 127.0.0.1
hookClass = null
workerManager = block
workerLoggingLevel = NONE
workerLoggingDirectory = DEFAULT
ldLibraryPath = null
workerCopies = null
directory = null
useHashBang = null
parallelism = 0.01
coresPerNode = 1
perfTraceWorker = false
perfTraceInterval = -1
attributes = {}
callbackURIs = [http://127.0.0.1:50003]
}
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding queue: 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for holding queue (seconds): 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks for a total walltime of: 1s
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: Job(id:0 60.000s)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max Walltime (seconds): 60
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate (seconds): 600
2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for this new Block (est. seconds): 0
2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: 0, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to new Block
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, walltime=600.000s
2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) unregistering (send)
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to new blocks
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local
2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: Submitting
2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: Submitted
2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active
2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) unregistering (send)
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: Failed Job failed with an exit code of 110
2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job:
executable: /usr/bin/perl
arguments: /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
stdout: null
stderr: null
directory: /
batch: false
redirected: false
attributes: hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
env: WORKER_LOGGING_LEVEL=NONE
2014-06-08 16:38:57,115-0500 INFO Block Worker task failed:
Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.
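Two checks may help narrow this down. The failure message (`... at <file> line 1101.`) has the shape of a Perl `die`, which exits with the current errno value when that value is nonzero; assuming that is what the worker script does, exit code 110 would be Linux's ETIMEDOUT, consistent with the "Connection timed out" text. The following is a minimal sketch in Python 3 using only the standard library; `errno_name` and `probe_callback` are hypothetical helpers, not part of Swift. It decodes a numeric exit code as an errno and attempts a plain TCP connect to the callback URI the worker was given (http://127.0.0.1:50003 in the log above).

```python
import errno
import os
import socket
from urllib.parse import urlsplit


def errno_name(code):
    """Hypothetical helper: map a numeric errno (e.g. a worker exit code)
    to its symbolic name and message on the current platform."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


def probe_callback(uri, timeout=5.0):
    """Hypothetical helper: try a plain TCP connect to the host:port in a
    coaster callback URI.

    Returns (True, "connected") or (False, <error text>). This only tests
    whether the port is reachable; it does not speak the coaster protocol.
    """
    parts = urlsplit(uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"callback URI needs a host and port: {uri!r}")
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # refused, timed out, unreachable, ...
        return False, str(exc)


if __name__ == "__main__":
    print(errno_name(110))  # errno numbering is platform-specific
    print(probe_callback("http://127.0.0.1:50003"))
```

Running the probe on the node where the worker starts (login1 here), while the Swift run is still active, would show whether the callback port is reachable from there. A successful connect does not by itself prove the coaster handshake would succeed.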
--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago