[Swift-devel] Localhost coasters not working on Beagle

Michael Wilde wilde at anl.gov
Sun Jun 8 16:48:00 CDT 2014


Mihael - I'm not able to get a simple localhost coasters run working on
Beagle login1.

All: Is anyone seeing something similar?  It looks to me like my coaster 
worker is not able to connect to the Swift coaster service (using 
standard automatic coasters).

I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find
the logs and configs). I'm running Swift 0.95RC6.

I'm setting GLOBUS_HOSTNAME to 127.0.0.1 and have also tried the
internalHostname profile:

login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift

login1$ cat localcoast.xml
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">

<pool handle="localhost">

   <execution provider="coaster" jobmanager="local:local"/>

   <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
   <profile namespace="globus" key="maxwalltime">00:01:00</profile>
   <profile namespace="globus" key="maxtime">3600</profile>

   <profile namespace="globus" key="jobsPerNode">1</profile>
   <profile namespace="globus" key="slots">1</profile>
   <profile namespace="globus" key="nodeGranularity">1</profile>
   <profile namespace="globus" key="maxNodes">1</profile>

   <profile namespace="karajan" key="jobThrottle">12</profile>
   <profile namespace="karajan" key="initialScore">10000</profile>

   <profile namespace="karajan" key="lowOverAllocation">100</profile>
   <profile namespace="karajan" key="highOverAllocation">100</profile>

   <filesystem provider="local"/>
   <workdirectory>/tmp/swiftwork</workdirectory>

</pool>
</config>
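To compare what the sites file asks for with what BlockQueueProcessor reports in the Settings block it logs, a short Python sketch like the one below can help. The function names are hypothetical (they are not part of Swift), and the case-insensitive key match is an assumption made only for this comparison; it says nothing about how Swift itself looks up profile keys. In this run, for instance, the file sets lowOverAllocation and highOverAllocation to 100, while the log reports lowOverallocation = 10.0 and highOverallocation = 1.0.

```python
import re
import xml.etree.ElementTree as ET


def read_profiles(xml_text):
    """Map profile key -> value for every <profile> element in a sites file."""
    root = ET.fromstring(xml_text)
    profiles = {}
    for elem in root.iter():
        # Elements carry the default namespace, e.g. "{http://...SwiftSites}profile".
        if elem.tag.rsplit("}", 1)[-1] == "profile":
            profiles[elem.get("key")] = (elem.text or "").strip()
    return profiles


def read_logged_settings(log_text):
    """Map key -> value for indented 'key = value' lines, as in the logged Settings block."""
    return dict(re.findall(r"^\s*(\w+) = (.*?)\s*$", log_text, re.MULTILINE))


def mismatches(profiles, settings):
    """Yield (key, configured, logged) for keys present in both (matched
    case-insensitively) whose values differ. Values are compared as numbers
    when both parse as floats, otherwise as strings."""
    logged = {k.lower(): v for k, v in settings.items()}
    for key, value in profiles.items():
        if key.lower() not in logged:
            continue  # e.g. karajan keys such as jobThrottle never appear in Settings
        seen = logged[key.lower()]
        try:
            same = float(value) == float(seen)
        except ValueError:
            same = value == seen
        if not same:
            yield key, value, seen
```

Fed the complete localcoast.xml and the Settings { ... } text from the log below, this should flag only the two overallocation keys.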

The worker fails with exit code 110 after a connection timeout:

2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
2014-06-08 16:37:50,829-0500 INFO  LocalService Started local service: 127.0.0.1:50000
2014-06-08 16:37:50,837-0500 INFO  BootstrapService Socket bound. URL is http://127.0.0.1:50001
2014-06-08 16:37:50,914-0500 INFO  Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
2014-06-08 16:37:50,917-0500 INFO  CoasterService Started local service: http://127.0.0.1:50003
2014-06-08 16:37:50,917-0500 INFO  CoasterService Reserving channel for registration
2014-06-08 16:37:50,942-0500 INFO  MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
2014-06-08 16:37:50,942-0500 INFO  MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
2014-06-08 16:37:50,942-0500 INFO  CoasterService Sending registration
2014-06-08 16:37:50,948-0500 INFO  MetaChannel Trying to re-bind current channel
2014-06-08 16:37:50,949-0500 INFO  RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
2014-06-08 16:37:50,949-0500 INFO  CoasterService Registration complete
2014-06-08 16:37:50,949-0500 INFO  CoasterService Started coaster service: http://127.0.0.1:50002
2014-06-08 16:37:50,952-0500 WARN  Settings original callback URI is http://10.128.2.244:50003
2014-06-08 16:37:50,952-0500 WARN  Settings callback URI has been overridden to http://127.0.0.1:50003
2014-06-08 16:37:50,953-0500 INFO  RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
2014-06-08 16:37:50,969-0500 INFO  BlockQueueProcessor Starting... id=0608-3704500
2014-06-08 16:37:50,969-0500 INFO  RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor
Settings {
     slots = 1
     jobsPerNode = 1
     workersPerNode = 1
     nodeGranularity = 1
     allocationStepSize = 0.1
     maxNodes = 1
     lowOverallocation = 10.0
     highOverallocation = 1.0
     overallocationDecayFactor = 0.001
     spread = 0.9
     reserve = 60.000s
     maxtime = 3600
     remoteMonitorEnabled = false
     internalHostname = 127.0.0.1
     hookClass = null
     workerManager = block
     workerLoggingLevel = NONE
     workerLoggingDirectory = DEFAULT
     ldLibraryPath = null
     workerCopies = null
     directory = null
     useHashBang = null
     parallelism = 0.01
     coresPerNode = 1
     perfTraceWorker = false
     perfTraceInterval = -1
     attributes = {}
     callbackURIs = [http://127.0.0.1:50003]
}

2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Jobs in holding queue: 1
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Time estimate for holding queue (seconds): 1
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Allocating blocks for a total walltime of: 1s
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor  Considering: Job(id:0 60.000s)
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor       Max Walltime (seconds):   60
2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor       Time estimate (seconds):  600
2014-06-08 16:37:51,010-0500 INFO  BlockQueueProcessor       Total for this new Block (est. seconds): 0
2014-06-08 16:37:51,013-0500 INFO  BlockQueueProcessor index: 0, last: 0, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO  BlockQueueProcessor Queued: 1 jobs to new Block
2014-06-08 16:37:51,014-0500 INFO  BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO  Block Starting block: workers=1, walltime=600.000s
2014-06-08 16:37:51,016-0500 INFO  RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
2014-06-08 16:37:51,016-0500 INFO  RequestHandler Handler(tag: 2, RLOG) unregistering (send)
2014-06-08 16:37:51,018-0500 INFO  BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
2014-06-08 16:37:51,018-0500 INFO  BlockQueueProcessor Added 1 jobs to new blocks
2014-06-08 16:37:51,018-0500 INFO  BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
2014-06-08 16:37:51,018-0500 INFO  ExecutionTaskHandler provider=local
2014-06-08 16:37:51,023-0500 INFO  Block Block task status changed: Submitting
2014-06-08 16:37:51,023-0500 INFO  JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
2014-06-08 16:37:51,024-0500 INFO  Block Block task status changed: Submitted
2014-06-08 16:37:51,027-0500 INFO  Block Block task status changed: Active
2014-06-08 16:37:51,027-0500 INFO  RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
2014-06-08 16:37:51,027-0500 INFO  RequestHandler Handler(tag: 3, RLOG) unregistering (send)
2014-06-08 16:37:51,681-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:37:51,681-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
2014-06-08 16:38:21,683-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:21,683-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
2014-06-08 16:38:51,686-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:51,686-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
2014-06-08 16:38:57,113-0500 INFO  Block Block task status changed: Failed Job failed with an exit code of 110
2014-06-08 16:38:57,115-0500 INFO  Block Failed task spec: Job:
     executable: /usr/bin/perl
     arguments:  /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
     stdout:     null
     stderr:     null
     directory:  /
     batch:      false
     redirected: false
     attributes: hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
     env:        WORKER_LOGGING_LEVEL=NONE

2014-06-08 16:38:57,115-0500 INFO  Block Worker task failed:
Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.
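One way to separate a Swift problem from a host networking problem is to attempt the same kind of TCP connection the worker makes, to the callback URI from the log (http://127.0.0.1:50003), while the run is still waiting on the block. The sketch below is a hypothetical stand-alone check, not code taken from the coaster worker script; it only opens and closes a plain TCP socket. On Linux, errno 110 is ETIMEDOUT, which matches the worker's "Connection timed out" message; that the worker's exit code of 110 is this errno passed through is an assumption.

```python
import errno
import socket
from urllib.parse import urlparse


def probe_callback(uri, timeout=5.0):
    """Try a plain TCP connect to the host:port named in a callback URI.

    Returns None if the connection succeeds, otherwise the errno name of the
    failure (e.g. "ETIMEDOUT" or "ECONNREFUSED") or the error text if the
    errno is unknown. Hypothetical diagnostic helper, not part of Swift.
    """
    parsed = urlparse(uri)
    try:
        # Connect, then immediately close; nothing is sent over the socket.
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return None
    except socket.timeout:
        # Our own timeout expired before the connect completed.
        return "ETIMEDOUT"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
```

Run as probe_callback("http://127.0.0.1:50003") on login1 while the block is active: None means the service port accepts connections, "ECONNREFUSED" means nothing is listening there, and "ETIMEDOUT" suggests connection attempts are going unanswered (for example, dropped by a local firewall rule).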



-- 
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago



