[Swift-devel] Localhost coasters not working on Beagle

Mihael Hategan hategan at mcs.anl.gov
Sun Jun 8 17:33:15 CDT 2014


Can you enable worker logging and post the worker log?
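A minimal sketch of what enabling worker logging might look like in localcoast.xml. It assumes that the workerLoggingLevel and workerLoggingDirectory settings printed in the BlockQueueProcessor Settings dump below can be set as globus-namespace profile keys, the same way internalHostname and jobsPerNode are. The DEBUG level and the log directory path are illustrative assumptions, not values taken from the thread.

```xml
<!-- Assumed profile keys, mirroring the workerLoggingLevel / workerLoggingDirectory
     entries in the Settings dump; place inside <pool handle="localhost"> -->
<profile namespace="globus" key="workerLoggingLevel">DEBUG</profile>
<!-- Hypothetical directory; any path the worker can write to should do -->
<profile namespace="globus" key="workerLoggingDirectory">/tmp/swiftwork/workerlogs</profile>
```

In the failed task spec below, the worker was launched with NOLOGGING and WORKER_LOGGING_LEVEL=NONE, which matches workerLoggingLevel = NONE in the Settings dump, so no worker log was produced for this run.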

Mihael

On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote:
> Mihael - I'm not able to get a simple localhost coasters run working on
> Beagle login1.
> 
> All: Is anyone seeing something similar?  It looks to me like my coaster 
> worker is not able to connect to the Swift coaster service (using 
> standard automatic coasters).
> 
> I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find
> logs and configs). I'm running 0.95RC6.
> 
> I'm setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried the
> internalHostname profile as well:
> 
> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift
> 
> login1$ cat localcoast.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
>
>   <pool handle="localhost">
>
>     <execution provider="coaster" jobmanager="local:local"/>
>
>     <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
>     <profile namespace="globus" key="maxwalltime">00:01:00</profile>
>     <profile namespace="globus" key="maxtime">3600</profile>
>
>     <profile namespace="globus" key="jobsPerNode">1</profile>
>     <profile namespace="globus" key="slots">1</profile>
>     <profile namespace="globus" key="nodeGranularity">1</profile>
>     <profile namespace="globus" key="maxNodes">1</profile>
>
>     <profile namespace="karajan" key="jobThrottle">12</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>
>     <profile namespace="karajan" key="lowOverAllocation">100</profile>
>     <profile namespace="karajan" key="highOverAllocation">100</profile>
>
>     <filesystem provider="local"/>
>     <workdirectory>/tmp/swiftwork</workdirectory>
>
>   </pool>
>
> </config>
> 
> I'm getting connection timeouts; the worker fails with exit code 110:
> 
> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
> 2014-06-08 16:37:50,829-0500 INFO  LocalService Started local service: 127.0.0.1:50000
> 2014-06-08 16:37:50,837-0500 INFO  BootstrapService Socket bound. URL is http://127.0.0.1:50001
> 2014-06-08 16:37:50,914-0500 INFO  Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
> 2014-06-08 16:37:50,917-0500 INFO  CoasterService Started local service: http://127.0.0.1:50003
> 2014-06-08 16:37:50,917-0500 INFO  CoasterService Reserving channel for registration
> 2014-06-08 16:37:50,942-0500 INFO  MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
> 2014-06-08 16:37:50,942-0500 INFO  MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
> 2014-06-08 16:37:50,942-0500 INFO  CoasterService Sending registration
> 2014-06-08 16:37:50,948-0500 INFO  MetaChannel Trying to re-bind current channel
> 2014-06-08 16:37:50,949-0500 INFO  RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
> 2014-06-08 16:37:50,949-0500 INFO  CoasterService Registration complete
> 2014-06-08 16:37:50,949-0500 INFO  CoasterService Started coaster service: http://127.0.0.1:50002
> 2014-06-08 16:37:50,952-0500 WARN  Settings original callback URI is http://10.128.2.244:50003
> 2014-06-08 16:37:50,952-0500 WARN  Settings callback URI has been overridden to http://127.0.0.1:50003
> 2014-06-08 16:37:50,953-0500 INFO  RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
> 2014-06-08 16:37:50,969-0500 INFO  BlockQueueProcessor Starting... id=0608-3704500
> 2014-06-08 16:37:50,969-0500 INFO  RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor
> Settings {
>      slots = 1
>      jobsPerNode = 1
>      workersPerNode = 1
>      nodeGranularity = 1
>      allocationStepSize = 0.1
>      maxNodes = 1
>      lowOverallocation = 10.0
>      highOverallocation = 1.0
>      overallocationDecayFactor = 0.001
>      spread = 0.9
>      reserve = 60.000s
>      maxtime = 3600
>      remoteMonitorEnabled = false
>      internalHostname = 127.0.0.1
>      hookClass = null
>      workerManager = block
>      workerLoggingLevel = NONE
>      workerLoggingDirectory = DEFAULT
>      ldLibraryPath = null
>      workerCopies = null
>      directory = null
>      useHashBang = null
>      parallelism = 0.01
>      coresPerNode = 1
>      perfTraceWorker = false
>      perfTraceInterval = -1
>      attributes = {}
>      callbackURIs = [http://127.0.0.1:50003]
> }
> 
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Jobs in holding queue: 1
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Time estimate for holding queue (seconds): 1
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor Allocating blocks for a total walltime of: 1s
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor  Considering: Job(id:0 60.000s)
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor       Max Walltime (seconds):   60
> 2014-06-08 16:37:51,009-0500 INFO  BlockQueueProcessor       Time estimate (seconds):  600
> 2014-06-08 16:37:51,010-0500 INFO  BlockQueueProcessor       Total for this new Block (est. seconds): 0
> 2014-06-08 16:37:51,013-0500 INFO  BlockQueueProcessor index: 0, last: 0, holding.size(): 1
> 2014-06-08 16:37:51,014-0500 INFO  BlockQueueProcessor Queued: 1 jobs to new Block
> 2014-06-08 16:37:51,014-0500 INFO  BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
> 2014-06-08 16:37:51,014-0500 INFO  Block Starting block: workers=1, walltime=600.000s
> 2014-06-08 16:37:51,016-0500 INFO  RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
> 2014-06-08 16:37:51,016-0500 INFO  RequestHandler Handler(tag: 2, RLOG) unregistering (send)
> 2014-06-08 16:37:51,018-0500 INFO  BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
> 2014-06-08 16:37:51,018-0500 INFO  BlockQueueProcessor Added 1 jobs to new blocks
> 2014-06-08 16:37:51,018-0500 INFO  BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
> 2014-06-08 16:37:51,018-0500 INFO  ExecutionTaskHandler provider=local
> 2014-06-08 16:37:51,023-0500 INFO  Block Block task status changed: Submitting
> 2014-06-08 16:37:51,023-0500 INFO  JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
> 2014-06-08 16:37:51,024-0500 INFO  Block Block task status changed: Submitted
> 2014-06-08 16:37:51,027-0500 INFO  Block Block task status changed: Active
> 2014-06-08 16:37:51,027-0500 INFO  RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
> 2014-06-08 16:37:51,027-0500 INFO  RequestHandler Handler(tag: 3, RLOG) unregistering (send)
> 2014-06-08 16:37:51,681-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:37:51,681-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
> 2014-06-08 16:38:21,683-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:38:21,683-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
> 2014-06-08 16:38:51,686-0500 INFO  RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:38:51,686-0500 INFO  RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
> 2014-06-08 16:38:57,113-0500 INFO  Block Block task status changed: Failed Job failed with an exit code of 110
> 2014-06-08 16:38:57,115-0500 INFO  Block Failed task spec: Job:
>      executable: /usr/bin/perl
>      arguments:  /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
>      stdout:     null
>      stderr:     null
>      directory:  /
>      batch:      false
>      redirected: false
>      attributes: hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
>      env:        WORKER_LOGGING_LEVEL=NONE
> 
> 2014-06-08 16:38:57,115-0500 INFO  Block Worker task failed:
> Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.