[Swift-devel] Localhost coasters not working on Beagle
Mihael Hategan
hategan at mcs.anl.gov
Sun Jun 8 17:33:15 CDT 2014
Can you enable worker logging and post the worker log?
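A minimal sketch of what that might look like in localcoast.xml. It assumes that the `workerLoggingLevel` and `workerLoggingDirectory` fields in the BlockQueueProcessor Settings dump below (currently `NONE` and `DEFAULT`) can be set as globus-namespace profiles in the same way as `internalHostname`. The `DEBUG` level and the directory path are illustrative assumptions, not confirmed values:

```xml
<!-- Assumption: these keys mirror the workerLoggingLevel / workerLoggingDirectory
     fields shown in the coaster Settings dump and are accepted as globus profiles. -->
<profile namespace="globus" key="workerLoggingLevel">DEBUG</profile>
<!-- Hypothetical log location; drop this line to keep the DEFAULT directory. -->
<profile namespace="globus" key="workerLoggingDirectory">/tmp/coaster-worker-logs</profile>
```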
Mihael
On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote:
> Mihael - I'm not able to get a simple localhost coasters run working on
> Beagle login1.
>
> All: Is anyone seeing something similar? It looks to me like my coaster
> worker is not able to connect to the Swift coaster service (using
> standard automatic coasters).
>
> I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find
> logs and configs), running Swift 0.95RC6.
>
> I'm setting GLOBUS_HOSTNAME to 127.0.0.1 and have also tried setting the
> internalHostname profile:
>
> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift
>
> login1$ cat localcoast.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
>
> <pool handle="localhost">
>
> <execution provider="coaster" jobmanager="local:local"/>
>
> <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
> <profile namespace="globus" key="maxwalltime">00:01:00</profile>
> <profile namespace="globus" key="maxtime">3600</profile>
>
> <profile namespace="globus" key="jobsPerNode">1</profile>
> <profile namespace="globus" key="slots">1</profile>
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
>
> <profile namespace="karajan" key="jobThrottle">12</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
>
> <profile namespace="karajan" key="lowOverAllocation">100</profile>
> <profile namespace="karajan" key="highOverAllocation">100</profile>
>
> <filesystem provider="local"/>
> <workdirectory>/tmp/swiftwork</workdirectory>
>
> </pool>
> </config>
>
> I get connection timeouts (worker exit code 110):
>
> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: 127.0.0.1:50000
> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is http://127.0.0.1:50001
> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: http://127.0.0.1:50003
> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for registration
> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration
> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current channel
> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete
> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster service: http://127.0.0.1:50002
> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is http://10.128.2.244:50003
> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been overridden to http://127.0.0.1:50003
> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... id=0608-3704500
> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor
> Settings {
> slots = 1
> jobsPerNode = 1
> workersPerNode = 1
> nodeGranularity = 1
> allocationStepSize = 0.1
> maxNodes = 1
> lowOverallocation = 10.0
> highOverallocation = 1.0
> overallocationDecayFactor = 0.001
> spread = 0.9
> reserve = 60.000s
> maxtime = 3600
> remoteMonitorEnabled = false
> internalHostname = 127.0.0.1
> hookClass = null
> workerManager = block
> workerLoggingLevel = NONE
> workerLoggingDirectory = DEFAULT
> ldLibraryPath = null
> workerCopies = null
> directory = null
> useHashBang = null
> parallelism = 0.01
> coresPerNode = 1
> perfTraceWorker = false
> perfTraceInterval = -1
> attributes = {}
> callbackURIs = [http://127.0.0.1:50003]
> }
>
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding queue: 1
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for holding queue (seconds): 1
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks for a total walltime of: 1s
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: Job(id:0 60.000s)
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max Walltime (seconds): 60
> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate (seconds): 600
> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for this new Block (est. seconds): 0
> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: 0, holding.size(): 1
> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to new Block
> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, walltime=600.000s
> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) unregistering (send)
> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to new blocks
> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local
> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: Submitting
> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: Submitted
> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active
> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) unregistering (send)
> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1
> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: Failed Job failed with an exit code of 110
> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job:
> executable: /usr/bin/perl
> arguments:
> /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
> stdout: null
> stderr: null
> directory: /
> batch: false
> redirected: false
> attributes:
> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
> env: WORKER_LOGGING_LEVEL=NONE
>
> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed:
> Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.