[Swift-devel] Localhost coasters not working on Beagle
Michael Wilde
wilde at anl.gov
Sun Jun 8 22:10:38 CDT 2014
login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log
2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun Jun 8 22:07:12 2014
2014/06/08 22:07:12.296 INFO - Running on node login1.beagle.ci.uchicago.edu
2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003
2014/06/08 22:07:12.296 DEBUG - scheme=http
2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1
2014/06/08 22:07:12.297 DEBUG - port=50003
2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000
2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ...
2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ...
2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. Trying other addresses
2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses.
2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds
2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ...
2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ...
2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. Trying other addresses
2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses.
2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds
2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ...
2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ...
2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. Trying other addresses
2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses.
2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out
login1$
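
For anyone who wants to reproduce the check by hand, here is a minimal sketch, assuming Python 3 is available on the login node. The function name try_connect and its parameters are hypothetical; this is not the coaster worker script (which is Perl) and not part of Swift. It only imitates the pattern visible in the log above: a few TCP connect attempts to the callback address, with the delay between attempts doubling (1 second, then 2 seconds).

```python
import socket
import time


def try_connect(host, port, attempts=3, timeout=5.0, first_delay=1.0):
    """Try to open a TCP connection, doubling the delay between attempts.

    Returns (True, None) on success, or (False, last_error) if every
    attempt failed.
    """
    delay = first_delay
    last_error = None
    for attempt in range(attempts):
        try:
            # create_connection resolves the host and applies the timeout
            # to the connect call; the socket is closed on leaving the block.
            with socket.create_connection((host, port), timeout=timeout):
                return True, None
        except OSError as exc:
            last_error = exc
            print(f"attempt {attempt}: {host}:{port} failed: {exc}")
        if attempt < attempts - 1:
            # Back off before the next attempt, doubling each time.
            time.sleep(delay)
            delay *= 2
    return False, last_error
```

Calling try_connect("127.0.0.1", 50003) while the Swift run is still active would show whether anything on the login node can reach the port the worker was told to use. The same call can be pointed at the other addresses listed under "Local contacts" in the Swift log.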
On 6/8/14, 5:33 PM, Mihael Hategan wrote:
> Can you enable worker logging and post the worker log?
>
> Mihael
>
> On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote:
>> Mihael - I'm not able to get a simple localhost coasters run working on
>> Beagle login1.
>>
>> All: Is anyone seeing something similar? It looks to me like my coaster
>> worker is not able to connect to the Swift coaster service (using
>> standard automatic coasters).
>>
>> I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find
>> logs and configs). Running 0.95RC6.
>>
>> I'm setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHostname
>> as well:
>>
>> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml
>> catsn.swift
>>
>> login1$ cat localcoast.xml
>> <?xml version="1.0" encoding="UTF-8"?>
>> <config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
>>
>> <pool handle="localhost">
>>
>> <execution provider="coaster" jobmanager="local:local"/>
>>
>> <profile namespace="globus" key="internalHostname">127.0.0.1</profile>
>> <profile namespace="globus" key="maxwalltime">00:01:00</profile>
>> <profile namespace="globus" key="maxtime">3600</profile>
>>
>> <profile namespace="globus" key="jobsPerNode">1</profile>
>> <profile namespace="globus" key="slots">1</profile>
>> <profile namespace="globus" key="nodeGranularity">1</profile>
>> <profile namespace="globus" key="maxNodes">1</profile>
>>
>> <profile namespace="karajan" key="jobThrottle">12</profile>
>> <profile namespace="karajan" key="initialScore">10000</profile>
>>
>> <profile namespace="karajan" key="lowOverAllocation">100</profile>
>> <profile namespace="karajan" key="highOverAllocation">100</profile>
>>
>> <filesystem provider="local"/>
>> <workdirectory>/tmp/swiftwork</workdirectory>
>>
>>
>> </pool>
>> </config>
>>
>> I get error 110 connection timeouts:
>>
>> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl
>> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl
>> host=localhost
>> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service:
>> 127.0.0.1:50000
>> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is
>> http://127.0.0.1:50001
>> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts:
>> [http://127.0.0.2:50003, http://192.5.86.104:50003,
>> http://10.128.2.244:50003]
>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service:
>> http://127.0.0.1:50003
>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for
>> registration
>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context:
>> cpipe, boundTo: null] binding to cpipe://1
>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context:
>> spipe, boundTo: null] binding to spipe://1
>> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration
>> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current
>> channel
>> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1,
>> REGISTER) unregistering (send)
>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete
>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster
>> service: http://127.0.0.1:50002
>> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is
>> http://10.128.2.244:50003
>> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been
>> overridden to http://127.0.0.1:50003
>> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1,
>> CONFIGSERVICE) unregistering (send)
>> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting...
>> id=0608-3704500
>> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2,
>> SUBMITJOB) unregistering (send)
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor
>> Settings {
>> slots = 1
>> jobsPerNode = 1
>> workersPerNode = 1
>> nodeGranularity = 1
>> allocationStepSize = 0.1
>> maxNodes = 1
>> lowOverallocation = 10.0
>> highOverallocation = 1.0
>> overallocationDecayFactor = 0.001
>> spread = 0.9
>> reserve = 60.000s
>> maxtime = 3600
>> remoteMonitorEnabled = false
>> internalHostname = 127.0.0.1
>> hookClass = null
>> workerManager = block
>> workerLoggingLevel = NONE
>> workerLoggingDirectory = DEFAULT
>> ldLibraryPath = null
>> workerCopies = null
>> directory = null
>> useHashBang = null
>> parallelism = 0.01
>> coresPerNode = 1
>> perfTraceWorker = false
>> perfTraceInterval = -1
>> attributes = {}
>> callbackURIs = [http://127.0.0.1:50003]
>> }
>>
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding
>> queue: 1
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for
>> holding queue (seconds): 1
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks
>> for a total walltime of: 1s
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering:
>> Job(id:0 60.000s)
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max
>> Walltime (seconds): 60
>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time
>> estimate (seconds): 600
>> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for
>> this new Block (est. seconds): 0
>> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last:
>> 0, holding.size(): 1
>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to
>> new Block
>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last:
>> 0, ii: 1, holding.size(): 1
>> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1,
>> walltime=600.000s
>> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED
>> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
>> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG)
>> unregistering (send)
>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block
>> Block 0608-3704500-000000 (1x600.000s) for submission
>> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to
>> new blocks
>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block
>> Block 0608-3704500-000000 (1x600.000s)
>> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local
>> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed:
>> Submitting
>> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in:
>> / command: /usr/bin/perl
>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl
>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
>> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed:
>> Submitted
>> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active
>> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE
>> id=0608-3704500-000000
>> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG)
>> unregistering (send)
>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1
>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax:
>> 954466304, CrtHeap: 253624320, UsedHeap: 28583112
>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1
>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax:
>> 954466304, CrtHeap: 253624320, UsedHeap: 29067208
>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1
>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax:
>> 954466304, CrtHeap: 253624320, UsedHeap: 29551304
>> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed:
>> Failed Job failed with an exit code of 110
>> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job:
>> executable: /usr/bin/perl
>> arguments:
>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl
>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
>> stdout: null
>> stderr: null
>> directory: /
>> batch: false
>> redirected: false
>> attributes:
>> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
>> env: WORKER_LOGGING_LEVEL=NONE
>>
>> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed:
>> Failed to connect: Connection timed out at
>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.
>>
>>
>>
>
--
Michael Wilde
Mathematics and Computer Science Computation Institute
Argonne National Laboratory The University of Chicago