From wilde at anl.gov Sun Jun 1 23:55:03 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 1 Jun 2014 23:55:03 -0500
Subject: [Swift-devel] tryswift changes needed
Message-ID: <538C03A7.7070609@anl.gov>

Hi David,

If you have a moment, can you look at making the following changes/fixes:

- When you click Explain, it should bring the Explain HTML window to the top if it's obscured.
- For the Hello World app, it sometimes doesn't show the contents of out.txt in the output window. This has happened several times, but I can't yet see what causes it to happen.
- If you select an output file that's already obscured by the main window, it should also bring that output file to the top.
- If you select File Outputs as your choice, you get a 404 for this URL: http://ec2-54-87-184-8.compute-1.amazonaws.com/File%20outputs

These are not urgent, but it would be good to fix them soon (or suggest how to fix them).

Thanks,

- Mike

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From wilde at anl.gov Mon Jun 2 00:21:12 2014
From: wilde at anl.gov (Michael Wilde)
Date: Mon, 2 Jun 2014 00:21:12 -0500
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <538A31A3.9030007@anl.gov>
References: <538A31A3.9030007@anl.gov>
Message-ID: <538C09C8.9080307@anl.gov>

I just pushed these changes live. I had to manually update push_to.sh. (David, is the list of pages supposed to be maintained automatically? Maybe we can drive that off of a find?)

Mihael just fixed ticket 1279. David will make 0.95 the "latest download" (and update the button on both Home and Download, and push 0.94.X to the older releases page)?

Yadu, can you re-test this release when you are back online in Chicago?

Thanks,

- Mike

On 5/31/14, 2:46 PM, Michael Wilde wrote:
> Hi All,
>
> I added TrySwift and Swift/T to the Swift main page. You can preview
> the changes at:
>
> http://web.ci.uchicago.edu/~wilde/www/main/
>
> The changes are committed to svn but not yet pushed to the main site.
>
> I added the GeMTC paper under "What's New" after Swift/T.
>
> I created a new main directory Swift-T/ for Swift/T. At the moment,
> this just forwards to the Google exm site Swift-T page.
>
> Justin: feel free to start integrating Swift/T content below this directory.
>
> You can check out the entire web below your public_html directory, test
> there, and commit.
>
> After you're done and committed, I'll update my test copy, check it out,
> and then push to the live site some time tomorrow.
>
> I'm going to shift later to work on TrySwift text (probably not till
> tomorrow morning).
>
> Yadu is looking at creating a Local Host tutorial version that runs on
> Linux; hopefully the same will run on Mac.
>
> Justin, Tim: do you know how to create a nice Mac install package? Did
> you do so for Swift/T?
>
> Thanks,
>
> - Mike
>

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago
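(On the find suggestion above: a minimal sketch of generating push_to.sh's page list automatically instead of maintaining it by hand. The pages.list name, the .html filter, and the .svn exclusion are assumptions for illustration, not known details of how push_to.sh actually works:)

    # Hypothetical: regenerate the list of pages from the site's working
    # copy rather than maintaining it by hand. Assumes GNU find and an
    # svn checkout rooted at $SITE_ROOT; adjust the filter as needed.
    SITE_ROOT=${SITE_ROOT:-.}
    find "$SITE_ROOT" -name '*.html' -not -path '*/.svn/*' | sort > pages.list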
From hategan at mcs.anl.gov Mon Jun 2 00:23:54 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 1 Jun 2014 22:23:54 -0700
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <538C09C8.9080307@anl.gov>
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov>
Message-ID: <1401686634.26836.0.camel@echo>

On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> I just pushed these changes live. I had to manually update push_to.sh.
> (David, is the list of pages supposed to be maintained automatically?
> Maybe we can drive that off of a find?)
>
> Mihael just fixed ticket 1279. David will make 0.95 the "latest
> download" (and update the button on both Home and Download, and push
> 0.94.X to the older releases page)?
>
> Yadu, can you re-test this release when you are back online in Chicago?

Yeah, do we have a general feel for 0.95? I lost track.

Mihael

From skrieder at iit.edu Mon Jun 2 20:03:18 2014
From: skrieder at iit.edu (Scott Krieder)
Date: Mon, 2 Jun 2014 20:03:18 -0500
Subject: [Swift-devel] apple swift language
Message-ID:

swift-lang.org is probably worth a lot of money now!

http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/

--
Scott J. Krieder
C: 419-685-0410
E: skrieder at iit.edu
http://datasys.cs.iit.edu/~skrieder/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.g.armstrong at gmail.com Mon Jun 2 20:26:47 2014
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 2 Jun 2014 20:26:47 -0500
Subject: [Swift-devel] apple swift language
In-Reply-To:
References:
Message-ID:

We had an off-list thread about this - the site went down due to load pretty soon after it was announced and only got back online thanks to David Kelly moving it to a bunch of AWS servers.

- Tim

On Mon, Jun 2, 2014 at 8:03 PM, Scott Krieder wrote:

> swift-lang.org is probably worth a lot of money now!
>
> http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/
>
> --
> Scott J. Krieder
> C: 419-685-0410
> E: skrieder at iit.edu
> http://datasys.cs.iit.edu/~skrieder/
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketan at mcs.anl.gov Mon Jun 2 22:56:37 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 2 Jun 2014 22:56:37 -0500
Subject: [Swift-devel] apple swift language
In-Reply-To:
References:
Message-ID:

Interesting-looking features from this thread on Reddit:
http://www.reddit.com/r/programming/comments/274t5s/apple_swift_programming_language_unveiled

Statically typed with type inference.
Generics.
Closures.
No exceptions.
Extension methods.
Properties (syntax similar to C#), including lazy properties with the "@lazy" annotation.
Functions, methods and type (static) methods.
Support for observers (with "willSet" and "didSet"). Interesting to see the observer pattern baked into a language, although I'm more partial to event buses for this kind of thing.
Enums.
Classes and structures (structures have restrictions regarding inheritance and other things).
For and while loops (statements, not expressions).
"mutating" keyword.
Named parameters.
Deinitializers (finalizers).
Protocols (interfaces).
Optional chaining with "a?.b?.c" and forced dereference with "!".
Convenient "assign and test": "if let person = findPerson() ...".
Type casting with "is", downcasting with "as?" (combines nicely with the "let" syntax. Ceylon does it right too).

On Mon, Jun 2, 2014 at 8:26 PM, Tim Armstrong wrote:

> We had an off-list thread about this - the site went down due to load
> pretty soon after it was announced and only got back online thanks to David
> Kelly moving it to a bunch of AWS servers.
>
> - Tim
>
>
> On Mon, Jun 2, 2014 at 8:03 PM, Scott Krieder wrote:
>
>> swift-lang.org is probably worth a lot of money now!
>>
>>
>> http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/
>>
>> --
>> Scott J. Krieder
>> C: 419-685-0410
>> E: skrieder at iit.edu
>> http://datasys.cs.iit.edu/~skrieder/
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadudoc1729 at gmail.com Wed Jun 4 17:13:01 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Thu, 5 Jun 2014 03:43:01 +0530
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <1401686634.26836.0.camel@echo>
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov> <1401686634.26836.0.camel@echo>
Message-ID:

Hi,

The last tests, which ran Swift 0.95 branch SVN swift-r7871 (swift modified locally) with cog-r3905, passed most tests. There are tests that are failing, including a remote-testing failure on frisbee (Mac). The build is working.

I need to look into the modis test failures, which seem to be config issues.

Here's a link to the results:
http://swift.rcc.uchicago.edu:8043/swift-0.95/run-2014-06-03-220931/tests-2014-06-03.html

Links from remote sites:
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-login4.beagle.ci.uchicago.edu-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-midway001-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-thwomp-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-blogin1-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-flogin1-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-communicado.ci.uchicago.edu-220931/tests-2014-06-03.html

-Yadu

On Mon, Jun 2, 2014 at 10:53 AM, Mihael Hategan wrote:

> On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> > I just pushed these changes live. I had to manually update push_to.sh.
> > (David, is the list of pages supposed to be maintained automatically?
> > Maybe we can drive that off of a find?)
> >
> > Mihael just fixed ticket 1279. David will make 0.95 the "latest
> > download" (and update the button on both Home and Download, and push
> > 0.94.X to the older releases page)?
> >
> > Yadu, can you re-test this release when you are back online in Chicago?
>
> Yeah, do we have a general feel for 0.95? I lost track.
>
> Mihael
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Wed Jun 4 20:25:27 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 4 Jun 2014 18:25:27 -0700
Subject: [Swift-devel] Swift web changes pending
In-Reply-To:
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov> <1401686634.26836.0.camel@echo>
Message-ID: <1401931527.21488.2.camel@echo>

There are a few failures of the form "sleep not in tc.data", "unknown site 'beagle'", etc.
There are also a few "could not create JVM" failures, which seem to point to some problem with the remote environment when starting coasters.

Can you fix those, please, so we can get an idea of where the actual Swift failures are?

Mihael

On Thu, 2014-06-05 at 03:43 +0530, Yadu Nand wrote:
> Hi,
>
> The last tests which ran Swift 0.95 branch SVN swift-r7871 (swift modified
> locally) cog-r3905 passed most tests.
> There are tests which are failing including a remote testing failure on
> frisbee (mac). Build is working.
>
> I need to check into the modis test failures, which seem to be config
> issues.
>
> Here's a link to the results :
> http://swift.rcc.uchicago.edu:8043/swift-0.95/run-2014-06-03-220931/tests-2014-06-03.html
> Links from remote sites:
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-login4.beagle.ci.uchicago.edu-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-midway001-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-thwomp-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-blogin1-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-flogin1-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-communicado.ci.uchicago.edu-220931/tests-2014-06-03.html
>
> -Yadu
>
>
> On Mon, Jun 2, 2014 at 10:53 AM, Mihael Hategan wrote:
>
> > On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> > > I just pushed these changes live. I had to manually update push_to.sh.
> > > (David, is the list of pages supposed to be maintained automatically?
> > > Maybe we can drive that off of a find?)
> > >
> > > Mihael just fixed ticket 1279. David will make 0.95 the "latest
> > > download" (and update the button on both Home and Download, and push
> > > 0.94.X to the older releases page)?
> > >
> > > Yadu, can you re-test this release when you are back online in Chicago?
> >
> > Yeah, do we have a general feel for 0.95? I lost track.
> >
> > Mihael
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >

From wilde at anl.gov Sun Jun 8 14:24:27 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 8 Jun 2014 14:24:27 -0500
Subject: [Swift-devel] Does softimage work with the new 0.95 config mechanism?
Message-ID: <5394B86B.4090308@anl.gov>

Does softimage work with the new 0.95 config mechanism?

If not, can you suggest how to integrate it?

Also: has anyone written up any softimage documentation yet?

Thanks,

- Mike

ps. softimage is briefly introduced in this prior swift-devel post:
http://lists.ci.uchicago.edu/pipermail/swift-devel/2014-February/010640.html

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From wilde at anl.gov Sun Jun 8 16:48:00 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 8 Jun 2014 16:48:00 -0500
Subject: [Swift-devel] Localhost coasters not working on Beagle
Message-ID: <5394DA10.3040404@anl.gov>

Mihael - I'm not able to get a simple localhost coasters run working on Beagle login1.

All: Is anyone seeing something similar? It looks to me like my coaster worker is not able to connect to the Swift coaster service (using standard automatic coasters).
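(Before digging into Swift itself, one way to check whether loopback TCP connections work at all on the login node is to test by hand; this is essentially the check Mihael runs later in this thread. Port 50003 matches the coaster callback port in the logs below, but any free port would do:)

    # terminal 1: listen on the port the coaster worker would connect to
    netcat -l -p 50003

    # terminal 2: confirm the listener is up, then try to connect to it
    netstat -lntp | grep 50003
    telnet 127.0.0.1 50003

If the telnet connection times out even with a live listener, loopback traffic is being blocked on the node (for example by a firewall rule), and no coaster setting will help.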
I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find logs and configs). Running 0.95RC6.

I'm setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as well:

login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift

login1$ cat localcoast.xml
127.0.0.1 00:01:00 3600 1 1 1 1 12 10000 100 100 /tmp/swiftwork

I get error 110 connection timeouts:

2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: 127.0.0.1:50000
2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is http://127.0.0.1:50001
2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: http://127.0.0.1:50003
2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for registration
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration
2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current channel
2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete
2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster service: http://127.0.0.1:50002
2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is http://10.128.2.244:50003
2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been overridden to http://127.0.0.1:50003
2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... id=0608-3704500
2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor
Settings {
        slots = 1
        jobsPerNode = 1
        workersPerNode = 1
        nodeGranularity = 1
        allocationStepSize = 0.1
        maxNodes = 1
        lowOverallocation = 10.0
        highOverallocation = 1.0
        overallocationDecayFactor = 0.001
        spread = 0.9
        reserve = 60.000s
        maxtime = 3600
        remoteMonitorEnabled = false
        internalHostname = 127.0.0.1
        hookClass = null
        workerManager = block
        workerLoggingLevel = NONE
        workerLoggingDirectory = DEFAULT
        ldLibraryPath = null
        workerCopies = null
        directory = null
        useHashBang = null
        parallelism = 0.01
        coresPerNode = 1
        perfTraceWorker = false
        perfTraceInterval = -1
        attributes = {}
        callbackURIs = [http://127.0.0.1:50003]
}

2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding queue: 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for holding queue (seconds): 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks for a total walltime of: 1s
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: Job(id:0 60.000s)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max Walltime (seconds): 60
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate (seconds): 600
2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for this new Block (est. seconds): 0
2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: 0, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to new Block
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, walltime=600.000s
2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) unregistering (send)
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to new blocks
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local
2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: Submitting
2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: Submitted
2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active
2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) unregistering (send)
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: Failed Job failed with an exit code of 110
2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job:
        executable: /usr/bin/perl
        arguments: /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
        stdout: null
        stderr: null
        directory: /
        batch: false
        redirected: false
        attributes: hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
        env: WORKER_LOGGING_LEVEL=NONE

2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From hategan at mcs.anl.gov Sun Jun 8 17:33:15 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 8 Jun 2014 15:33:15 -0700
Subject: [Swift-devel] Localhost coasters not working on Beagle
In-Reply-To: <5394DA10.3040404@anl.gov>
References: <5394DA10.3040404@anl.gov>
Message-ID: <1402266795.32444.2.camel@echo>

Can you enable worker logging and post the worker log?

Mihael

On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote:
> Mihael - I'm not able to get a simple localhost coasters run working on
> Beagle login1.
>
> All: Is anyone seeing something similar?
It looks to me like my coaster > worker is not able to connect to the Swift coaster service (using > standard automatic coasters). > > Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > logs and configs). Running 0.95RC6. > > Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > well: > > login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > catsn.swift > > login1$ cat localcoast.xml > > > > > > > > 127.0.0.1 > 00:01:00 > 3600 > > 1 > 1 > 1 > 1 > > 12 > 10000 > > 100 > 100 > > > /tmp/swiftwork > > > > > I get error 110 connection timeouts: > > 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > host=localhost > 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > 127.0.0.1:50000 > 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > http://127.0.0.1:50001 > 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > [http://127.0.0.2:50003, http://192.5.86.104:50003, > http://10.128.2.244:50003] > 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > http://127.0.0.1:50003 > 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > registration > 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > cpipe, boundTo: null] binding to cpipe://1 > 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > spipe, boundTo: null] binding to spipe://1 > 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > channel > 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > REGISTER) unregistering (send) > 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > service: http://127.0.0.1:50002 > 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > http://10.128.2.244:50003 > 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > overridden to http://127.0.0.1:50003 > 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > CONFIGSERVICE) unregistering (send) > 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> id=0608-3704500 > 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > SUBMITJOB) unregistering (send) > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > Settings { > slots = 1 > jobsPerNode = 1 > workersPerNode = 1 > nodeGranularity = 1 > allocationStepSize = 0.1 > maxNodes = 1 > lowOverallocation = 10.0 > highOverallocation = 1.0 > overallocationDecayFactor = 0.001 > spread = 0.9 > reserve = 60.000s > maxtime = 3600 > remoteMonitorEnabled = false > internalHostname = 127.0.0.1 > hookClass = null > workerManager = block > workerLoggingLevel = NONE > workerLoggingDirectory = DEFAULT > ldLibraryPath = null > workerCopies = null > directory = null > useHashBang = null > parallelism = 0.01 > coresPerNode = 1 > perfTraceWorker = false > perfTraceInterval = -1 > attributes = {} > callbackURIs = [http://127.0.0.1:50003] > } > > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > queue: 1 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > holding queue (seconds): 1 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > for a total walltime of: 1s > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > Job(id:0 60.000s) > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > Walltime (seconds): 60 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > estimate (seconds): 600 > 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > this new Block (est. seconds): 0 > 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > 0, holding.size(): 1 > 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > new Block > 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > 0, ii: 1, holding.size(): 1 > 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > walltime=600.000s > 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > unregistering (send) > 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > Block 0608-3704500-000000 (1x600.000s) for submission > 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > new blocks > 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > Block 0608-3704500-000000 (1x600.000s) > 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > Submitting > 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > / command: /usr/bin/perl > /home/wilde/.globus/coasters/cscript2445623341660096310.pl > http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > Submitted > 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > id=0608-3704500-000000 > 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > unregistering (send) > 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 28583112 > 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > 
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > Failed Job failed with an exit code of 110 > 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > executable: /usr/bin/perl > arguments: > /home/wilde/.globus/coasters/cscript2445623341660096310.pl > http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > stdout: null > stderr: null > directory: / > batch: false > redirected: false > attributes: > hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > env: WORKER_LOGGING_LEVEL=NONE > > 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > Failed to connect: Connection timed out at > /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > > > From davidkelly at uchicago.edu Sun Jun 8 19:07:38 2014 From: davidkelly at uchicago.edu (David Kelly) Date: Sun, 8 Jun 2014 19:07:38 -0500 Subject: [Swift-devel] Does softimage work with the new 0.95 config mechanism? In-Reply-To: <5394B86B.4090308@anl.gov> References: <5394B86B.4090308@anl.gov> Message-ID: No, the new config mechanism does not know about softimage. There is no documentation in the userguide about it. (Side note: the trunk userguide should be copied to become the 0.95 userguide, and be added to the website) I just created a page in swift-devel explaining how to add properties to the new config ( https://sites.google.com/site/swiftdevel/home/adding-properties-to-0-95-config-mechanism ). On Sun, Jun 8, 2014 at 2:24 PM, Michael Wilde wrote: > > Does softimage work with the new 0.95 config mechanism? > > If not, can you suggest how to integrate it? > > Also: has anyone written up any softimage documentation yet? > > Thanks, > > - Mike > > ps. softimage is briefly introduced in this prior swift-devel post: > > http://lists.ci.uchicago.edu/pipermail/swift-devel/2014-February/010640.html > > -- > Michael Wilde > Mathematics and Computer Science Computation Institute > Argonne National Laboratory The University of Chicago > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at anl.gov Sun Jun 8 22:10:38 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 22:10:38 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402266795.32444.2.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> Message-ID: <539525AE.7080705@anl.gov> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun Jun 8 22:07:12 2014 2014/06/08 22:07:12.296 INFO - Running on node login1.beagle.ci.uchicago.edu 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 2014/06/08 22:07:12.296 DEBUG - scheme=http 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 2014/06/08 22:07:12.297 DEBUG - port=50003 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. 
2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out login1$ On 6/8/14, 5:33 PM, Mihael Hategan wrote: > Can you enable worker logging and post the worker log? > > Mihael > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >> Mihael - Im not able to get a simple localhost coasters run working on >> Beagle login1. >> >> All: Is anyone seeing something similar? It looks to me like my coaster >> worker is not able to connect to the Swift coaster service (using >> standard automatic coasters). >> >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >> logs and configs). Running 0.95RC6. >> >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >> well: >> >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >> catsn.swift >> >> login1$ cat localcoast.xml >> >> >> >> >> >> >> >> 127.0.0.1 >> 00:01:00 >> 3600 >> >> 1 >> 1 >> 1 >> 1 >> >> 12 >> 10000 >> >> 100 >> 100 >> >> >> /tmp/swiftwork >> >> >> >> >> I get error 110 connection timeouts: >> >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >> host=localhost >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >> 127.0.0.1:50000 >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is >> http://127.0.0.1:50001 >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >> [http://127.0.0.2:50003, http://192.5.86.104:50003, >> http://10.128.2.244:50003] >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >> http://127.0.0.1:50003 >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >> registration >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >> cpipe, boundTo: null] binding to cpipe://1 >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >> spipe, boundTo: null] binding to spipe://1 >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >> channel >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >> REGISTER) unregistering (send) >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >> service: http://127.0.0.1:50002 >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >> http://10.128.2.244:50003 >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >> overridden to http://127.0.0.1:50003 >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >> CONFIGSERVICE) unregistering (send) >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
>> id=0608-3704500 >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >> SUBMITJOB) unregistering (send) >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >> Settings { >> slots = 1 >> jobsPerNode = 1 >> workersPerNode = 1 >> nodeGranularity = 1 >> allocationStepSize = 0.1 >> maxNodes = 1 >> lowOverallocation = 10.0 >> highOverallocation = 1.0 >> overallocationDecayFactor = 0.001 >> spread = 0.9 >> reserve = 60.000s >> maxtime = 3600 >> remoteMonitorEnabled = false >> internalHostname = 127.0.0.1 >> hookClass = null >> workerManager = block >> workerLoggingLevel = NONE >> workerLoggingDirectory = DEFAULT >> ldLibraryPath = null >> workerCopies = null >> directory = null >> useHashBang = null >> parallelism = 0.01 >> coresPerNode = 1 >> perfTraceWorker = false >> perfTraceInterval = -1 >> attributes = {} >> callbackURIs = [http://127.0.0.1:50003] >> } >> >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >> queue: 1 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >> holding queue (seconds): 1 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >> for a total walltime of: 1s >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >> Job(id:0 60.000s) >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >> Walltime (seconds): 60 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >> estimate (seconds): 600 >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >> this new Block (est. seconds): 0 >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >> 0, holding.size(): 1 >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >> new Block >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >> 0, ii: 1, holding.size(): 1 >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >> walltime=600.000s >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >> unregistering (send) >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >> Block 0608-3704500-000000 (1x600.000s) for submission >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >> new blocks >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >> Block 0608-3704500-000000 (1x600.000s) >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >> Submitting >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >> / command: /usr/bin/perl >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >> Submitted >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >> id=0608-3704500-000000 >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >> unregistering (send) >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:38:21,683-0500 INFO 
RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >> Failed Job failed with an exit code of 110 >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >> executable: /usr/bin/perl >> arguments: >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >> stdout: null >> stderr: null >> directory: / >> batch: false >> redirected: false >> attributes: >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >> env: WORKER_LOGGING_LEVEL=NONE >> >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >> Failed to connect: Connection timed out at >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. >> >> >> > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Sun Jun 8 22:22:51 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 8 Jun 2014 20:22:51 -0700 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <539525AE.7080705@anl.gov> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> Message-ID: <1402284171.15313.0.camel@echo> That's odd. Have you tried netstat -lntp? telnet? I'll give it a shot, but this looks rather strange. Mihael On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: > login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log > 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun > Jun 8 22:07:12 2014 > 2014/06/08 22:07:12.296 INFO - Running on node > login1.beagle.ci.uchicago.edu > 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 > 2014/06/08 22:07:12.296 DEBUG - scheme=http > 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 > 2014/06/08 22:07:12.297 DEBUG - port=50003 > 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 > 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... > 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. > 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds > 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... > 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. > 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds > 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... > 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. > 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out > login1$ > > > On 6/8/14, 5:33 PM, Mihael Hategan wrote: > > Can you enable worker logging and post the worker log? 
> > > > Mihael > > > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: > >> Mihael - Im not able to get a simple localhost coasters run working on > >> Beagle login1. > >> > >> All: Is anyone seeing something similar? It looks to me like my coaster > >> worker is not able to connect to the Swift coaster service (using > >> standard automatic coasters). > >> > >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > >> logs and configs). Running 0.95RC6. > >> > >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > >> well: > >> > >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > >> catsn.swift > >> > >> login1$ cat localcoast.xml > >> > >> > >> > >> > >> > >> > >> > >> 127.0.0.1 > >> 00:01:00 > >> 3600 > >> > >> 1 > >> 1 > >> 1 > >> 1 > >> > >> 12 > >> 10000 > >> > >> 100 > >> 100 > >> > >> > >> /tmp/swiftwork > >> > >> > >> > >> > >> I get error 110 connection timeouts: > >> > >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > >> host=localhost > >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > >> 127.0.0.1:50000 > >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > >> http://127.0.0.1:50001 > >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > >> [http://127.0.0.2:50003, http://192.5.86.104:50003, > >> http://10.128.2.244:50003] > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > >> http://127.0.0.1:50003 > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > >> registration > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > >> cpipe, boundTo: null] binding to cpipe://1 > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > >> spipe, boundTo: null] binding to spipe://1 > >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > >> channel > >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > >> REGISTER) unregistering (send) > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > >> service: http://127.0.0.1:50002 > >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > >> http://10.128.2.244:50003 > >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > >> overridden to http://127.0.0.1:50003 > >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > >> CONFIGSERVICE) unregistering (send) > >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> >> id=0608-3704500 > >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > >> SUBMITJOB) unregistering (send) > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > >> Settings { > >> slots = 1 > >> jobsPerNode = 1 > >> workersPerNode = 1 > >> nodeGranularity = 1 > >> allocationStepSize = 0.1 > >> maxNodes = 1 > >> lowOverallocation = 10.0 > >> highOverallocation = 1.0 > >> overallocationDecayFactor = 0.001 > >> spread = 0.9 > >> reserve = 60.000s > >> maxtime = 3600 > >> remoteMonitorEnabled = false > >> internalHostname = 127.0.0.1 > >> hookClass = null > >> workerManager = block > >> workerLoggingLevel = NONE > >> workerLoggingDirectory = DEFAULT > >> ldLibraryPath = null > >> workerCopies = null > >> directory = null > >> useHashBang = null > >> parallelism = 0.01 > >> coresPerNode = 1 > >> perfTraceWorker = false > >> perfTraceInterval = -1 > >> attributes = {} > >> callbackURIs = [http://127.0.0.1:50003] > >> } > >> > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > >> queue: 1 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > >> holding queue (seconds): 1 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > >> for a total walltime of: 1s > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > >> Job(id:0 60.000s) > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > >> Walltime (seconds): 60 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > >> estimate (seconds): 600 > >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > >> this new Block (est. seconds): 0 > >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > >> 0, holding.size(): 1 > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > >> new Block > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > >> 0, ii: 1, holding.size(): 1 > >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > >> walltime=600.000s > >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > >> unregistering (send) > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > >> Block 0608-3704500-000000 (1x600.000s) for submission > >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > >> new blocks > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > >> Block 0608-3704500-000000 (1x600.000s) > >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > >> Submitting > >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > >> / command: /usr/bin/perl > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > >> Submitted > >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > >> id=0608-3704500-000000 > >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > >> unregistering (send) > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 
954466304, CrtHeap: 253624320, UsedHeap: 28583112 > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > >> Failed Job failed with an exit code of 110 > >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > >> executable: /usr/bin/perl > >> arguments: > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > >> stdout: null > >> stderr: null > >> directory: / > >> batch: false > >> redirected: false > >> attributes: > >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > >> env: WORKER_LOGGING_LEVEL=NONE > >> > >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > >> Failed to connect: Connection timed out at > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > >> > >> > >> > > > From hategan at mcs.anl.gov Sun Jun 8 22:27:04 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 8 Jun 2014 20:27:04 -0700 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402284171.15313.0.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> Message-ID: <1402284424.15405.1.camel@echo> Ok, so: shell1: hategan at login1:~> netcat -l -p 50003 shell2: hategan at login1:~> netstat -lntp ... tcp 0 0 0.0.0.0:50003 0.0.0.0:* LISTEN 22806/netcat ... hategan at login1:~> telnet 127.0.0.1 50003 Trying 127.0.0.1... telnet: connect to address 127.0.0.1: Connection timed out I don't think this has anything to do with swift or coasters. Mihael On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: > That's odd. Have you tried netstat -lntp? telnet? > > I'll give it a shot, but this looks rather strange. > > Mihael > > On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: > > login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log > > 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun > > Jun 8 22:07:12 2014 > > 2014/06/08 22:07:12.296 INFO - Running on node > > login1.beagle.ci.uchicago.edu > > 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 > > 2014/06/08 22:07:12.296 DEBUG - scheme=http > > 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 > > 2014/06/08 22:07:12.297 DEBUG - port=50003 > > 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 > > 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... > > 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. > > 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds > > 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... > > 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. 
> > 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds > > 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... > > 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. > > 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out > > login1$ > > > > > > On 6/8/14, 5:33 PM, Mihael Hategan wrote: > > > Can you enable worker logging and post the worker log? > > > > > > Mihael > > > > > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: > > >> Mihael - Im not able to get a simple localhost coasters run working on > > >> Beagle login1. > > >> > > >> All: Is anyone seeing something similar? It looks to me like my coaster > > >> worker is not able to connect to the Swift coaster service (using > > >> standard automatic coasters). > > >> > > >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > > >> logs and configs). Running 0.95RC6. > > >> > > >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > > >> well: > > >> > > >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > > >> catsn.swift > > >> > > >> login1$ cat localcoast.xml > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> 127.0.0.1 > > >> 00:01:00 > > >> 3600 > > >> > > >> 1 > > >> 1 > > >> 1 > > >> 1 > > >> > > >> 12 > > >> 10000 > > >> > > >> 100 > > >> 100 > > >> > > >> > > >> /tmp/swiftwork > > >> > > >> > > >> > > >> > > >> I get error 110 connection timeouts: > > >> > > >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > > >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > > >> host=localhost > > >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > > >> 127.0.0.1:50000 > > >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > > >> http://127.0.0.1:50001 > > >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > > >> [http://127.0.0.2:50003, http://192.5.86.104:50003, > > >> http://10.128.2.244:50003] > > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > > >> http://127.0.0.1:50003 > > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > > >> registration > > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > > >> cpipe, boundTo: null] binding to cpipe://1 > > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > > >> spipe, boundTo: null] binding to spipe://1 > > >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > > >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > > >> channel > > >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > > >> REGISTER) unregistering (send) > > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > > >> service: http://127.0.0.1:50002 > > >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > > >> http://10.128.2.244:50003 > > >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > > >> overridden to http://127.0.0.1:50003 > > >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > > >> CONFIGSERVICE) unregistering (send) > > >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> > >> id=0608-3704500 > > >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > > >> SUBMITJOB) unregistering (send) > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > > >> Settings { > > >> slots = 1 > > >> jobsPerNode = 1 > > >> workersPerNode = 1 > > >> nodeGranularity = 1 > > >> allocationStepSize = 0.1 > > >> maxNodes = 1 > > >> lowOverallocation = 10.0 > > >> highOverallocation = 1.0 > > >> overallocationDecayFactor = 0.001 > > >> spread = 0.9 > > >> reserve = 60.000s > > >> maxtime = 3600 > > >> remoteMonitorEnabled = false > > >> internalHostname = 127.0.0.1 > > >> hookClass = null > > >> workerManager = block > > >> workerLoggingLevel = NONE > > >> workerLoggingDirectory = DEFAULT > > >> ldLibraryPath = null > > >> workerCopies = null > > >> directory = null > > >> useHashBang = null > > >> parallelism = 0.01 > > >> coresPerNode = 1 > > >> perfTraceWorker = false > > >> perfTraceInterval = -1 > > >> attributes = {} > > >> callbackURIs = [http://127.0.0.1:50003] > > >> } > > >> > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > > >> queue: 1 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > > >> holding queue (seconds): 1 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > > >> for a total walltime of: 1s > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > > >> Job(id:0 60.000s) > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > > >> Walltime (seconds): 60 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > > >> estimate (seconds): 600 > > >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > > >> this new Block (est. seconds): 0 > > >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > > >> 0, holding.size(): 1 > > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > > >> new Block > > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > > >> 0, ii: 1, holding.size(): 1 > > >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > > >> walltime=600.000s > > >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > > >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > > >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > > >> unregistering (send) > > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > > >> Block 0608-3704500-000000 (1x600.000s) for submission > > >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > > >> new blocks > > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > > >> Block 0608-3704500-000000 (1x600.000s) > > >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > > >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > > >> Submitting > > >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > > >> / command: /usr/bin/perl > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > > >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > > >> Submitted > > >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > > >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > > >> id=0608-3704500-000000 > > >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > > >> unregistering (send) > 
> >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 > > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > > >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > > >> Failed Job failed with an exit code of 110 > > >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > > >> executable: /usr/bin/perl > > >> arguments: > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > > >> stdout: null > > >> stderr: null > > >> directory: / > > >> batch: false > > >> redirected: false > > >> attributes: > > >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > > >> env: WORKER_LOGGING_LEVEL=NONE > > >> > > >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > > >> Failed to connect: Connection timed out at > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > > >> > > >> > > >> > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at anl.gov Sun Jun 8 22:31:35 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 22:31:35 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402284424.15405.1.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> <1402284424.15405.1.camel@echo> Message-ID: <53952A97.7000002@anl.gov> I'll try the other addresses for that host. Maybe something changed there in iptables or similar. - MIke On 6/8/14, 10:27 PM, Mihael Hategan wrote: > Ok, so: > > shell1: hategan at login1:~> netcat -l -p 50003 > > shell2: hategan at login1:~> netstat -lntp > ... > tcp 0 0 0.0.0.0:50003 0.0.0.0:* > LISTEN 22806/netcat > ... > > hategan at login1:~> telnet 127.0.0.1 50003 > Trying 127.0.0.1... > telnet: connect to address 127.0.0.1: Connection timed out > > I don't think this has anything to do with swift or coasters. > > Mihael > > On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: >> That's odd. Have you tried netstat -lntp? telnet? >> >> I'll give it a shot, but this looks rather strange. >> >> Mihael >> >> On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: >>> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log >>> 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun >>> Jun 8 22:07:12 2014 >>> 2014/06/08 22:07:12.296 INFO - Running on node >>> login1.beagle.ci.uchicago.edu >>> 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 >>> 2014/06/08 22:07:12.296 DEBUG - scheme=http >>> 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 >>> 2014/06/08 22:07:12.297 DEBUG - port=50003 >>> 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 >>> 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... >>> 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... 
>>> 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds >>> 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... >>> 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... >>> 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds >>> 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... >>> 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... >>> 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out >>> login1$ >>> >>> >>> On 6/8/14, 5:33 PM, Mihael Hategan wrote: >>>> Can you enable worker logging and post the worker log? >>>> >>>> Mihael >>>> >>>> On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >>>>> Mihael - Im not able to get a simple localhost coasters run working on >>>>> Beagle login1. >>>>> >>>>> All: Is anyone seeing something similar? It looks to me like my coaster >>>>> worker is not able to connect to the Swift coaster service (using >>>>> standard automatic coasters). >>>>> >>>>> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >>>>> logs and configs). Running 0.95RC6. >>>>> >>>>> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >>>>> well: >>>>> >>>>> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >>>>> catsn.swift >>>>> >>>>> login1$ cat localcoast.xml >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 127.0.0.1 >>>>> 00:01:00 >>>>> 3600 >>>>> >>>>> 1 >>>>> 1 >>>>> 1 >>>>> 1 >>>>> >>>>> 12 >>>>> 10000 >>>>> >>>>> 100 >>>>> 100 >>>>> >>>>> >>>>> /tmp/swiftwork >>>>> >>>>> >>>>> >>>>> >>>>> I get error 110 connection timeouts: >>>>> >>>>> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >>>>> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >>>>> host=localhost >>>>> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >>>>> 127.0.0.1:50000 >>>>> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. 
URL is >>>>> http://127.0.0.1:50001 >>>>> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >>>>> [http://127.0.0.2:50003, http://192.5.86.104:50003, >>>>> http://10.128.2.244:50003] >>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >>>>> http://127.0.0.1:50003 >>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >>>>> registration >>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>> cpipe, boundTo: null] binding to cpipe://1 >>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>> spipe, boundTo: null] binding to spipe://1 >>>>> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >>>>> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >>>>> channel >>>>> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >>>>> REGISTER) unregistering (send) >>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >>>>> service: http://127.0.0.1:50002 >>>>> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >>>>> http://10.128.2.244:50003 >>>>> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >>>>> overridden to http://127.0.0.1:50003 >>>>> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >>>>> CONFIGSERVICE) unregistering (send) >>>>> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... >>>>> id=0608-3704500 >>>>> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >>>>> SUBMITJOB) unregistering (send) >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >>>>> Settings { >>>>> slots = 1 >>>>> jobsPerNode = 1 >>>>> workersPerNode = 1 >>>>> nodeGranularity = 1 >>>>> allocationStepSize = 0.1 >>>>> maxNodes = 1 >>>>> lowOverallocation = 10.0 >>>>> highOverallocation = 1.0 >>>>> overallocationDecayFactor = 0.001 >>>>> spread = 0.9 >>>>> reserve = 60.000s >>>>> maxtime = 3600 >>>>> remoteMonitorEnabled = false >>>>> internalHostname = 127.0.0.1 >>>>> hookClass = null >>>>> workerManager = block >>>>> workerLoggingLevel = NONE >>>>> workerLoggingDirectory = DEFAULT >>>>> ldLibraryPath = null >>>>> workerCopies = null >>>>> directory = null >>>>> useHashBang = null >>>>> parallelism = 0.01 >>>>> coresPerNode = 1 >>>>> perfTraceWorker = false >>>>> perfTraceInterval = -1 >>>>> attributes = {} >>>>> callbackURIs = [http://127.0.0.1:50003] >>>>> } >>>>> >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >>>>> queue: 1 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >>>>> holding queue (seconds): 1 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >>>>> for a total walltime of: 1s >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >>>>> Job(id:0 60.000s) >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >>>>> Walltime (seconds): 60 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >>>>> estimate (seconds): 600 >>>>> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >>>>> this new Block (est. 
seconds): 0 >>>>> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >>>>> 0, holding.size(): 1 >>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >>>>> new Block >>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >>>>> 0, ii: 1, holding.size(): 1 >>>>> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >>>>> walltime=600.000s >>>>> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >>>>> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >>>>> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >>>>> unregistering (send) >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >>>>> Block 0608-3704500-000000 (1x600.000s) for submission >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >>>>> new blocks >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >>>>> Block 0608-3704500-000000 (1x600.000s) >>>>> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >>>>> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >>>>> Submitting >>>>> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >>>>> / command: /usr/bin/perl >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >>>>> Submitted >>>>> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >>>>> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >>>>> id=0608-3704500-000000 >>>>> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >>>>> unregistering (send) >>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >>>>> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >>>>> Failed Job failed with an exit code of 110 >>>>> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >>>>> executable: /usr/bin/perl >>>>> arguments: >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>> stdout: null >>>>> stderr: null >>>>> directory: / >>>>> batch: false >>>>> redirected: false >>>>> attributes: >>>>> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >>>>> env: WORKER_LOGGING_LEVEL=NONE >>>>> >>>>> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >>>>> Failed to connect: Connection timed out at >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. 
>>>>> >>>>> >>>>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From wilde at anl.gov Sun Jun 8 23:12:50 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 23:12:50 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <53952A97.7000002@anl.gov> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> <1402284424.15405.1.camel@echo> <53952A97.7000002@anl.gov> Message-ID: <53953442.1090600@anl.gov> Yadu pointed out that beagle's login host ports are open at a different range. When I set the correct port range in GLOBUS_TCP_PORT_RANGE and GLOBUS_TCP_SOURCE_RANGE, it works. The swift module on Beagle does this automatically. I was using my own download of 0.95-RC6. Thanks, Mihael and Yadu. - Mike On 6/8/14, 10:31 PM, Michael Wilde wrote: > I'll try the other addresses for that host. > > Maybe something changed there in iptables or similar. > > - MIke > > On 6/8/14, 10:27 PM, Mihael Hategan wrote: >> Ok, so: >> >> shell1: hategan at login1:~> netcat -l -p 50003 >> >> shell2: hategan at login1:~> netstat -lntp >> ... >> tcp 0 0 0.0.0.0:50003 0.0.0.0:* >> LISTEN 22806/netcat >> ... >> >> hategan at login1:~> telnet 127.0.0.1 50003 >> Trying 127.0.0.1... >> telnet: connect to address 127.0.0.1: Connection timed out >> >> I don't think this has anything to do with swift or coasters. >> >> Mihael >> >> On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: >>> That's odd. Have you tried netstat -lntp? telnet? >>> >>> I'll give it a shot, but this looks rather strange. >>> >>> Mihael >>> >>> On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: >>>> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log >>>> 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun >>>> Jun 8 22:07:12 2014 >>>> 2014/06/08 22:07:12.296 INFO - Running on node >>>> login1.beagle.ci.uchicago.edu >>>> 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 >>>> 2014/06/08 22:07:12.296 DEBUG - scheme=http >>>> 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 >>>> 2014/06/08 22:07:12.297 DEBUG - port=50003 >>>> 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 >>>> 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... >>>> 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. >>>> 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds >>>> 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... >>>> 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. >>>> 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds >>>> 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... >>>> 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. 
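For reference, the fix Mike describes comes down to exporting both variables with the login node's open range before starting Swift; a minimal sketch, with 50100,50199 as a placeholder for whatever range is actually open on your machine:

    export GLOBUS_TCP_PORT_RANGE=50100,50199
    export GLOBUS_TCP_SOURCE_RANGE=50100,50199
    swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift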
>>>> 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out >>>> login1$ >>>> >>>> >>>> On 6/8/14, 5:33 PM, Mihael Hategan wrote: >>>>> Can you enable worker logging and post the worker log? >>>>> >>>>> Mihael >>>>> >>>>> On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >>>>>> Mihael - Im not able to get a simple localhost coasters run working on >>>>>> Beagle login1. >>>>>> >>>>>> All: Is anyone seeing something similar? It looks to me like my coaster >>>>>> worker is not able to connect to the Swift coaster service (using >>>>>> standard automatic coasters). >>>>>> >>>>>> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >>>>>> logs and configs). Running 0.95RC6. >>>>>> >>>>>> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >>>>>> well: >>>>>> >>>>>> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >>>>>> catsn.swift >>>>>> >>>>>> login1$ cat localcoast.xml >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 127.0.0.1 >>>>>> 00:01:00 >>>>>> 3600 >>>>>> >>>>>> 1 >>>>>> 1 >>>>>> 1 >>>>>> 1 >>>>>> >>>>>> 12 >>>>>> 10000 >>>>>> >>>>>> 100 >>>>>> 100 >>>>>> >>>>>> >>>>>> /tmp/swiftwork >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I get error 110 connection timeouts: >>>>>> >>>>>> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >>>>>> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >>>>>> host=localhost >>>>>> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >>>>>> 127.0.0.1:50000 >>>>>> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is >>>>>> http://127.0.0.1:50001 >>>>>> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >>>>>> [http://127.0.0.2:50003, http://192.5.86.104:50003, >>>>>> http://10.128.2.244:50003] >>>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >>>>>> http://127.0.0.1:50003 >>>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >>>>>> registration >>>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>>> cpipe, boundTo: null] binding to cpipe://1 >>>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>>> spipe, boundTo: null] binding to spipe://1 >>>>>> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >>>>>> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >>>>>> channel >>>>>> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >>>>>> REGISTER) unregistering (send) >>>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >>>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >>>>>> service: http://127.0.0.1:50002 >>>>>> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >>>>>> http://10.128.2.244:50003 >>>>>> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >>>>>> overridden to http://127.0.0.1:50003 >>>>>> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >>>>>> CONFIGSERVICE) unregistering (send) >>>>>> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
>>>>>> id=0608-3704500 >>>>>> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >>>>>> SUBMITJOB) unregistering (send) >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >>>>>> Settings { >>>>>> slots = 1 >>>>>> jobsPerNode = 1 >>>>>> workersPerNode = 1 >>>>>> nodeGranularity = 1 >>>>>> allocationStepSize = 0.1 >>>>>> maxNodes = 1 >>>>>> lowOverallocation = 10.0 >>>>>> highOverallocation = 1.0 >>>>>> overallocationDecayFactor = 0.001 >>>>>> spread = 0.9 >>>>>> reserve = 60.000s >>>>>> maxtime = 3600 >>>>>> remoteMonitorEnabled = false >>>>>> internalHostname = 127.0.0.1 >>>>>> hookClass = null >>>>>> workerManager = block >>>>>> workerLoggingLevel = NONE >>>>>> workerLoggingDirectory = DEFAULT >>>>>> ldLibraryPath = null >>>>>> workerCopies = null >>>>>> directory = null >>>>>> useHashBang = null >>>>>> parallelism = 0.01 >>>>>> coresPerNode = 1 >>>>>> perfTraceWorker = false >>>>>> perfTraceInterval = -1 >>>>>> attributes = {} >>>>>> callbackURIs = [http://127.0.0.1:50003] >>>>>> } >>>>>> >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >>>>>> queue: 1 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >>>>>> holding queue (seconds): 1 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >>>>>> for a total walltime of: 1s >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >>>>>> Job(id:0 60.000s) >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >>>>>> Walltime (seconds): 60 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >>>>>> estimate (seconds): 600 >>>>>> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >>>>>> this new Block (est. seconds): 0 >>>>>> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >>>>>> 0, holding.size(): 1 >>>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >>>>>> new Block >>>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >>>>>> 0, ii: 1, holding.size(): 1 >>>>>> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >>>>>> walltime=600.000s >>>>>> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >>>>>> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >>>>>> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >>>>>> unregistering (send) >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >>>>>> Block 0608-3704500-000000 (1x600.000s) for submission >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >>>>>> new blocks >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >>>>>> Block 0608-3704500-000000 (1x600.000s) >>>>>> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >>>>>> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >>>>>> Submitting >>>>>> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >>>>>> / command: /usr/bin/perl >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>>> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >>>>>> Submitted >>>>>> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >>>>>> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >>>>>> id=0608-3704500-000000 >>>>>> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >>>>>> unregistering (send) 
>>>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >>>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >>>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >>>>>> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >>>>>> Failed Job failed with an exit code of 110 >>>>>> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >>>>>> executable: /usr/bin/perl >>>>>> arguments: >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>>> stdout: null >>>>>> stderr: null >>>>>> directory: / >>>>>> batch: false >>>>>> redirected: false >>>>>> attributes: >>>>>> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >>>>>> env: WORKER_LOGGING_LEVEL=NONE >>>>>> >>>>>> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >>>>>> Failed to connect: Connection timed out at >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. >>>>>> >>>>>> >>>>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From lulzanonym at gmail.com Mon Jun 9 05:07:11 2014 From: lulzanonym at gmail.com (Walid Braham) Date: Mon, 9 Jun 2014 12:07:11 +0200 Subject: [Swift-devel] Subscription Message-ID: lulzanonym at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tga at uchicago.edu Wed Jun 11 07:33:23 2014 From: tga at uchicago.edu (Tim Armstrong) Date: Wed, 11 Jun 2014 14:33:23 +0200 Subject: [Swift-devel] Swift-T Github mirror moved Message-ID: <53984C93.3080003@uchicago.edu> I moved the Swift/T github mirror from my personal account to the swift organisation. You can find it at https://github.com/swift-lang/swift-t . Cheers, Tim From yadunand at uchicago.edu Wed Jun 11 11:41:58 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 11:41:58 -0500 Subject: [Swift-devel] SVN down ? Message-ID: <539886D6.6000602@uchicago.edu> Hi, Since about an hour back I'm not able to access svn.ci.uchicago.edu. The online repo browser wouldn't load as well. Whom do I contact ? [yadunand at midway001 tests]$ svn up Updating '.': svn: E000110: Unable to connect to a repository at URL 'https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.95' svn: E000110: Error running context: Connection timed out Thanks, Yadu From davidkelly at uchicago.edu Wed Jun 11 11:53:59 2014 From: davidkelly at uchicago.edu (David Kelly) Date: Wed, 11 Jun 2014 11:53:59 -0500 Subject: [Swift-devel] SVN down ? In-Reply-To: <539886D6.6000602@uchicago.edu> References: <539886D6.6000602@uchicago.edu> Message-ID: CI support maintains the svn server On Wed, Jun 11, 2014 at 11:41 AM, Yadu Nand Babuji wrote: > Hi, > > Since about an hour back I'm not able to access svn.ci.uchicago.edu. The > online repo browser wouldn't load as well. 
> Whom do I contact ? > > [yadunand at midway001 tests]$ svn up > Updating '.': > svn: E000110: Unable to connect to a repository at URL > 'https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.95' > svn: E000110: Error running context: Connection timed out > > Thanks, > Yadu > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadunand at uchicago.edu Wed Jun 11 14:46:25 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 14:46:25 -0500 Subject: [Swift-devel] Reducing swift log size Message-ID: <5398B211.8060403@uchicago.edu> Hi, I'm running a proxy app for a user with 6000 tasks each taking a few milliseconds, and the log sizes are unusually large. When the tasks are set to take 20s, the total log size reaches ~7Gb. I tried setting -minimal.logging and -reduced.logging but still see debug lines in the log like: 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: block=0611-1507050-000000 host=midway461 id=10 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: block=0611-1507050-000000 id=10 Do we need DEBUG lines such as the ones listed above ? Is it reasonable to have these set by default to WARN ? Secondly, setting -minimal.logging did not turn off these DEBUG lines and I had to set the following log4j.properties from DEBUG to WARN to remove most of the offending lines: log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN With 6001 tasks, each taking 2 ms or so: Swift without any changes to logging -> 440879 lines and 51Mb Swift with -minimal.logging -> 83350 lines and 9.5Mb Swift with -minimal.logging and -> 7625 lines and 791Kb Cpu, Block log4j properties set Thanks, Yadu From hategan at mcs.anl.gov Wed Jun 11 14:54:57 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 12:54:57 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398B211.8060403@uchicago.edu> References: <5398B211.8060403@uchicago.edu> Message-ID: <1402516497.26970.3.camel@echo> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > Hi, > > I'm running a proxy app for a user with 6000 tasks each taking a few > milliseconds, and the log sizes are unusually large. When the tasks are > set to take 20s, the total log size reaches ~7Gb. 7GB? Wow. I'd like to see that. Can you upload and post link? > > I tried setting -minimal.logging and -reduced.logging but still see > debug lines in the log like: > 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > block=0611-1507050-000000 host=midway461 id=10 > 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > block=0611-1507050-000000 id=10 > > Do we need DEBUG lines such as the ones listed above ? Is it reasonable > to have these set by default to WARN ? It is until there's a problem and then people ask for the opposite. We should evaluate whether this belongs in reduced logging or not. But does that really account for most of the 7G? > Secondly, setting -minimal.logging did not turn off these DEBUG lines > and I had to set the following log4j.properties from DEBUG to WARN to > remove most of the offending lines: That sounds like a bug. 
> > log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > > With 6001 tasks, each taking 2 ms or so: > Swift without any changes to logging -> 440879 lines and 51Mb > Swift with -minimal.logging -> 83350 lines and 9.5Mb > Swift with -minimal.logging and -> 7625 lines and 791Kb > Cpu, Block log4j properties set > > Thanks, > Yadu > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Wed Jun 11 17:57:22 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 17:57:22 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402516497.26970.3.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> Message-ID: <5398DED2.70808@uchicago.edu> Hi Mihael, I've got the logs for you. This time, i've run the 6001 tasks with 20s delay added, and was all run with swift-0.95-RC6 (from our website) : Normal run -> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( 6.1gb ) With minimal logging -> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( 7.5 gb ) With minimal logging -> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log ( 845.4 kb ) & log4j properties modified The run with minimal logging ran for ~15mins while the first run took ~12mins. That *might* explain why the one with minimal logging is larger. Thanks, Yadu On 06/11/2014 02:54 PM, Mihael Hategan wrote: > On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >> Hi, >> >> I'm running a proxy app for a user with 6000 tasks each taking a few >> milliseconds, and the log sizes are unusually large. When the tasks are >> set to take 20s, the total log size reaches ~7Gb. > 7GB? Wow. I'd like to see that. Can you upload and post link? > >> I tried setting -minimal.logging and -reduced.logging but still see >> debug lines in the log like: >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >> block=0611-1507050-000000 host=midway461 id=10 >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >> block=0611-1507050-000000 id=10 >> >> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >> to have these set by default to WARN ? > It is until there's a problem and then people ask for the opposite. We > should evaluate whether this belongs in reduced logging or not. But does > that really account for most of the 7G? > >> Secondly, setting -minimal.logging did not turn off these DEBUG lines >> and I had to set the following log4j.properties from DEBUG to WARN to >> remove most of the offending lines: > That sounds like a bug. 
> >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >> >> With 6001 tasks, each taking 2 ms or so: >> Swift without any changes to logging -> 440879 lines and 51Mb >> Swift with -minimal.logging -> 83350 lines and 9.5Mb >> Swift with -minimal.logging and -> 7625 lines and 791Kb >> Cpu, Block log4j properties set >> >> Thanks, >> Yadu >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Wed Jun 11 19:03:06 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 17:03:06 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398DED2.70808@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> Message-ID: <1402531386.29962.1.camel@echo> Sorry, I should have mentioned this, but can you please gzip them? It's a bit much otherwise. Mihael On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > Hi Mihael, > > I've got the logs for you. > > This time, i've run the 6001 tasks with 20s delay added, and was all run > with swift-0.95-RC6 (from our website) : > > Normal run -> > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > 6.1gb ) > With minimal logging -> > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > 7.5 gb ) > With minimal logging -> > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > ( 845.4 kb ) > & log4j properties modified > > The run with minimal logging ran for ~15mins while the first run took > ~12mins. That *might* explain why the one > with minimal logging is larger. > > Thanks, > Yadu > > On 06/11/2014 02:54 PM, Mihael Hategan wrote: > > On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >> Hi, > >> > >> I'm running a proxy app for a user with 6000 tasks each taking a few > >> milliseconds, and the log sizes are unusually large. When the tasks are > >> set to take 20s, the total log size reaches ~7Gb. > > 7GB? Wow. I'd like to see that. Can you upload and post link? > > > >> I tried setting -minimal.logging and -reduced.logging but still see > >> debug lines in the log like: > >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >> block=0611-1507050-000000 host=midway461 id=10 > >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >> block=0611-1507050-000000 id=10 > >> > >> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >> to have these set by default to WARN ? > > It is until there's a problem and then people ask for the opposite. We > > should evaluate whether this belongs in reduced logging or not. But does > > that really account for most of the 7G? > > > >> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >> and I had to set the following log4j.properties from DEBUG to WARN to > >> remove most of the offending lines: > > That sounds like a bug. 
> > > >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >> > >> With 6001 tasks, each taking 2 ms or so: > >> Swift without any changes to logging -> 440879 lines and 51Mb > >> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >> Swift with -minimal.logging and -> 7625 lines and 791Kb > >> Cpu, Block log4j properties set > >> > >> Thanks, > >> Yadu > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From yadunand at uchicago.edu Wed Jun 11 19:45:56 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 19:45:56 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402531386.29962.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> Message-ID: <5398F844.4050607@uchicago.edu> Okay, here you go: http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz -Yadu On 06/11/2014 07:03 PM, Mihael Hategan wrote: > Sorry, I should have mentioned this, but can you please gzip them? It's > a bit much otherwise. > > Mihael > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >> Hi Mihael, >> >> I've got the logs for you. >> >> This time, i've run the 6001 tasks with 20s delay added, and was all run >> with swift-0.95-RC6 (from our website) : >> >> Normal run -> >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >> 6.1gb ) >> With minimal logging -> >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >> 7.5 gb ) >> With minimal logging -> >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >> ( 845.4 kb ) >> & log4j properties modified >> >> The run with minimal logging ran for ~15mins while the first run took >> ~12mins. That *might* explain why the one >> with minimal logging is larger. >> >> Thanks, >> Yadu >> >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>> Hi, >>>> >>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>> set to take 20s, the total log size reaches ~7Gb. >>> 7GB? Wow. I'd like to see that. Can you upload and post link? >>> >>>> I tried setting -minimal.logging and -reduced.logging but still see >>>> debug lines in the log like: >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>> block=0611-1507050-000000 host=midway461 id=10 >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>> block=0611-1507050-000000 id=10 >>>> >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >>>> to have these set by default to WARN ? >>> It is until there's a problem and then people ask for the opposite. We >>> should evaluate whether this belongs in reduced logging or not. But does >>> that really account for most of the 7G? >>> >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>> remove most of the offending lines: >>> That sounds like a bug. 
>>> >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>> >>>> With 6001 tasks, each taking 2 ms or so: >>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>> Cpu, Block log4j properties set >>>> >>>> Thanks, >>>> Yadu >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Jun 12 01:45:39 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 23:45:39 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398F844.4050607@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> Message-ID: <1402555539.11363.1.camel@echo> Ok, the worker CPU stuff is indeed generating a lot of messages. But that isn't supposed to happen. It's supposed to do nothing if nothing happens. So I need to check what's going on. Mihael On Wed, 2014-06-11 at 19:45 -0500, Yadu Nand Babuji wrote: > Okay, here you go: > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz > > -Yadu > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > > Sorry, I should have mentioned this, but can you please gzip them? It's > > a bit much otherwise. > > > > Mihael > > > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > >> Hi Mihael, > >> > >> I've got the logs for you. > >> > >> This time, i've run the 6001 tasks with 20s delay added, and was all run > >> with swift-0.95-RC6 (from our website) : > >> > >> Normal run -> > >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > >> 6.1gb ) > >> With minimal logging -> > >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > >> 7.5 gb ) > >> With minimal logging -> > >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > >> ( 845.4 kb ) > >> & log4j properties modified > >> > >> The run with minimal logging ran for ~15mins while the first run took > >> ~12mins. That *might* explain why the one > >> with minimal logging is larger. > >> > >> Thanks, > >> Yadu > >> > >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >>>> Hi, > >>>> > >>>> I'm running a proxy app for a user with 6000 tasks each taking a few > >>>> milliseconds, and the log sizes are unusually large. When the tasks are > >>>> set to take 20s, the total log size reaches ~7Gb. > >>> 7GB? Wow. I'd like to see that. Can you upload and post link? > >>> > >>>> I tried setting -minimal.logging and -reduced.logging but still see > >>>> debug lines in the log like: > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >>>> block=0611-1507050-000000 host=midway461 id=10 > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >>>> block=0611-1507050-000000 id=10 > >>>> > >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >>>> to have these set by default to WARN ? 
> >>> It is until there's a problem and then people ask for the opposite. We > >>> should evaluate whether this belongs in reduced logging or not. But does > >>> that really account for most of the 7G? > >>> > >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >>>> and I had to set the following log4j.properties from DEBUG to WARN to > >>>> remove most of the offending lines: > >>> That sounds like a bug. > >>> > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >>>> > >>>> With 6001 tasks, each taking 2 ms or so: > >>>> Swift without any changes to logging -> 440879 lines and 51Mb > >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > >>>> Cpu, Block log4j properties set > >>>> > >>>> Thanks, > >>>> Yadu > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From hategan at mcs.anl.gov Thu Jun 12 02:22:47 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 00:22:47 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402555539.11363.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> Message-ID: <1402557767.11363.8.camel@echo> Allright, Can you do me the following favor? in coasters/src/......./Cpu.java, change line 183 from " Cpus sleeping: " + cpus); to " Cpus sleeping: " + cpus + ", qseq: " + lastseq + ", myseq: " + this.getLastSeq()); Then re-run and send me the log. You don't have to run the full thing. When you see the very frequent sleeping/requesting work craziness in the log, you can kill the run. I'm asking this because I have not seen this problem occurring, and it shouldn't be happening, but it clearly is and your version holds the key. Mihael On Wed, 2014-06-11 at 23:45 -0700, Mihael Hategan wrote: > Ok, the worker CPU stuff is indeed generating a lot of messages. But > that isn't supposed to happen. It's supposed to do nothing if nothing > happens. So I need to check what's going on. > > Mihael > > On Wed, 2014-06-11 at 19:45 -0500, Yadu Nand Babuji wrote: > > Okay, here you go: > > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz > > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz > > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz > > > > -Yadu > > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > > > Sorry, I should have mentioned this, but can you please gzip them? It's > > > a bit much otherwise. > > > > > > Mihael > > > > > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > > >> Hi Mihael, > > >> > > >> I've got the logs for you. 
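Spelling out the requested change: only the two logging expressions below come from Mihael's message; the class around them is just a stand-in so the before/after lines compile, not the actual Coasters source.

    public class SeqDebugSketch {
        private int cpus = 3;         // stand-ins for the real Cpu state
        private long lastseq = 41;    // job queue sequence number
        private long getLastSeq() { return 42; }  // this worker's snapshot

        void report() {
            // before (around line 183 of Cpu.java, per the email):
            System.out.println(" Cpus sleeping: " + cpus);
            // after, with both sequence numbers added for debugging:
            System.out.println(" Cpus sleeping: " + cpus
                    + ", qseq: " + lastseq + ", myseq: " + this.getLastSeq());
        }

        public static void main(String[] args) {
            new SeqDebugSketch().report();
        }
    }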
> > >> > > >> This time, i've run the 6001 tasks with 20s delay added, and was all run > > >> with swift-0.95-RC6 (from our website) : > > >> > > >> Normal run -> > > >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > > >> 6.1gb ) > > >> With minimal logging -> > > >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > > >> 7.5 gb ) > > >> With minimal logging -> > > >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > > >> ( 845.4 kb ) > > >> & log4j properties modified > > >> > > >> The run with minimal logging ran for ~15mins while the first run took > > >> ~12mins. That *might* explain why the one > > >> with minimal logging is larger. > > >> > > >> Thanks, > > >> Yadu > > >> > > >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > > >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > > >>>> Hi, > > >>>> > > >>>> I'm running a proxy app for a user with 6000 tasks each taking a few > > >>>> milliseconds, and the log sizes are unusually large. When the tasks are > > >>>> set to take 20s, the total log size reaches ~7Gb. > > >>> 7GB? Wow. I'd like to see that. Can you upload and post link? > > >>> > > >>>> I tried setting -minimal.logging and -reduced.logging but still see > > >>>> debug lines in the log like: > > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > > >>>> block=0611-1507050-000000 host=midway461 id=10 > > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > > >>>> block=0611-1507050-000000 id=10 > > >>>> > > >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > > >>>> to have these set by default to WARN ? > > >>> It is until there's a problem and then people ask for the opposite. We > > >>> should evaluate whether this belongs in reduced logging or not. But does > > >>> that really account for most of the 7G? > > >>> > > >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > > >>>> and I had to set the following log4j.properties from DEBUG to WARN to > > >>>> remove most of the offending lines: > > >>> That sounds like a bug. 
> > >>> > > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > > >>>> > > >>>> With 6001 tasks, each taking 2 ms or so: > > >>>> Swift without any changes to logging -> 440879 lines and 51Mb > > >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > > >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > > >>>> Cpu, Block log4j properties set > > >>>> > > >>>> Thanks, > > >>>> Yadu > > >>>> _______________________________________________ > > >>>> Swift-devel mailing list > > >>>> Swift-devel at ci.uchicago.edu > > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Thu Jun 12 12:11:12 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Thu, 12 Jun 2014 12:11:12 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402557767.11363.8.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> Message-ID: <5399DF30.6040708@uchicago.edu> Hi Mihael, Here's the package I'm running: http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz (It has Cpu.java modified, as well as a debugging line in libexec/swift-int-staging.k) I shut down the run once the logs were past 4gb; here's the link to the log : http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz Thanks, -Yadu On 06/11/2014 07:03 PM, Mihael Hategan wrote: >>>> Sorry, I should have mentioned this, but can you please gzip them? It's >>>> a bit much otherwise. >>>> >>>> Mihael >>>> >>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >>>>> Hi Mihael, >>>>> >>>>> I've got the logs for you. >>>>> >>>>> This time, i've run the 6001 tasks with 20s delay added, and was all run >>>>> with swift-0.95-RC6 (from our website) : >>>>> >>>>> Normal run -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >>>>> 6.1gb ) >>>>> With minimal logging -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >>>>> 7.5 gb ) >>>>> With minimal logging -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >>>>> ( 845.4 kb ) >>>>> & log4j properties modified >>>>> >>>>> The run with minimal logging ran for ~15mins while the first run took >>>>> ~12mins. That *might* explain why the one >>>>> with minimal logging is larger. >>>>> >>>>> Thanks, >>>>> Yadu >>>>> >>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>>>>> set to take 20s, the total log size reaches ~7Gb. >>>>>> 7GB? Wow. I'd like to see that. Can you upload and post link?
>>>>>> >>>>>>> I tried setting -minimal.logging and -reduced.logging but still see >>>>>>> debug lines in the log like: >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>>>>> block=0611-1507050-000000 host=midway461 id=10 >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>>>>> block=0611-1507050-000000 id=10 >>>>>>> >>>>>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >>>>>>> to have these set by default to WARN ? >>>>>> It is until there's a problem and then people ask for the opposite. We >>>>>> should evaluate whether this belongs in reduced logging or not. But does >>>>>> that really account for most of the 7G? >>>>>> >>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>>>>> remove most of the offending lines: >>>>>> That sounds like a bug. >>>>>> >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>>>>> >>>>>>> With 6001 tasks, each taking 2 ms or so: >>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>>>>> Cpu, Block log4j properties set >>>>>>> >>>>>>> Thanks, >>>>>>> Yadu >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Jun 12 12:58:22 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 10:58:22 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5399DF30.6040708@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> Message-ID: <1402595902.15226.6.camel@echo> Ah! Well, so the trick with two coaster services on localhost doesn't really work well unless you use "ssh:", and this is a good example why. In your case you can avoid it easily if you change your first pool to use the local provider instead of the coaster provider, since you don't really need coasters to run locally. Mihael On Thu, 2014-06-12 at 12:11 -0500, Yadu Nand Babuji wrote: > Hi Mihael, > > Here's the package I'm running: > http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz > (It has Cpu.java modified, as well as as a debugging line in > libexec/swift-int-staging.k) > > I shutdown the run once the logs were past 4gb, here's the link to the > log : > http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz > > Thanks, > -Yadu > > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > >>>> Sorry, I should have mentioned this, but can you please gzip them? It's > >>>> a bit much otherwise. > >>>> > >>>> Mihael > >>>> > >>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > >>>>> Hi Mihael, > >>>>> > >>>>> I've got the logs for you. 
> >>>>> > >>>>> This time, i've run the 6001 tasks with 20s delay added, and was all run > >>>>> with swift-0.95-RC6 (from our website) : > >>>>> > >>>>> Normal run -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > >>>>> 6.1gb ) > >>>>> With minimal logging -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > >>>>> 7.5 gb ) > >>>>> With minimal logging -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > >>>>> ( 845.4 kb ) > >>>>> & log4j properties modified > >>>>> > >>>>> The run with minimal logging ran for ~15mins while the first run took > >>>>> ~12mins. That *might* explain why the one > >>>>> with minimal logging is larger. > >>>>> > >>>>> Thanks, > >>>>> Yadu > >>>>> > >>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > >>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few > >>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are > >>>>>>> set to take 20s, the total log size reaches ~7Gb. > >>>>>> 7GB? Wow. I'd like to see that. Can you upload and post link? > >>>>>> > >>>>>>> I tried setting -minimal.logging and -reduced.logging but still see > >>>>>>> debug lines in the log like: > >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >>>>>>> block=0611-1507050-000000 host=midway461 id=10 > >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >>>>>>> block=0611-1507050-000000 id=10 > >>>>>>> > >>>>>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >>>>>>> to have these set by default to WARN ? > >>>>>> It is until there's a problem and then people ask for the opposite. We > >>>>>> should evaluate whether this belongs in reduced logging or not. But does > >>>>>> that really account for most of the 7G? > >>>>>> > >>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to > >>>>>>> remove most of the offending lines: > >>>>>> That sounds like a bug. 
> >>>>>> > >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >>>>>>> > >>>>>>> With 6001 tasks, each taking 2 ms or so: > >>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb > >>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > >>>>>>> Cpu, Block log4j properties set > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Yadu > >>>>>>> _______________________________________________ > >>>>>>> Swift-devel mailing list > >>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From wilde at anl.gov Thu Jun 12 13:02:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Thu, 12 Jun 2014 13:02:00 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402595902.15226.6.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> Message-ID: <5399EB18.5050104@anl.gov> In general we're moving to running coasters in all configurations (in part to reduce the number of configurations to explain and test). Yadu's also looking at using provider staging shared-filesystem mode to avoid un-necessary staging for local filesystems. Can you explain the connection between this and the excessive logging? Can that be fixed rather than resorting to an alternate provider? - Mike On 6/12/14, 12:58 PM, Mihael Hategan wrote: > Ah! > > Well, so the trick with two coaster services on localhost doesn't really > work well unless you use "ssh:", and this is a good example why. > > In your case you can avoid it easily if you change your first pool to > use the local provider instead of the coaster provider, since you don't > really need coasters to run locally. > > Mihael > > > On Thu, 2014-06-12 at 12:11 -0500, Yadu Nand Babuji wrote: >> Hi Mihael, >> >> Here's the package I'm running: >> http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz >> (It has Cpu.java modified, as well as as a debugging line in >> libexec/swift-int-staging.k) >> >> I shutdown the run once the logs were past 4gb, here's the link to the >> log : >> http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz >> >> Thanks, >> -Yadu >> >> On 06/11/2014 07:03 PM, Mihael Hategan wrote: >>>>>> Sorry, I should have mentioned this, but can you please gzip them? It's >>>>>> a bit much otherwise. >>>>>> >>>>>> Mihael >>>>>> >>>>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >>>>>>> Hi Mihael, >>>>>>> >>>>>>> I've got the logs for you. 
>>>>>>> >>>>>>> This time, I've run the 6001 tasks with 20s delay added, and it was all run >>>>>>> with swift-0.95-RC6 (from our website): >>>>>>> >>>>>>> Normal run -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >>>>>>> 6.1 GB ) >>>>>>> With minimal logging -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >>>>>>> 7.5 GB ) >>>>>>> With minimal logging -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >>>>>>> ( 845.4 KB ) >>>>>>> & log4j properties modified >>>>>>> >>>>>>> The run with minimal logging ran for ~15 mins while the first run took >>>>>>> ~12 mins. That *might* explain why the one >>>>>>> with minimal logging is larger. >>>>>>> >>>>>>> Thanks, >>>>>>> Yadu >>>>>>> >>>>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>>>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>>>>>>> set to take 20s, the total log size reaches ~7 GB. >>>>>>>> 7GB? Wow. I'd like to see that. Can you upload it and post a link? >>>>>>>> >>>>>>>>> I tried setting -minimal.logging and -reduced.logging but still see >>>>>>>>> debug lines in the log like: >>>>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>>>>>>> block=0611-1507050-000000 host=midway461 id=10 >>>>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>>>>>>> block=0611-1507050-000000 id=10 >>>>>>>>> >>>>>>>>> Do we need DEBUG lines such as the ones listed above? Is it reasonable >>>>>>>>> to have these set by default to WARN? >>>>>>>> It is until there's a problem and then people ask for the opposite. We >>>>>>>> should evaluate whether this belongs in reduced logging or not. But does >>>>>>>> that really account for most of the 7 GB? >>>>>>>> >>>>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>>>>>>> remove most of the offending lines: >>>>>>>> That sounds like a bug.
>>>>>>>> >>>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>>>>>>> >>>>>>>>> With 6001 tasks, each taking 2 ms or so: >>>>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>>>>>>> Cpu, Block log4j properties set >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Yadu >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Thu Jun 12 13:36:44 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 11:36:44 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5399EB18.5050104@anl.gov> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> Message-ID: <1402598204.15763.13.camel@echo> On Thu, 2014-06-12 at 13:02 -0500, Michael Wilde wrote: > In general we're moving to running coasters in all configurations (in > part to reduce the number of configurations to explain and test). Right. Although we could default to the local provider for local things. > > Yadu's also looking at using provider staging shared-filesystem mode to > avoid unnecessary staging for local filesystems. > > Can you explain the connection between this and the excessive logging? > Can that be fixed rather than resorting to an alternate provider? Local coaster services run in the same JVM. So static variables are the same in multiple instances of local coaster services. The code was written with the assumption that there would be one service per JVM, a scenario that we didn't think we would deviate from a few years ago. The job-to-worker-node submission scheme is made up of a thread that looks at queued jobs and matches them with free workers. This runs in a loop that polls both the job queue and the worker queue. It is, however, possible for workers to be available that cannot fit any of the queued jobs due to walltime constraints. So you don't want to loop constantly in that case. The good news is that if a worker cannot run a queued job now due to time constraints, it will never be able to. So unless a new job with a smaller walltime comes in, you can safely assume that you don't need to bother waking up said worker. This is achieved using a sequence number. The job queue keeps one and changes it monotonically when new jobs come in. Sleeping workers take a snapshot of that and are only awakened if it differs from the one in the job queue (i.e. new jobs came in since we last figured that this worker cannot run any of the already queued jobs). The problem is that there are two job queues, one for each coaster service. But the code only looks at one static instance of them when checking whether a worker should be awakened. So the worker gets a low sequence number from the right job queue, but then it checks it against the other job queue, which has a higher sequence number. So it gets awakened. Then it gets put to sleep because it has nothing to run really. Anyway, there are two things that should be fixed there: the static variables should be removed, and this polling scheme should be made threadless. Mihael
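A minimal sketch of the wake-up scheme described above, with invented class and field names; the real logic lives in the coaster job manager classes (Block, Cpu) and is not reproduced here:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch; names are invented for illustration.
    class Job {}

    class JobQueue {
        private final List<Job> jobs = new ArrayList<Job>();
        private int seq; // monotonic: bumped every time a new job arrives

        synchronized void enqueue(Job job) {
            jobs.add(job);
            seq++;
            notifyAll(); // wake sleeping workers so they re-check the queue
        }

        synchronized int snapshot() {
            return seq;
        }

        // A worker that can run none of the queued jobs parks here until at
        // least one new job has arrived since it took its snapshot.
        synchronized void awaitNewJobs(int snapshot) throws InterruptedException {
            while (seq == snapshot) {
                wait();
            }
        }
    }

    class Worker {
        // The bug: a static field means all coaster services in one JVM share
        // a single JobQueue reference, so a worker can take its snapshot from
        // its own service's queue but compare it against the other service's
        // queue, and be woken repeatedly with nothing to run.
        static JobQueue queue; // should be a per-service instance field
    }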
From yadunand at uchicago.edu Thu Jun 12 16:50:51 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Thu, 12 Jun 2014 16:50:51 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402598204.15763.13.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> Message-ID: <539A20BB.7050705@uchicago.edu> Just an update. I ran the same configs as earlier, 6001 tasks, each taking 20s with swift-0.95-RC6, set just the local pool to use ssh-cl:local, and the logs are only 42Mb. This is without minimal or reduced logging set. -Yadu On 06/12/2014 01:36 PM, Mihael Hategan wrote: > On Thu, 2014-06-12 at 13:02 -0500, Michael Wilde wrote: >> In general we're moving to running coasters in all configurations (in >> part to reduce the number of configurations to explain and test). > Right. Although we could default to the local provider for local things. > >> Yadu's also looking at using provider staging shared-filesystem mode to >> avoid unnecessary staging for local filesystems. >> >> Can you explain the connection between this and the excessive logging? >> Can that be fixed rather than resorting to an alternate provider? > Local coaster services run in the same JVM. So static variables are the > same in multiple instances of local coaster services. The code was > written with the assumption that there would be one service per JVM, a > scenario that we didn't think we would deviate from a few years ago. > > The job-to-worker-node submission scheme is made up of a thread that > looks at queued jobs and matches them with free workers. This runs in a > loop that polls both the job queue and the worker queue. It is, however, > possible for workers to be available that cannot fit any of the queued > jobs due to walltime constraints. So you don't want to loop constantly > in that case. > > The good news is that if a worker cannot run a queued job now due to > time constraints, it will never be able to. So unless a new job with a > smaller walltime comes in, you can safely assume that you don't need to > bother waking up said worker. > > This is achieved using a sequence number. The job queue keeps one and > changes it monotonically when new jobs come in. Sleeping workers take a > snapshot of that and are only awakened if it differs from the one in the > job queue (i.e. new jobs came in since we last figured that this worker > cannot run any of the already queued jobs). > > The problem is that there are two job queues, one for each coaster > service. But the code only looks at one static instance of them when > checking whether a worker should be awakened.
So the worker gets a low > sequence number from the right job queue, but then it checks it against > the other job queue, which has a higher sequence number. So it gets > awakened. Then it gets put to sleep because it has nothing to run really. > > Anyway, there are two things that should be fixed there: the static > variables should be removed, and this polling scheme should be made threadless. > > Mihael > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Thu Jun 12 17:09:18 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 15:09:18 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <539A20BB.7050705@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> <539A20BB.7050705@uchicago.edu> Message-ID: <1402610958.19138.1.camel@echo> Thanks for the update! On Thu, 2014-06-12 at 16:50 -0500, Yadu Nand Babuji wrote: > Just an update. > > I ran the same configs as earlier, 6001 tasks, each taking 20s with > swift-0.95-RC6 > and set just the local pool to use ssh-cl:local You really really really don't need coasters to run stuff on localhost. Mihael From yadudoc1729 at gmail.com Thu Jun 12 18:29:13 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 12 Jun 2014 18:29:13 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402610958.19138.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> <539A20BB.7050705@uchicago.edu> <1402610958.19138.1.camel@echo> Message-ID: Okay, in the (off-list) mail to Greg, I've given tested configs for running locally using the local provider: [sites.xml pool definition whose XML tags were stripped by the archiver; the surviving values are 10000, 00:20:00, 3600, file, and /tmp/{env.USER}/swiftwork] And it works pretty well! On Thu, Jun 12, 2014 at 5:09 PM, Mihael Hategan wrote: > Thanks for the update! > > On Thu, 2014-06-12 at 16:50 -0500, Yadu Nand Babuji wrote: > > Just an update. > > > > I ran the same configs as earlier, 6001 tasks, each taking 20s with > > swift-0.95-RC6 > > and set just the local pool to use ssh-cl:local > > You really really really don't need coasters to run stuff on localhost. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL:
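The pool definition in the message above lost its XML markup in the archive; only the values survived. A plausible reconstruction, assuming the surviving values map, in order, onto the usual 0.95 sites.xml entries initialScore, maxwalltime, maxtime, stagingMethod, and workdirectory (the element and key names here are guesses; only the values come from the message):

    <pool handle="localhost">
      <execution provider="local"/>
      <profile namespace="karajan" key="initialScore">10000</profile>
      <profile namespace="globus" key="maxwalltime">00:20:00</profile>
      <profile namespace="globus" key="maxtime">3600</profile>
      <profile namespace="swift" key="stagingMethod">file</profile>
      <workdirectory>/tmp/{env.USER}/swiftwork</workdirectory>
    </pool>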
From hategan at mcs.anl.gov Sun Jun 22 19:40:51 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 22 Jun 2014 17:40:51 -0700 Subject: [Swift-devel] FQNs use in Swift Message-ID: <1403484051.20517.17.camel@echo> Hi, What are your general feelings toward namespaces in various places in swift (e.g. tc.data, sites.xml)? Do you like them? Think they are necessary? Would like to see them gone? Mihael From wilde at anl.gov Mon Jun 23 08:41:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 08:41:00 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403484051.20517.17.camel@echo> References: <1403484051.20517.17.camel@echo> Message-ID: <53A82E6C.8000808@anl.gov> I don't think they matter in tc.data and sites.xml, since with the new config mechanism in 0.95 these files should seldom be visible to users. I think namespaces might be more important in the language itself, to support a richer package model for script libraries. But neither is high priority at the moment. I feel we should leave the namespaces in tc and sites as-is for now. - Mike On 6/22/14, 7:40 PM, Mihael Hategan wrote: > Hi, > > What are your general feelings toward namespaces in various places in > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > necessary? Would like to see them gone? > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Mon Jun 23 10:14:56 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jun 2014 08:14:56 -0700 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <53A82E6C.8000808@anl.gov> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> Message-ID: <1403536496.31463.12.camel@echo> The reason I'm asking is because I'm trying to fix the various coaster configuration problems: - the need to have a "pilot" job to set jobsPerNode - the inability to change settings on a persistent service after the first run - the problems with multiple sites on the same host The way we do things now is to pass these settings through site/app/dynamic profiles which all get mashed into task attributes (though some selection happens based on namespace). It's hard to efficiently check if settings are different between tasks, since you have to look at all attribute values and compare, for each task. My plan was to make a cleaner separation. There should be site attributes (such as jobThrottle), provider attributes (e.g. slots), and job attributes (maxwalltime). Each would go into the corresponding XML node. So jobThrottle would be a child of <pool>, slots would be a child of <execution>, and walltime would be a child of <app>, which would now be defined in sites.xml instead of tc.data. This eliminates the need for namespaces as a (poor - the name "karajan" does not belong in sites.xml) indicator of what should go where. But the more important thing is that once an execution provider for a site is defined, you know that the settings for that are not going to change. So you can use the actual instance to distinguish between different settings. This makes it much easier to support multiple coaster configurations. And yes, David's configuration system makes this less relevant from a user's perspective, but that's just part of it. So this makes FQNs an annoyance in sites.xml, so the only place where they remain is for app names. But then we don't use them there. We name things "cat", not "system::cat", and I have heard nobody so far trying to use the latter. That's why I asked, but wanted to make it short. Mihael On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: > I don't think they matter in tc.data and sites.xml, since with the new > config mechanism in 0.95 these files should seldom be visible to users. > > I think namespaces might be more important in the language itself, to > support a richer package model for script libraries. > > But neither is high priority at the moment. I feel we should leave the > namespaces in tc and sites as-is for now. > > - Mike > > On 6/22/14, 7:40 PM, Mihael Hategan wrote: > > Hi, > > > > What are your general feelings toward namespaces in various places in > > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > > necessary? Would like to see them gone? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >
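To make the proposed separation concrete, a sketch of how a pool entry might look under this plan; the element names are inferred from the description in the message above, not taken from committed code:

    <pool handle="mycluster">
      <!-- site attribute: child of <pool> -->
      <jobThrottle>4</jobThrottle>
      <execution provider="coaster" jobmanager="local:pbs">
        <!-- provider attribute: child of <execution> -->
        <slots>20</slots>
      </execution>
      <!-- app entries move from tc.data into sites.xml -->
      <app name="cat">
        <!-- job attribute: child of <app> -->
        <maxwalltime>00:05:00</maxwalltime>
      </app>
    </pool>

In such a layout, each setting's owner is fixed by its position in the tree, which is what removes the need for namespace prefixes like "karajan" or "globus".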
Mihael On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: > I don't think they matter in tc.data and sites.xml, since with the new > config mechanism in 0.95 these files should seldom be visible to users. > > I think namespaces might more important in the language itself, to > support a richer package model for script libraries. > > But neither is high priority at the moment. I feel we should leave the > namespaces in tc and sites as-is for now. > > - Mike > > On 6/22/14, 7:40 PM, Mihael Hategan wrote: > > Hi, > > > > What are your general feelings toward namespaces in various places in > > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > > necessary? Would like to see them gone? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From wilde at anl.gov Mon Jun 23 11:00:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 11:00:00 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403536496.31463.12.camel@echo> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> Message-ID: <53A84F00.3040808@anl.gov> OK, I see what you want to do here, and why. What you're proposing will make the internals cleaner, and would be a chance to harmonize the user-visible and internal property names. If we do this now, in trunk, presumably 0.96 will have the new names. So that would put a stake in the ground for conversion or all users to the new config mechanism. What should we do for backwards compatibility? My inclination would be to provide a tool to convert sites.xml + tc.data into the new config format, and urge users to convert. Whats the development time needed for this? Will it make code maintenance and enhancement easier? Currently, finding property values (eg, within provider code) has been a surprisingly large obstacle to provider enhancement and support. If this fixes that problem (which also requires developer documentation) it will be worthwhile, if its affordable. - Mike On 6/23/14, 10:14 AM, Mihael Hategan wrote: > The reason I'm asking is because I'm trying to fix the various coaster > configuration problems: > - the need to have a "pilot" job to set jobsPerNode > - the inability to change settings on a persistent service after the > first run > - the problems with multiple sites on the same host > > The way we do things now is to pass these settings through > site/app/dynamic profiles which all get mashed into task attributes > (though some selection happens based on namespace). It's hard to > efficiently check if settings are different between tasks, since you > have to look at all attribute values and compare, for each task. > > My plan was to make a cleaner separation. There should be site > attributes (such as jobThrottle), provider attributes (e.g. slots), and > job attributes (maxwalltime). Each would go into the corresponding XML > node. So jobThrottle would be a child of , slots would be a child > of , and walltime would be a child of , which > would now be defined in sites.xml instead of tc.data. > > This eliminates the need for namespaces as a (poor - the name "karajan" > does not belong in sites.xml) indicator of what should go where. But the > more important thing is that once an execution provider for a site is > defined, you know that the settings for that are not going to change. 
So > you can use the actual instance to distinguish between different > settings. This makes it much easier to support multiple coaster > configurations. > > And yes, David's configuration system makes this less relevant from a > user's perspective, but that's just part of it. > > So this makes FQNs an annoyance in sites.xml, so the only place where > they remain is for app names. But then we don't use them there. We name > things "cat", not "system::cat", and I have heard nobody so far trying > to use the latter. That's why I asked, but wanted to make it short. > > Mihael > > On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: >> I don't think they matter in tc.data and sites.xml, since with the new >> config mechanism in 0.95 these files should seldom be visible to users. >> >> I think namespaces might be more important in the language itself, to >> support a richer package model for script libraries. >> >> But neither is high priority at the moment. I feel we should leave the >> namespaces in tc and sites as-is for now. >> >> - Mike >> >> On 6/22/14, 7:40 PM, Mihael Hategan wrote: >>> Hi, >>> >>> What are your general feelings toward namespaces in various places in >>> swift (e.g. tc.data, sites.xml)? Do you like them? Think they are >>> necessary? Would like to see them gone? >>> >>> Mihael >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Mon Jun 23 12:10:10 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jun 2014 10:10:10 -0700 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <53A84F00.3040808@anl.gov> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> <53A84F00.3040808@anl.gov> Message-ID: <1403543410.32109.11.camel@echo> On Mon, 2014-06-23 at 11:00 -0500, Michael Wilde wrote: > OK, I see what you want to do here, and why. > > What you're proposing will make the internals cleaner, and would be a > chance to harmonize the user-visible and internal property names. > > If we do this now, in trunk, presumably 0.96 will have the new names. > So that would put a stake in the ground for conversion of all users to > the new config mechanism. > > What should we do for backwards compatibility? I was asking myself the same question. Initially, I wanted to allow both old and new configurations, and translate internally, but I believe that would make the code messy for what is essentially a one-time operation for a limited number of users (due to the new config mechanism)... > My inclination would be > to provide a tool to convert sites.xml + tc.data into the new config > format, and urge users to convert. ... so I reasoned that we could do just what you say above. > > What's the development time needed for this? Small-ish. I did most of it yesterday. I was under the impression that I was fixing the coaster stuff, but it led me here. Let me stress again, the coaster stuff really needs fixing. This is, to me, acceptable collateral damage. > > Will it make code maintenance and enhancement easier? That's my take on it, although that code isn't touched much. > > Currently, finding property values (e.g., within provider code) has been a > surprisingly large obstacle to provider enhancement and support. If this > fixes that problem (which also requires developer documentation) it will > be worthwhile, if it's affordable. I was thinking about that too. It is related in that provider properties are now separate from other task properties, so it would be easier to add some API to get a list of what each provider supports and to use that to validate sites files without having to keep a separate account of provider properties. Mihael PS: While we're at it, jobThrottle and initialScore are being "replaced" with maxParallelJobs and initialParallelJobs.
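A sketch of the kind of provider-attribute API suggested above; the interface and names are invented for illustration and nothing like this exists in the tree:

    import java.util.Set;

    // Hypothetical: a provider advertises the attribute names it accepts, so a
    // sites file can be validated without a separate, hand-maintained list.
    interface ProviderAttributes {
        Set<String> supportedAttributes(); // e.g. "slots", "maxtime"
    }

    class SitesFileChecker {
        static void check(String attribute, ProviderAttributes provider) {
            if (!provider.supportedAttributes().contains(attribute)) {
                throw new IllegalArgumentException(
                    "attribute '" + attribute + "' is not supported by this provider");
            }
        }
    }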
From wilde at anl.gov Mon Jun 23 12:29:21 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 12:29:21 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403543410.32109.11.camel@echo> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> <53A84F00.3040808@anl.gov> <1403543410.32109.11.camel@echo> Message-ID: <53A863F1.9040008@anl.gov> This all sounds good, so best to keep going. But regarding maxParallelJobs and initialParallelJobs, let's keep the names in sync with 0.95. There we used the term "task" to indicate a Swift function invocation (usually an app task) and "job" to mean a site resource manager job. - Mike On 6/23/14, 12:10 PM, Mihael Hategan wrote: > On Mon, 2014-06-23 at 11:00 -0500, Michael Wilde wrote: >> OK, I see what you want to do here, and why. >> >> What you're proposing will make the internals cleaner, and would be a >> chance to harmonize the user-visible and internal property names. >> >> If we do this now, in trunk, presumably 0.96 will have the new names. >> So that would put a stake in the ground for conversion of all users to >> the new config mechanism. >> >> What should we do for backwards compatibility? > I was asking myself the same question. Initially, I wanted to allow both > old and new configurations, and translate internally, but I believe that > would make the code messy for what is essentially a one-time operation > for a limited number of users (due to the new config mechanism)... > >> My inclination would be >> to provide a tool to convert sites.xml + tc.data into the new config >> format, and urge users to convert. > ... so I reasoned that we could do just what you say above. > >> What's the development time needed for this? > Small-ish. I did most of it yesterday. I was under the impression that I > was fixing the coaster stuff, but it led me here. Let me stress again, > the coaster stuff really needs fixing. This is, to me, acceptable > collateral damage. > >> Will it make code maintenance and enhancement easier? > That's my take on it, although that code isn't touched much. > >> Currently, finding property values (e.g., within provider code) has been a >> surprisingly large obstacle to provider enhancement and support. If this >> fixes that problem (which also requires developer documentation) it will >> be worthwhile, if it's affordable. > I was thinking about that too. It is related in that provider properties > are now separate from other task properties, so it would be easier to > add some API to get a list of what each provider supports and to use > that to validate sites files without having to keep a separate account > of provider properties. > > Mihael > > PS: While we're at it, jobThrottle and initialScore are being "replaced" > with maxParallelJobs and initialParallelJobs.
> -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From yadunand at uchicago.edu Fri Jun 27 10:47:40 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 10:47:40 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] Message-ID: <53AD921C.6050807@uchicago.edu> Hi Everyone, Please try the Google Compute Engine setup and tutorial, I've linked below. This will require credit card information and will bill you approximately $0.065/Hr with the default config of 5 micro instances. Cloud setup online doc: https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine Git repo to clone: https://github.com/yadudoc/swift-on-cloud.git Once setup is done, do "connect headnode" and you can run the swift-cloud-tutorial under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the headnode instance. Feedback would be much appreciated. PS: Remember to shut down instances using the "dissolve" command. Thanks! Yadu From hategan at mcs.anl.gov Fri Jun 27 10:54:42 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jun 2014 08:54:42 -0700 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: <1403884482.29142.2.camel@echo> Nice! One suggestion is that we probably should encourage dropping the @ in front of functions. Mihael On Fri, 2014-06-27 at 10:47 -0500, Yadu Nand Babuji wrote: > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances.
>> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From foster at anl.gov Fri Jun 27 11:11:18 2014 From: foster at anl.gov (Foster, Ian T.) Date: Fri, 27 Jun 2014 16:11:18 +0000 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: Very nice. I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: > > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances. > > Cloud setup online doc: > https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > > Git repo to clone: > https://github.com/yadudoc/swift-on-cloud.git > > Once setup is done, do "connect headnode" and you can run the > swift-cloud-tutorial > under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > headnode instance. > > Feedback would be much appreciated. > > PS: Remember to shut-down instances using the "dissolve" command. > > > Thanks! > Yadu > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Fri Jun 27 12:00:52 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 12:00:52 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> Message-ID: <53ADA344.6000807@uchicago.edu> Thanks! Google beats AWS in both Pricing and performance. Once you are past the initial run of setup.sh from the compute-engine setup, starting a 20 node cluster takes on less than a 1min. The initial run copies over images which takes time. AWS GCE Micro instance $0.020/Hr $0.013/Hr [1][2] Networking perf 135mbits/s 692mbits/s [3] Boot speed 1min+ < 30s [4] [1] https://developers.google.com/compute/pricing [2] http://aws.amazon.com/ec2/pricing/ [3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ [4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ Thanks -Yadu On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > Very nice. > > I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > >> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >> >> Hi Everyone, >> >> Please try the Google Compute Engine setup and tutorial, I've linked below. >> This will require credit card information and will bill you >> approximately $0.065/Hr >> with the default config of 5 micro instances. 
>> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From foster at anl.gov Fri Jun 27 11:11:18 2014 From: foster at anl.gov (Foster, Ian T.) Date: Fri, 27 Jun 2014 16:11:18 +0000 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: Very nice. I am curious as to the reason for running on Google. No objection to it, I just don't have experience with it -- only Google > On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: > > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances. > > Cloud setup online doc: > https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > > Git repo to clone: > https://github.com/yadudoc/swift-on-cloud.git > > Once setup is done, do "connect headnode" and you can run the > swift-cloud-tutorial > under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > headnode instance. > > Feedback would be much appreciated. > > PS: Remember to shut-down instances using the "dissolve" command. > > > Thanks! > Yadu > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Fri Jun 27 12:00:52 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 12:00:52 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> Message-ID: <53ADA344.6000807@uchicago.edu> Thanks! Google beats AWS in both pricing and performance. Once you are past the initial run of setup.sh from the compute-engine setup, starting a 20-node cluster takes less than a minute. The initial run copies over images, which takes time.

                      AWS           GCE
    Micro instance    $0.020/Hr     $0.013/Hr    [1][2]
    Networking perf   135mbits/s    692mbits/s   [3]
    Boot speed        1min+         < 30s        [4]

[1] https://developers.google.com/compute/pricing
[2] http://aws.amazon.com/ec2/pricing/
[3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/
[4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/

Thanks -Yadu On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > Very nice. > > I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > >> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >> >> Hi Everyone, >> >> Please try the Google Compute Engine setup and tutorial, I've linked below. >> This will require credit card information and will bill you >> approximately $0.065/Hr >> with the default config of 5 micro instances. >> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From foster at anl.gov Fri Jun 27 12:01:50 2014 From: foster at anl.gov (Ian Foster) Date: Fri, 27 Jun 2014 12:01:50 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53ADA344.6000807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> <53ADA344.6000807@uchicago.edu> Message-ID: Nice. This would be a great use case for linking credit card info into Globus Nexus authentication. (Once we get to that.) On Jun 27, 2014, at 12:00 PM, Yadu Nand Babuji wrote: > Thanks! > > Google beats AWS in both Pricing and performance. Once you are past the initial run of setup.sh from > the compute-engine setup, starting a 20 node cluster takes on less than a 1min. The initial run copies > over images which takes time. > > AWS GCE > Micro instance $0.020/Hr > $0.013/Hr [1][2] > Networking perf 135mbits/s 692mbits/s [3] > Boot speed 1min+ < 30s [4] > > [1] https://developers.google.com/compute/pricing > [2] http://aws.amazon.com/ec2/pricing/ > [3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ > [4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ > > Thanks > -Yadu > > On 06/27/2014 11:11 AM, Foster, Ian T. wrote: >> Very nice. >> >> I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google >> >>> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >>> >>> Hi Everyone, >>> >>> Please try the Google Compute Engine setup and tutorial, I've linked below. >>> This will require credit card information and will bill you >>> approximately $0.065/Hr >>> with the default config of 5 micro instances. >>> >>> Cloud setup online doc: >>> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >>> >>> Git repo to clone: >>> https://github.com/yadudoc/swift-on-cloud.git >>> >>> Once setup is done, do "connect headnode" and you can run the >>> swift-cloud-tutorial >>> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >>> headnode instance. >>> >>> Feedback would be much appreciated. >>> >>> PS: Remember to shut-down instances using the "dissolve" command. >>> >>> >>> Thanks! >>> Yadu >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From ketan at mcs.anl.gov Fri Jun 27 23:47:55 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Fri, 27 Jun 2014 23:47:55 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> <53ADA344.6000807@uchicago.edu> Message-ID: Google is adding new services to GCE by the day. Recently they announced a new programming model (as a replacement of MapReduce) called ...
Cloud Dataflow. And some more services: http://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system -- Ketan On Fri, Jun 27, 2014 at 12:01 PM, Ian Foster wrote: > Nice. This would be a great use case for linking credit card info into > Globus Nexus authentication. (Once we get to that.) > > > On Jun 27, 2014, at 12:00 PM, Yadu Nand Babuji > wrote: > > > Thanks! > > > > Google beats AWS in both Pricing and performance. Once you are past the > initial run of setup.sh from > > the compute-engine setup, starting a 20 node cluster takes on less than > a 1min. The initial run copies > > over images which takes time. > > > > AWS GCE > > Micro instance $0.020/Hr > $0.013/Hr [1][2] > > Networking perf 135mbits/s 692mbits/s [3] > > Boot speed 1min+ < 30s [4] > > > > [1] https://developers.google.com/compute/pricing > > [2] http://aws.amazon.com/ec2/pricing/ > > [3] > https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ > > [4] > http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ > > > > Thanks > > -Yadu > > > > On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > >> Very nice. > >> > >> I am curious as to the reason for running on google. No objection to > it, I just don't have experience with it -- only google > >> > >>> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" < > yadunand at uchicago.edu> wrote: > >>> > >>> Hi Everyone, > >>> > >>> Please try the Google Compute Engine setup and tutorial, I've linked > below. > >>> This will require credit card information and will bill you > >>> approximately $0.065/Hr > >>> with the default config of 5 micro instances. > >>> > >>> Cloud setup online doc: > >>> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > >>> > >>> Git repo to clone: > >>> https://github.com/yadudoc/swift-on-cloud.git > >>> > >>> Once setup is done, do "connect headnode" and you can run the > >>> swift-cloud-tutorial > >>> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > >>> headnode instance. > >>> > >>> Feedback would be much appreciated. > >>> > >>> PS: Remember to shut-down instances using the "dissolve" command. > >>> > >>> > >>> Thanks! > >>> Yadu > >>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: