From wilde at anl.gov Sun Jun 1 23:55:03 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 1 Jun 2014 23:55:03 -0500
Subject: [Swift-devel] tryswift changes needed
Message-ID: <538C03A7.7070609@anl.gov>

Hi David,

If you have a moment, can you look at making the following changes/fixes:

- When you click Explain, it should bring the Explain HTML window to the top if it's obscured.
- For the Hello World app, it sometimes doesn't show the contents of out.txt in the output window. This has happened several times, but I can't yet see what causes it to happen.
- If you select an output file that's already obscured by the main window, it should also bring that output file to the top.
- If you select File Outputs as your choice, you get a 404 for this URL: http://ec2-54-87-184-8.compute-1.amazonaws.com/File%20outputs

These are not urgent, but it would be good to fix them soon (or suggest how to fix them).

Thanks,

- Mike

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From wilde at anl.gov Mon Jun 2 00:21:12 2014
From: wilde at anl.gov (Michael Wilde)
Date: Mon, 2 Jun 2014 00:21:12 -0500
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <538A31A3.9030007@anl.gov>
References: <538A31A3.9030007@anl.gov>
Message-ID: <538C09C8.9080307@anl.gov>

I just pushed these changes live. I had to manually update push_to.sh. (David, is the list of pages supposed to be maintained automatically? Maybe we can drive that off of a find?)

Mihael just fixed ticket 1279. David will make 0.95 the "latest download" (and update the button on both Home and Download, and push 0.94.X to the older releases page)?

Yadu, can you re-test this release when you are back online in Chicago?

Thanks,

- Mike

On 5/31/14, 2:46 PM, Michael Wilde wrote:
> Hi All,
>
> I added TrySwift and Swift/T to the Swift main page. You can preview
> the changes at:
>
> http://web.ci.uchicago.edu/~wilde/www/main/
>
> The changes are committed to svn but not yet pushed to the main site.
>
> I added the GeMTC paper under "What's New" after Swift/T.
>
> I created a new main directory Swift-T/ for Swift/T. At the moment,
> this just forwards to the Google exm site Swift-T page.
>
> Justin: feel free to start integrating Swift/T content below this directory.
>
> You can check out the entire web below your public_html directory, test
> there, and commit.
>
> After you're done and committed, I'll update my test copy, check it out,
> and then push to the live site some time tomorrow.
>
> I'm going to shift later to work on TrySwift text (probably not till
> tomorrow morning).
>
> Yadu is looking at creating a Local Host tutorial version that runs on
> Linux; hopefully the same will run on Mac.
>
> Justin, Tim: do you know how to create a nice Mac install package? Did
> you do so for Swift/T?
>
> Thanks,
>
> - Mike
>

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago
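(On the find suggestion above: a minimal sketch of generating push_to.sh's page list automatically instead of maintaining it by hand. The pages.list name, the .html filter, and the .svn exclusion are assumptions for illustration, not known details of how push_to.sh actually works:)

    # Hypothetical: regenerate the list of pages from the site's working
    # copy rather than maintaining it by hand. Assumes GNU find and an
    # svn checkout rooted at $SITE_ROOT; adjust the filter as needed.
    SITE_ROOT=${SITE_ROOT:-.}
    find "$SITE_ROOT" -name '*.html' -not -path '*/.svn/*' | sort > pages.list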
From hategan at mcs.anl.gov Mon Jun 2 00:23:54 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 1 Jun 2014 22:23:54 -0700
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <538C09C8.9080307@anl.gov>
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov>
Message-ID: <1401686634.26836.0.camel@echo>

On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> I just pushed these changes live. I had to manually update push_to.sh.
> (David, is the list of pages supposed to be maintained automatically?
> Maybe we can drive that off of a find?)
>
> Mihael just fixed ticket 1279. David will make 0.95 the "latest
> download" (and update the button on both Home and Download, and push
> 0.94.X to the older releases page)?
>
> Yadu, can you re-test this release when you are back online in Chicago?

Yeah, do we have a general feel for 0.95? I lost track.

Mihael

From skrieder at iit.edu Mon Jun 2 20:03:18 2014
From: skrieder at iit.edu (Scott Krieder)
Date: Mon, 2 Jun 2014 20:03:18 -0500
Subject: [Swift-devel] apple swift language
Message-ID:

swift-lang.org is probably worth a lot of money now!

http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/

--
Scott J. Krieder
C: 419-685-0410
E: skrieder at iit.edu
http://datasys.cs.iit.edu/~skrieder/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.g.armstrong at gmail.com Mon Jun 2 20:26:47 2014
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 2 Jun 2014 20:26:47 -0500
Subject: [Swift-devel] apple swift language
In-Reply-To:
References:
Message-ID:

We had an off-list thread about this - the site went down due to load pretty soon after it was announced and only got back online thanks to David Kelly moving it to a bunch of AWS servers.

- Tim

On Mon, Jun 2, 2014 at 8:03 PM, Scott Krieder wrote:

> swift-lang.org is probably worth a lot of money now!
>
> http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/
>
> --
> Scott J. Krieder
> C: 419-685-0410
> E: skrieder at iit.edu
> http://datasys.cs.iit.edu/~skrieder/
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketan at mcs.anl.gov Mon Jun 2 22:56:37 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 2 Jun 2014 22:56:37 -0500
Subject: [Swift-devel] apple swift language
In-Reply-To:
References:
Message-ID:

Interesting-looking features from this thread on Reddit:
http://www.reddit.com/r/programming/comments/274t5s/apple_swift_programming_language_unveiled

Statically typed with type inference.
Generics.
Closures.
No exceptions.
Extension methods.
Properties (syntax similar to C#), including lazy properties with the "@lazy" annotation.
Functions, methods and type (static) methods.
Support for observers (with "willSet" and "didSet"). Interesting to see the observer pattern baked into a language, although I'm more partial to event buses for this kind of thing.
Enums.
Classes and structures (structures have restrictions regarding inheritance and other things).
For and while loops (statements, not expressions).
"mutating" keyword.
Named parameters.
Deinitializers (finalizers).
Protocols (interfaces).
Optional chaining with "a?.b?.c" and forced dereference with "!".
Convenient "assign and test": "if let person = findPerson() ...".
Type casting with "is", downcasting with "as?" (combines nicely with the "let" syntax. Ceylon does it right too).

On Mon, Jun 2, 2014 at 8:26 PM, Tim Armstrong wrote:

> We had an off-list thread about this - the site went down due to load
> pretty soon after it was announced and only got back online thanks to David
> Kelly moving it to a bunch of AWS servers.
>
> - Tim
>
>
> On Mon, Jun 2, 2014 at 8:03 PM, Scott Krieder wrote:
>
>> swift-lang.org is probably worth a lot of money now!
>>
>>
>> http://www.cnet.com/news/apples-new-swift-coding-language-hopes-to-lock-down-errors/
>>
>> --
>> Scott J. Krieder
>> C: 419-685-0410
>> E: skrieder at iit.edu
>> http://datasys.cs.iit.edu/~skrieder/
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadudoc1729 at gmail.com Wed Jun 4 17:13:01 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Thu, 5 Jun 2014 03:43:01 +0530
Subject: [Swift-devel] Swift web changes pending
In-Reply-To: <1401686634.26836.0.camel@echo>
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov> <1401686634.26836.0.camel@echo>
Message-ID:

Hi,

The last tests, which ran Swift 0.95 branch SVN swift-r7871 (swift modified locally) with cog-r3905, passed most tests. There are tests that are failing, including a remote-testing failure on frisbee (Mac). The build is working.

I need to look into the modis test failures, which seem to be config issues.

Here's a link to the results:
http://swift.rcc.uchicago.edu:8043/swift-0.95/run-2014-06-03-220931/tests-2014-06-03.html

Links from remote sites:
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-login4.beagle.ci.uchicago.edu-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-midway001-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-thwomp-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-blogin1-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-flogin1-220931/tests-2014-06-03.html
Link: http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-communicado.ci.uchicago.edu-220931/tests-2014-06-03.html

-Yadu

On Mon, Jun 2, 2014 at 10:53 AM, Mihael Hategan wrote:

> On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> > I just pushed these changes live. I had to manually update push_to.sh.
> > (David, is the list of pages supposed to be maintained automatically?
> > Maybe we can drive that off of a find?)
> >
> > Mihael just fixed ticket 1279. David will make 0.95 the "latest
> > download" (and update the button on both Home and Download, and push
> > 0.94.X to the older releases page)?
> >
> > Yadu, can you re-test this release when you are back online in Chicago?
>
> Yeah, do we have a general feel for 0.95? I lost track.
>
> Mihael
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Wed Jun 4 20:25:27 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 4 Jun 2014 18:25:27 -0700
Subject: [Swift-devel] Swift web changes pending
In-Reply-To:
References: <538A31A3.9030007@anl.gov> <538C09C8.9080307@anl.gov> <1401686634.26836.0.camel@echo>
Message-ID: <1401931527.21488.2.camel@echo>

There are a few failures of the form "sleep not in tc.data", "unknown site 'beagle'", etc.
There are also a few "could not create JVM" failures, which seem to point to some problem with the remote environment when starting coasters.

Can you fix those, please, so we can get an idea of where the actual Swift failures are?

Mihael

On Thu, 2014-06-05 at 03:43 +0530, Yadu Nand wrote:
> Hi,
>
> The last tests which ran Swift 0.95 branch SVN swift-r7871 (swift modified
> locally) cog-r3905 passed most tests.
> There are tests which are failing including a remote testing failure on
> frisbee (mac). Build is working.
>
> I need to check into the modis test failures, which seem to be config
> issues.
>
> Here's a link to the results :
> http://swift.rcc.uchicago.edu:8043/swift-0.95/run-2014-06-03-220931/tests-2014-06-03.html
> Links from remote sites:
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-login4.beagle.ci.uchicago.edu-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-midway001-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-thwomp-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-blogin1-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-flogin1-220931/tests-2014-06-03.html
> Link:
> http://swift.rcc.uchicago.edu:8042/swift-0.95/run-2014-06-03-communicado.ci.uchicago.edu-220931/tests-2014-06-03.html
>
> -Yadu
>
>
> On Mon, Jun 2, 2014 at 10:53 AM, Mihael Hategan wrote:
>
> > On Mon, 2014-06-02 at 00:21 -0500, Michael Wilde wrote:
> > > I just pushed these changes live. I had to manually update push_to.sh.
> > > (David, is the list of pages supposed to be maintained automatically?
> > > Maybe we can drive that off of a find?)
> > >
> > > Mihael just fixed ticket 1279. David will make 0.95 the "latest
> > > download" (and update the button on both Home and Download, and push
> > > 0.94.X to the older releases page)?
> > >
> > > Yadu, can you re-test this release when you are back online in Chicago?
> >
> > Yeah, do we have a general feel for 0.95? I lost track.
> >
> > Mihael
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >

From wilde at anl.gov Sun Jun 8 14:24:27 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 8 Jun 2014 14:24:27 -0500
Subject: [Swift-devel] Does softimage work with the new 0.95 config mechanism?
Message-ID: <5394B86B.4090308@anl.gov>

Does softimage work with the new 0.95 config mechanism?

If not, can you suggest how to integrate it?

Also: has anyone written up any softimage documentation yet?

Thanks,

- Mike

ps. softimage is briefly introduced in this prior swift-devel post:
http://lists.ci.uchicago.edu/pipermail/swift-devel/2014-February/010640.html

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From wilde at anl.gov Sun Jun 8 16:48:00 2014
From: wilde at anl.gov (Michael Wilde)
Date: Sun, 8 Jun 2014 16:48:00 -0500
Subject: [Swift-devel] Localhost coasters not working on Beagle
Message-ID: <5394DA10.3040404@anl.gov>

Mihael - I'm not able to get a simple localhost coasters run working on Beagle login1.

All: Is anyone seeing something similar? It looks to me like my coaster worker is not able to connect to the Swift coaster service (using standard automatic coasters).
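(Before digging into Swift itself, one way to check whether loopback TCP connections work at all on the login node is to test by hand; this is essentially the check Mihael runs later in this thread. Port 50003 matches the coaster callback port in the logs below, but any free port would do:)

    # terminal 1: listen on the port the coaster worker would connect to
    netcat -l -p 50003

    # terminal 2: confirm the listener is up, then try to connect to it
    netstat -lntp | grep 50003
    telnet 127.0.0.1 50003

If the telnet connection times out even with a live listener, loopback traffic is being blocked on the node (for example by a firewall rule), and no coaster setting will help.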
I'm working in /lustre/beagle/wilde/swift/lab/fastio (where you can find logs and configs). Running 0.95RC6.

I'm setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as well:

login1$ swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift

login1$ cat localcoast.xml
127.0.0.1 00:01:00 3600 1 1 1 1 12 10000 100 100 /tmp/swiftwork

I get error 110 connection timeouts:

2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl host=localhost
2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: 127.0.0.1:50000
2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is http://127.0.0.1:50001
2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: [http://127.0.0.2:50003, http://192.5.86.104:50003, http://10.128.2.244:50003]
2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: http://127.0.0.1:50003
2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for registration
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: cpipe, boundTo: null] binding to cpipe://1
2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: spipe, boundTo: null] binding to spipe://1
2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration
2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current channel
2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, REGISTER) unregistering (send)
2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete
2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster service: http://127.0.0.1:50002
2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is http://10.128.2.244:50003
2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been overridden to http://127.0.0.1:50003
2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, CONFIGSERVICE) unregistering (send)
2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... id=0608-3704500
2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, SUBMITJOB) unregistering (send)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor
Settings {
        slots = 1
        jobsPerNode = 1
        workersPerNode = 1
        nodeGranularity = 1
        allocationStepSize = 0.1
        maxNodes = 1
        lowOverallocation = 10.0
        highOverallocation = 1.0
        overallocationDecayFactor = 0.001
        spread = 0.9
        reserve = 60.000s
        maxtime = 3600
        remoteMonitorEnabled = false
        internalHostname = 127.0.0.1
        hookClass = null
        workerManager = block
        workerLoggingLevel = NONE
        workerLoggingDirectory = DEFAULT
        ldLibraryPath = null
        workerCopies = null
        directory = null
        useHashBang = null
        parallelism = 0.01
        coresPerNode = 1
        perfTraceWorker = false
        perfTraceInterval = -1
        attributes = {}
        callbackURIs = [http://127.0.0.1:50003]
}

2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding queue: 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for holding queue (seconds): 1
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks for a total walltime of: 1s
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: Job(id:0 60.000s)
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max Walltime (seconds): 60
2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate (seconds): 600
2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for this new Block (est. seconds): 0
2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: 0, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to new Block
2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: 0, ii: 1, holding.size(): 1
2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, walltime=600.000s
2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600
2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) unregistering (send)
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block Block 0608-3704500-000000 (1x600.000s) for submission
2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to new blocks
2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block Block 0608-3704500-000000 (1x600.000s)
2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local
2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: Submitting
2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: / command: /usr/bin/perl /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: Submitted
2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active
2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE id=0608-3704500-000000
2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) unregistering (send)
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 28583112
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29067208
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: 954466304, CrtHeap: 253624320, UsedHeap: 29551304
2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: Failed Job failed with an exit code of 110
2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job:
        executable: /usr/bin/perl
        arguments: /home/wilde/.globus/coasters/cscript2445623341660096310.pl http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING
        stdout: null
        stderr: null
        directory: /
        batch: false
        redirected: false
        attributes: hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10
        env: WORKER_LOGGING_LEVEL=NONE

2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: Failed to connect: Connection timed out at /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101.

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

From hategan at mcs.anl.gov Sun Jun 8 17:33:15 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 8 Jun 2014 15:33:15 -0700
Subject: [Swift-devel] Localhost coasters not working on Beagle
In-Reply-To: <5394DA10.3040404@anl.gov>
References: <5394DA10.3040404@anl.gov>
Message-ID: <1402266795.32444.2.camel@echo>

Can you enable worker logging and post the worker log?

Mihael

On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote:
> Mihael - I'm not able to get a simple localhost coasters run working on
> Beagle login1.
>
> All: Is anyone seeing something similar?
It looks to me like my coaster > worker is not able to connect to the Swift coaster service (using > standard automatic coasters). > > Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > logs and configs). Running 0.95RC6. > > Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > well: > > login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > catsn.swift > > login1$ cat localcoast.xml > > > > > > > > 127.0.0.1 > 00:01:00 > 3600 > > 1 > 1 > 1 > 1 > > 12 > 10000 > > 100 > 100 > > > /tmp/swiftwork > > > > > I get error 110 connection timeouts: > > 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > host=localhost > 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > 127.0.0.1:50000 > 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > http://127.0.0.1:50001 > 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > [http://127.0.0.2:50003, http://192.5.86.104:50003, > http://10.128.2.244:50003] > 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > http://127.0.0.1:50003 > 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > registration > 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > cpipe, boundTo: null] binding to cpipe://1 > 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > spipe, boundTo: null] binding to spipe://1 > 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > channel > 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > REGISTER) unregistering (send) > 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > service: http://127.0.0.1:50002 > 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > http://10.128.2.244:50003 > 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > overridden to http://127.0.0.1:50003 > 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > CONFIGSERVICE) unregistering (send) > 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> id=0608-3704500 > 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > SUBMITJOB) unregistering (send) > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > Settings { > slots = 1 > jobsPerNode = 1 > workersPerNode = 1 > nodeGranularity = 1 > allocationStepSize = 0.1 > maxNodes = 1 > lowOverallocation = 10.0 > highOverallocation = 1.0 > overallocationDecayFactor = 0.001 > spread = 0.9 > reserve = 60.000s > maxtime = 3600 > remoteMonitorEnabled = false > internalHostname = 127.0.0.1 > hookClass = null > workerManager = block > workerLoggingLevel = NONE > workerLoggingDirectory = DEFAULT > ldLibraryPath = null > workerCopies = null > directory = null > useHashBang = null > parallelism = 0.01 > coresPerNode = 1 > perfTraceWorker = false > perfTraceInterval = -1 > attributes = {} > callbackURIs = [http://127.0.0.1:50003] > } > > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > queue: 1 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > holding queue (seconds): 1 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > for a total walltime of: 1s > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > Job(id:0 60.000s) > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > Walltime (seconds): 60 > 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > estimate (seconds): 600 > 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > this new Block (est. seconds): 0 > 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > 0, holding.size(): 1 > 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > new Block > 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > 0, ii: 1, holding.size(): 1 > 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > walltime=600.000s > 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > unregistering (send) > 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > Block 0608-3704500-000000 (1x600.000s) for submission > 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > new blocks > 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > Block 0608-3704500-000000 (1x600.000s) > 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > Submitting > 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > / command: /usr/bin/perl > /home/wilde/.globus/coasters/cscript2445623341660096310.pl > http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > Submitted > 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > id=0608-3704500-000000 > 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > unregistering (send) > 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 28583112 > 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > 
2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > Failed Job failed with an exit code of 110 > 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > executable: /usr/bin/perl > arguments: > /home/wilde/.globus/coasters/cscript2445623341660096310.pl > http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > stdout: null > stderr: null > directory: / > batch: false > redirected: false > attributes: > hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > env: WORKER_LOGGING_LEVEL=NONE > > 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > Failed to connect: Connection timed out at > /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > > > From davidkelly at uchicago.edu Sun Jun 8 19:07:38 2014 From: davidkelly at uchicago.edu (David Kelly) Date: Sun, 8 Jun 2014 19:07:38 -0500 Subject: [Swift-devel] Does softimage work with the new 0.95 config mechanism? In-Reply-To: <5394B86B.4090308@anl.gov> References: <5394B86B.4090308@anl.gov> Message-ID: No, the new config mechanism does not know about softimage. There is no documentation in the userguide about it. (Side note: the trunk userguide should be copied to become the 0.95 userguide, and be added to the website) I just created a page in swift-devel explaining how to add properties to the new config ( https://sites.google.com/site/swiftdevel/home/adding-properties-to-0-95-config-mechanism ). On Sun, Jun 8, 2014 at 2:24 PM, Michael Wilde wrote: > > Does softimage work with the new 0.95 config mechanism? > > If not, can you suggest how to integrate it? > > Also: has anyone written up any softimage documentation yet? > > Thanks, > > - Mike > > ps. softimage is briefly introduced in this prior swift-devel post: > > http://lists.ci.uchicago.edu/pipermail/swift-devel/2014-February/010640.html > > -- > Michael Wilde > Mathematics and Computer Science Computation Institute > Argonne National Laboratory The University of Chicago > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at anl.gov Sun Jun 8 22:10:38 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 22:10:38 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402266795.32444.2.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> Message-ID: <539525AE.7080705@anl.gov> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun Jun 8 22:07:12 2014 2014/06/08 22:07:12.296 INFO - Running on node login1.beagle.ci.uchicago.edu 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 2014/06/08 22:07:12.296 DEBUG - scheme=http 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 2014/06/08 22:07:12.297 DEBUG - port=50003 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. 
2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. Trying other addresses 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out login1$ On 6/8/14, 5:33 PM, Mihael Hategan wrote: > Can you enable worker logging and post the worker log? > > Mihael > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >> Mihael - Im not able to get a simple localhost coasters run working on >> Beagle login1. >> >> All: Is anyone seeing something similar? It looks to me like my coaster >> worker is not able to connect to the Swift coaster service (using >> standard automatic coasters). >> >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >> logs and configs). Running 0.95RC6. >> >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >> well: >> >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >> catsn.swift >> >> login1$ cat localcoast.xml >> >> >> >> >> >> >> >> 127.0.0.1 >> 00:01:00 >> 3600 >> >> 1 >> 1 >> 1 >> 1 >> >> 12 >> 10000 >> >> 100 >> 100 >> >> >> /tmp/swiftwork >> >> >> >> >> I get error 110 connection timeouts: >> >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >> host=localhost >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >> 127.0.0.1:50000 >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is >> http://127.0.0.1:50001 >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >> [http://127.0.0.2:50003, http://192.5.86.104:50003, >> http://10.128.2.244:50003] >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >> http://127.0.0.1:50003 >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >> registration >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >> cpipe, boundTo: null] binding to cpipe://1 >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >> spipe, boundTo: null] binding to spipe://1 >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >> channel >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >> REGISTER) unregistering (send) >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >> service: http://127.0.0.1:50002 >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >> http://10.128.2.244:50003 >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >> overridden to http://127.0.0.1:50003 >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >> CONFIGSERVICE) unregistering (send) >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
>> id=0608-3704500 >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >> SUBMITJOB) unregistering (send) >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >> Settings { >> slots = 1 >> jobsPerNode = 1 >> workersPerNode = 1 >> nodeGranularity = 1 >> allocationStepSize = 0.1 >> maxNodes = 1 >> lowOverallocation = 10.0 >> highOverallocation = 1.0 >> overallocationDecayFactor = 0.001 >> spread = 0.9 >> reserve = 60.000s >> maxtime = 3600 >> remoteMonitorEnabled = false >> internalHostname = 127.0.0.1 >> hookClass = null >> workerManager = block >> workerLoggingLevel = NONE >> workerLoggingDirectory = DEFAULT >> ldLibraryPath = null >> workerCopies = null >> directory = null >> useHashBang = null >> parallelism = 0.01 >> coresPerNode = 1 >> perfTraceWorker = false >> perfTraceInterval = -1 >> attributes = {} >> callbackURIs = [http://127.0.0.1:50003] >> } >> >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >> queue: 1 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >> holding queue (seconds): 1 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >> for a total walltime of: 1s >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >> Job(id:0 60.000s) >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >> Walltime (seconds): 60 >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >> estimate (seconds): 600 >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >> this new Block (est. seconds): 0 >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >> 0, holding.size(): 1 >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >> new Block >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >> 0, ii: 1, holding.size(): 1 >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >> walltime=600.000s >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >> unregistering (send) >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >> Block 0608-3704500-000000 (1x600.000s) for submission >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >> new blocks >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >> Block 0608-3704500-000000 (1x600.000s) >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >> Submitting >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >> / command: /usr/bin/perl >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >> Submitted >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >> id=0608-3704500-000000 >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >> unregistering (send) >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:38:21,683-0500 INFO 
RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >> Failed Job failed with an exit code of 110 >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >> executable: /usr/bin/perl >> arguments: >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >> stdout: null >> stderr: null >> directory: / >> batch: false >> redirected: false >> attributes: >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >> env: WORKER_LOGGING_LEVEL=NONE >> >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >> Failed to connect: Connection timed out at >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. >> >> >> > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Sun Jun 8 22:22:51 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 8 Jun 2014 20:22:51 -0700 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <539525AE.7080705@anl.gov> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> Message-ID: <1402284171.15313.0.camel@echo> That's odd. Have you tried netstat -lntp? telnet? I'll give it a shot, but this looks rather strange. Mihael On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: > login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log > 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun > Jun 8 22:07:12 2014 > 2014/06/08 22:07:12.296 INFO - Running on node > login1.beagle.ci.uchicago.edu > 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 > 2014/06/08 22:07:12.296 DEBUG - scheme=http > 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 > 2014/06/08 22:07:12.297 DEBUG - port=50003 > 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 > 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... > 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. > 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds > 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... > 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. > 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds > 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... > 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... > 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. > Trying other addresses > 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. > 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out > login1$ > > > On 6/8/14, 5:33 PM, Mihael Hategan wrote: > > Can you enable worker logging and post the worker log? 
> > > > Mihael > > > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: > >> Mihael - Im not able to get a simple localhost coasters run working on > >> Beagle login1. > >> > >> All: Is anyone seeing something similar? It looks to me like my coaster > >> worker is not able to connect to the Swift coaster service (using > >> standard automatic coasters). > >> > >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > >> logs and configs). Running 0.95RC6. > >> > >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > >> well: > >> > >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > >> catsn.swift > >> > >> login1$ cat localcoast.xml > >> > >> > >> > >> > >> > >> > >> > >> 127.0.0.1 > >> 00:01:00 > >> 3600 > >> > >> 1 > >> 1 > >> 1 > >> 1 > >> > >> 12 > >> 10000 > >> > >> 100 > >> 100 > >> > >> > >> /tmp/swiftwork > >> > >> > >> > >> > >> I get error 110 connection timeouts: > >> > >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > >> host=localhost > >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > >> 127.0.0.1:50000 > >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > >> http://127.0.0.1:50001 > >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > >> [http://127.0.0.2:50003, http://192.5.86.104:50003, > >> http://10.128.2.244:50003] > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > >> http://127.0.0.1:50003 > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > >> registration > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > >> cpipe, boundTo: null] binding to cpipe://1 > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > >> spipe, boundTo: null] binding to spipe://1 > >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > >> channel > >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > >> REGISTER) unregistering (send) > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > >> service: http://127.0.0.1:50002 > >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > >> http://10.128.2.244:50003 > >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > >> overridden to http://127.0.0.1:50003 > >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > >> CONFIGSERVICE) unregistering (send) > >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> >> id=0608-3704500 > >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > >> SUBMITJOB) unregistering (send) > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > >> Settings { > >> slots = 1 > >> jobsPerNode = 1 > >> workersPerNode = 1 > >> nodeGranularity = 1 > >> allocationStepSize = 0.1 > >> maxNodes = 1 > >> lowOverallocation = 10.0 > >> highOverallocation = 1.0 > >> overallocationDecayFactor = 0.001 > >> spread = 0.9 > >> reserve = 60.000s > >> maxtime = 3600 > >> remoteMonitorEnabled = false > >> internalHostname = 127.0.0.1 > >> hookClass = null > >> workerManager = block > >> workerLoggingLevel = NONE > >> workerLoggingDirectory = DEFAULT > >> ldLibraryPath = null > >> workerCopies = null > >> directory = null > >> useHashBang = null > >> parallelism = 0.01 > >> coresPerNode = 1 > >> perfTraceWorker = false > >> perfTraceInterval = -1 > >> attributes = {} > >> callbackURIs = [http://127.0.0.1:50003] > >> } > >> > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > >> queue: 1 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > >> holding queue (seconds): 1 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > >> for a total walltime of: 1s > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > >> Job(id:0 60.000s) > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > >> Walltime (seconds): 60 > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > >> estimate (seconds): 600 > >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > >> this new Block (est. seconds): 0 > >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > >> 0, holding.size(): 1 > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > >> new Block > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > >> 0, ii: 1, holding.size(): 1 > >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > >> walltime=600.000s > >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > >> unregistering (send) > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > >> Block 0608-3704500-000000 (1x600.000s) for submission > >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > >> new blocks > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > >> Block 0608-3704500-000000 (1x600.000s) > >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > >> Submitting > >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > >> / command: /usr/bin/perl > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > >> Submitted > >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > >> id=0608-3704500-000000 > >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > >> unregistering (send) > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 
954466304, CrtHeap: 253624320, UsedHeap: 28583112 > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > >> Failed Job failed with an exit code of 110 > >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > >> executable: /usr/bin/perl > >> arguments: > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > >> stdout: null > >> stderr: null > >> directory: / > >> batch: false > >> redirected: false > >> attributes: > >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > >> env: WORKER_LOGGING_LEVEL=NONE > >> > >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > >> Failed to connect: Connection timed out at > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > >> > >> > >> > > > From hategan at mcs.anl.gov Sun Jun 8 22:27:04 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 8 Jun 2014 20:27:04 -0700 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402284171.15313.0.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> Message-ID: <1402284424.15405.1.camel@echo> Ok, so: shell1: hategan at login1:~> netcat -l -p 50003 shell2: hategan at login1:~> netstat -lntp ... tcp 0 0 0.0.0.0:50003 0.0.0.0:* LISTEN 22806/netcat ... hategan at login1:~> telnet 127.0.0.1 50003 Trying 127.0.0.1... telnet: connect to address 127.0.0.1: Connection timed out I don't think this has anything to do with swift or coasters. Mihael On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: > That's odd. Have you tried netstat -lntp? telnet? > > I'll give it a shot, but this looks rather strange. > > Mihael > > On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: > > login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log > > 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun > > Jun 8 22:07:12 2014 > > 2014/06/08 22:07:12.296 INFO - Running on node > > login1.beagle.ci.uchicago.edu > > 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 > > 2014/06/08 22:07:12.296 DEBUG - scheme=http > > 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 > > 2014/06/08 22:07:12.297 DEBUG - port=50003 > > 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 > > 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... > > 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. > > 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds > > 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... > > 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. 
> > 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds > > 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... > > 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... > > 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. > > Trying other addresses > > 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. > > 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out > > login1$ > > > > > > On 6/8/14, 5:33 PM, Mihael Hategan wrote: > > > Can you enable worker logging and post the worker log? > > > > > > Mihael > > > > > > On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: > > >> Mihael - Im not able to get a simple localhost coasters run working on > > >> Beagle login1. > > >> > > >> All: Is anyone seeing something similar? It looks to me like my coaster > > >> worker is not able to connect to the Swift coaster service (using > > >> standard automatic coasters). > > >> > > >> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find > > >> logs and configs). Running 0.95RC6. > > >> > > >> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as > > >> well: > > >> > > >> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml > > >> catsn.swift > > >> > > >> login1$ cat localcoast.xml > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> 127.0.0.1 > > >> 00:01:00 > > >> 3600 > > >> > > >> 1 > > >> 1 > > >> 1 > > >> 1 > > >> > > >> 12 > > >> 10000 > > >> > > >> 100 > > >> 100 > > >> > > >> > > >> /tmp/swiftwork > > >> > > >> > > >> > > >> > > >> I get error 110 connection timeouts: > > >> > > >> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl > > >> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl > > >> host=localhost > > >> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: > > >> 127.0.0.1:50000 > > >> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is > > >> http://127.0.0.1:50001 > > >> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: > > >> [http://127.0.0.2:50003, http://192.5.86.104:50003, > > >> http://10.128.2.244:50003] > > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: > > >> http://127.0.0.1:50003 > > >> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for > > >> registration > > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > > >> cpipe, boundTo: null] binding to cpipe://1 > > >> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: > > >> spipe, boundTo: null] binding to spipe://1 > > >> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration > > >> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current > > >> channel > > >> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, > > >> REGISTER) unregistering (send) > > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete > > >> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster > > >> service: http://127.0.0.1:50002 > > >> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is > > >> http://10.128.2.244:50003 > > >> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been > > >> overridden to http://127.0.0.1:50003 > > >> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, > > >> CONFIGSERVICE) unregistering (send) > > >> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
> > >> id=0608-3704500 > > >> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, > > >> SUBMITJOB) unregistering (send) > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor > > >> Settings { > > >> slots = 1 > > >> jobsPerNode = 1 > > >> workersPerNode = 1 > > >> nodeGranularity = 1 > > >> allocationStepSize = 0.1 > > >> maxNodes = 1 > > >> lowOverallocation = 10.0 > > >> highOverallocation = 1.0 > > >> overallocationDecayFactor = 0.001 > > >> spread = 0.9 > > >> reserve = 60.000s > > >> maxtime = 3600 > > >> remoteMonitorEnabled = false > > >> internalHostname = 127.0.0.1 > > >> hookClass = null > > >> workerManager = block > > >> workerLoggingLevel = NONE > > >> workerLoggingDirectory = DEFAULT > > >> ldLibraryPath = null > > >> workerCopies = null > > >> directory = null > > >> useHashBang = null > > >> parallelism = 0.01 > > >> coresPerNode = 1 > > >> perfTraceWorker = false > > >> perfTraceInterval = -1 > > >> attributes = {} > > >> callbackURIs = [http://127.0.0.1:50003] > > >> } > > >> > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding > > >> queue: 1 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for > > >> holding queue (seconds): 1 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks > > >> for a total walltime of: 1s > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: > > >> Job(id:0 60.000s) > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max > > >> Walltime (seconds): 60 > > >> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time > > >> estimate (seconds): 600 > > >> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for > > >> this new Block (est. seconds): 0 > > >> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: > > >> 0, holding.size(): 1 > > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to > > >> new Block > > >> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: > > >> 0, ii: 1, holding.size(): 1 > > >> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, > > >> walltime=600.000s > > >> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED > > >> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 > > >> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) > > >> unregistering (send) > > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block > > >> Block 0608-3704500-000000 (1x600.000s) for submission > > >> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to > > >> new blocks > > >> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block > > >> Block 0608-3704500-000000 (1x600.000s) > > >> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local > > >> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: > > >> Submitting > > >> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: > > >> / command: /usr/bin/perl > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > > >> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: > > >> Submitted > > >> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active > > >> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE > > >> id=0608-3704500-000000 > > >> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) > > >> unregistering (send) > 
> >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 > > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 > > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 > > >> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: > > >> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 > > >> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: > > >> Failed Job failed with an exit code of 110 > > >> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: > > >> executable: /usr/bin/perl > > >> arguments: > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl > > >> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING > > >> stdout: null > > >> stderr: null > > >> directory: / > > >> batch: false > > >> redirected: false > > >> attributes: > > >> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 > > >> env: WORKER_LOGGING_LEVEL=NONE > > >> > > >> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: > > >> Failed to connect: Connection timed out at > > >> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. > > >> > > >> > > >> > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at anl.gov Sun Jun 8 22:31:35 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 22:31:35 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <1402284424.15405.1.camel@echo> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> <1402284424.15405.1.camel@echo> Message-ID: <53952A97.7000002@anl.gov> I'll try the other addresses for that host. Maybe something changed there in iptables or similar. - MIke On 6/8/14, 10:27 PM, Mihael Hategan wrote: > Ok, so: > > shell1: hategan at login1:~> netcat -l -p 50003 > > shell2: hategan at login1:~> netstat -lntp > ... > tcp 0 0 0.0.0.0:50003 0.0.0.0:* > LISTEN 22806/netcat > ... > > hategan at login1:~> telnet 127.0.0.1 50003 > Trying 127.0.0.1... > telnet: connect to address 127.0.0.1: Connection timed out > > I don't think this has anything to do with swift or coasters. > > Mihael > > On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: >> That's odd. Have you tried netstat -lntp? telnet? >> >> I'll give it a shot, but this looks rather strange. >> >> Mihael >> >> On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: >>> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log >>> 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun >>> Jun 8 22:07:12 2014 >>> 2014/06/08 22:07:12.296 INFO - Running on node >>> login1.beagle.ci.uchicago.edu >>> 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 >>> 2014/06/08 22:07:12.296 DEBUG - scheme=http >>> 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 >>> 2014/06/08 22:07:12.297 DEBUG - port=50003 >>> 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 >>> 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... >>> 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... 
>>> 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds >>> 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... >>> 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... >>> 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds >>> 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... >>> 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... >>> 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. >>> Trying other addresses >>> 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. >>> 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out >>> login1$ >>> >>> >>> On 6/8/14, 5:33 PM, Mihael Hategan wrote: >>>> Can you enable worker logging and post the worker log? >>>> >>>> Mihael >>>> >>>> On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >>>>> Mihael - Im not able to get a simple localhost coasters run working on >>>>> Beagle login1. >>>>> >>>>> All: Is anyone seeing something similar? It looks to me like my coaster >>>>> worker is not able to connect to the Swift coaster service (using >>>>> standard automatic coasters). >>>>> >>>>> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >>>>> logs and configs). Running 0.95RC6. >>>>> >>>>> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >>>>> well: >>>>> >>>>> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >>>>> catsn.swift >>>>> >>>>> login1$ cat localcoast.xml >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 127.0.0.1 >>>>> 00:01:00 >>>>> 3600 >>>>> >>>>> 1 >>>>> 1 >>>>> 1 >>>>> 1 >>>>> >>>>> 12 >>>>> 10000 >>>>> >>>>> 100 >>>>> 100 >>>>> >>>>> >>>>> /tmp/swiftwork >>>>> >>>>> >>>>> >>>>> >>>>> I get error 110 connection timeouts: >>>>> >>>>> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >>>>> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >>>>> host=localhost >>>>> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >>>>> 127.0.0.1:50000 >>>>> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. 
URL is >>>>> http://127.0.0.1:50001 >>>>> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >>>>> [http://127.0.0.2:50003, http://192.5.86.104:50003, >>>>> http://10.128.2.244:50003] >>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >>>>> http://127.0.0.1:50003 >>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >>>>> registration >>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>> cpipe, boundTo: null] binding to cpipe://1 >>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>> spipe, boundTo: null] binding to spipe://1 >>>>> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >>>>> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >>>>> channel >>>>> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >>>>> REGISTER) unregistering (send) >>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >>>>> service: http://127.0.0.1:50002 >>>>> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >>>>> http://10.128.2.244:50003 >>>>> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >>>>> overridden to http://127.0.0.1:50003 >>>>> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >>>>> CONFIGSERVICE) unregistering (send) >>>>> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... >>>>> id=0608-3704500 >>>>> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >>>>> SUBMITJOB) unregistering (send) >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >>>>> Settings { >>>>> slots = 1 >>>>> jobsPerNode = 1 >>>>> workersPerNode = 1 >>>>> nodeGranularity = 1 >>>>> allocationStepSize = 0.1 >>>>> maxNodes = 1 >>>>> lowOverallocation = 10.0 >>>>> highOverallocation = 1.0 >>>>> overallocationDecayFactor = 0.001 >>>>> spread = 0.9 >>>>> reserve = 60.000s >>>>> maxtime = 3600 >>>>> remoteMonitorEnabled = false >>>>> internalHostname = 127.0.0.1 >>>>> hookClass = null >>>>> workerManager = block >>>>> workerLoggingLevel = NONE >>>>> workerLoggingDirectory = DEFAULT >>>>> ldLibraryPath = null >>>>> workerCopies = null >>>>> directory = null >>>>> useHashBang = null >>>>> parallelism = 0.01 >>>>> coresPerNode = 1 >>>>> perfTraceWorker = false >>>>> perfTraceInterval = -1 >>>>> attributes = {} >>>>> callbackURIs = [http://127.0.0.1:50003] >>>>> } >>>>> >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >>>>> queue: 1 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >>>>> holding queue (seconds): 1 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >>>>> for a total walltime of: 1s >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >>>>> Job(id:0 60.000s) >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >>>>> Walltime (seconds): 60 >>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >>>>> estimate (seconds): 600 >>>>> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >>>>> this new Block (est. 
seconds): 0 >>>>> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >>>>> 0, holding.size(): 1 >>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >>>>> new Block >>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >>>>> 0, ii: 1, holding.size(): 1 >>>>> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >>>>> walltime=600.000s >>>>> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >>>>> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >>>>> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >>>>> unregistering (send) >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >>>>> Block 0608-3704500-000000 (1x600.000s) for submission >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >>>>> new blocks >>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >>>>> Block 0608-3704500-000000 (1x600.000s) >>>>> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >>>>> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >>>>> Submitting >>>>> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >>>>> / command: /usr/bin/perl >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >>>>> Submitted >>>>> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >>>>> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >>>>> id=0608-3704500-000000 >>>>> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >>>>> unregistering (send) >>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >>>>> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >>>>> Failed Job failed with an exit code of 110 >>>>> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >>>>> executable: /usr/bin/perl >>>>> arguments: >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>> stdout: null >>>>> stderr: null >>>>> directory: / >>>>> batch: false >>>>> redirected: false >>>>> attributes: >>>>> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >>>>> env: WORKER_LOGGING_LEVEL=NONE >>>>> >>>>> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >>>>> Failed to connect: Connection timed out at >>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. 
>>>>> >>>>> >>>>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From wilde at anl.gov Sun Jun 8 23:12:50 2014 From: wilde at anl.gov (Michael Wilde) Date: Sun, 8 Jun 2014 23:12:50 -0500 Subject: [Swift-devel] Localhost coasters not working on Beagle In-Reply-To: <53952A97.7000002@anl.gov> References: <5394DA10.3040404@anl.gov> <1402266795.32444.2.camel@echo> <539525AE.7080705@anl.gov> <1402284171.15313.0.camel@echo> <1402284424.15405.1.camel@echo> <53952A97.7000002@anl.gov> Message-ID: <53953442.1090600@anl.gov> Yadu pointed out that beagle's login host ports are open at a different range. When I set the correct port range in GLOBUS_TCP_PORT_RANGE and GLOBUS_TCP_SOURCE_RANGE, it works. The swift module on Beagle does this automatically. I was using my own download of 0.95-RC6. Thanks, Mihael and Yadu. - Mike On 6/8/14, 10:31 PM, Michael Wilde wrote: > I'll try the other addresses for that host. > > Maybe something changed there in iptables or similar. > > - MIke > > On 6/8/14, 10:27 PM, Mihael Hategan wrote: >> Ok, so: >> >> shell1: hategan at login1:~> netcat -l -p 50003 >> >> shell2: hategan at login1:~> netstat -lntp >> ... >> tcp 0 0 0.0.0.0:50003 0.0.0.0:* >> LISTEN 22806/netcat >> ... >> >> hategan at login1:~> telnet 127.0.0.1 50003 >> Trying 127.0.0.1... >> telnet: connect to address 127.0.0.1: Connection timed out >> >> I don't think this has anything to do with swift or coasters. >> >> Mihael >> >> On Sun, 2014-06-08 at 20:22 -0700, Mihael Hategan wrote: >>> That's odd. Have you tried netstat -lntp? telnet? >>> >>> I'll give it a shot, but this looks rather strange. >>> >>> Mihael >>> >>> On Sun, 2014-06-08 at 22:10 -0500, Michael Wilde wrote: >>>> login1$ more /home/wilde/.globus/coasters/worker-0608-0710120-000000.log >>>> 2014/06/08 22:07:12.296 INFO - 0608-0710120-000000 Logging started: Sun >>>> Jun 8 22:07:12 2014 >>>> 2014/06/08 22:07:12.296 INFO - Running on node >>>> login1.beagle.ci.uchicago.edu >>>> 2014/06/08 22:07:12.296 DEBUG - uri=http://127.0.0.1:50003 >>>> 2014/06/08 22:07:12.296 DEBUG - scheme=http >>>> 2014/06/08 22:07:12.297 DEBUG - host=127.0.0.1 >>>> 2014/06/08 22:07:12.297 DEBUG - port=50003 >>>> 2014/06/08 22:07:12.297 DEBUG - blockid=0608-0710120-000000 >>>> 2014/06/08 22:07:12.297 INFO - Connect attempt: 0 ... >>>> 2014/06/08 22:07:12.297 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:07:33.296 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:07:33.296 ERROR - Connection failed for all addresses. >>>> 2014/06/08 22:07:33.296 ERROR - Retrying in 1 seconds >>>> 2014/06/08 22:07:34.297 INFO - Connect attempt: 1 ... >>>> 2014/06/08 22:07:34.297 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:07:55.295 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:07:55.296 ERROR - Connection failed for all addresses. >>>> 2014/06/08 22:07:55.296 ERROR - Retrying in 2 seconds >>>> 2014/06/08 22:07:57.298 INFO - Connect attempt: 2 ... >>>> 2014/06/08 22:07:57.298 INFO - Trying 127.0.0.1:50003 ... >>>> 2014/06/08 22:08:18.295 INFO - Connection failed: Connection timed out. >>>> Trying other addresses >>>> 2014/06/08 22:08:18.295 ERROR - Connection failed for all addresses. 
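For reference, the fix Mike describes comes down to exporting both variables with the login node's open range before starting Swift; a minimal sketch, with 50100,50199 as a placeholder for whatever range is actually open on your machine:

    export GLOBUS_TCP_PORT_RANGE=50100,50199
    export GLOBUS_TCP_SOURCE_RANGE=50100,50199
    swift -config cf -tc.file apps -sites.file localcoast.xml catsn.swift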
>>>> 2014/06/08 22:08:18.295 ERROR - Failed to connect: Connection timed out >>>> login1$ >>>> >>>> >>>> On 6/8/14, 5:33 PM, Mihael Hategan wrote: >>>>> Can you enable worker logging and post the worker log? >>>>> >>>>> Mihael >>>>> >>>>> On Sun, 2014-06-08 at 16:48 -0500, Michael Wilde wrote: >>>>>> Mihael - Im not able to get a simple localhost coasters run working on >>>>>> Beagle login1. >>>>>> >>>>>> All: Is anyone seeing something similar? It looks to me like my coaster >>>>>> worker is not able to connect to the Swift coaster service (using >>>>>> standard automatic coasters). >>>>>> >>>>>> Im working in /lustre/beagle/wilde/swift/lab/fastio (where you can find >>>>>> logs and configs). Running 0.95RC6. >>>>>> >>>>>> Im setting GLOBUS_HOSTNAME (to 127.0.0.1) and have tried internalHost as >>>>>> well: >>>>>> >>>>>> login1$ swift -config cf -tc.file apps -sites.file localcoast.xml >>>>>> catsn.swift >>>>>> >>>>>> login1$ cat localcoast.xml >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 127.0.0.1 >>>>>> 00:01:00 >>>>>> 3600 >>>>>> >>>>>> 1 >>>>>> 1 >>>>>> 1 >>>>>> 1 >>>>>> >>>>>> 12 >>>>>> 10000 >>>>>> >>>>>> 100 >>>>>> 100 >>>>>> >>>>>> >>>>>> /tmp/swiftwork >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I get error 110 connection timeouts: >>>>>> >>>>>> 2014-06-08 16:37:50,762-0500 DEBUG swift JOB_START jobid=cat-7jiymsrl >>>>>> tr=cat arguments=[data.txt] tmpdir=catsn-run013/jobs/7/cat-7jiymsrl >>>>>> host=localhost >>>>>> 2014-06-08 16:37:50,829-0500 INFO LocalService Started local service: >>>>>> 127.0.0.1:50000 >>>>>> 2014-06-08 16:37:50,837-0500 INFO BootstrapService Socket bound. URL is >>>>>> http://127.0.0.1:50001 >>>>>> 2014-06-08 16:37:50,914-0500 INFO Settings Local contacts: >>>>>> [http://127.0.0.2:50003, http://192.5.86.104:50003, >>>>>> http://10.128.2.244:50003] >>>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Started local service: >>>>>> http://127.0.0.1:50003 >>>>>> 2014-06-08 16:37:50,917-0500 INFO CoasterService Reserving channel for >>>>>> registration >>>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>>> cpipe, boundTo: null] binding to cpipe://1 >>>>>> 2014-06-08 16:37:50,942-0500 INFO MetaChannel MetaChannel [context: >>>>>> spipe, boundTo: null] binding to spipe://1 >>>>>> 2014-06-08 16:37:50,942-0500 INFO CoasterService Sending registration >>>>>> 2014-06-08 16:37:50,948-0500 INFO MetaChannel Trying to re-bind current >>>>>> channel >>>>>> 2014-06-08 16:37:50,949-0500 INFO RequestHandler Handler(tag: 1, >>>>>> REGISTER) unregistering (send) >>>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Registration complete >>>>>> 2014-06-08 16:37:50,949-0500 INFO CoasterService Started coaster >>>>>> service: http://127.0.0.1:50002 >>>>>> 2014-06-08 16:37:50,952-0500 WARN Settings original callback URI is >>>>>> http://10.128.2.244:50003 >>>>>> 2014-06-08 16:37:50,952-0500 WARN Settings callback URI has been >>>>>> overridden to http://127.0.0.1:50003 >>>>>> 2014-06-08 16:37:50,953-0500 INFO RequestHandler Handler(tag: 1, >>>>>> CONFIGSERVICE) unregistering (send) >>>>>> 2014-06-08 16:37:50,969-0500 INFO BlockQueueProcessor Starting... 
>>>>>> id=0608-3704500 >>>>>> 2014-06-08 16:37:50,969-0500 INFO RequestHandler Handler(tag: 2, >>>>>> SUBMITJOB) unregistering (send) >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor >>>>>> Settings { >>>>>> slots = 1 >>>>>> jobsPerNode = 1 >>>>>> workersPerNode = 1 >>>>>> nodeGranularity = 1 >>>>>> allocationStepSize = 0.1 >>>>>> maxNodes = 1 >>>>>> lowOverallocation = 10.0 >>>>>> highOverallocation = 1.0 >>>>>> overallocationDecayFactor = 0.001 >>>>>> spread = 0.9 >>>>>> reserve = 60.000s >>>>>> maxtime = 3600 >>>>>> remoteMonitorEnabled = false >>>>>> internalHostname = 127.0.0.1 >>>>>> hookClass = null >>>>>> workerManager = block >>>>>> workerLoggingLevel = NONE >>>>>> workerLoggingDirectory = DEFAULT >>>>>> ldLibraryPath = null >>>>>> workerCopies = null >>>>>> directory = null >>>>>> useHashBang = null >>>>>> parallelism = 0.01 >>>>>> coresPerNode = 1 >>>>>> perfTraceWorker = false >>>>>> perfTraceInterval = -1 >>>>>> attributes = {} >>>>>> callbackURIs = [http://127.0.0.1:50003] >>>>>> } >>>>>> >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Jobs in holding >>>>>> queue: 1 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time estimate for >>>>>> holding queue (seconds): 1 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Allocating blocks >>>>>> for a total walltime of: 1s >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Considering: >>>>>> Job(id:0 60.000s) >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Max >>>>>> Walltime (seconds): 60 >>>>>> 2014-06-08 16:37:51,009-0500 INFO BlockQueueProcessor Time >>>>>> estimate (seconds): 600 >>>>>> 2014-06-08 16:37:51,010-0500 INFO BlockQueueProcessor Total for >>>>>> this new Block (est. seconds): 0 >>>>>> 2014-06-08 16:37:51,013-0500 INFO BlockQueueProcessor index: 0, last: >>>>>> 0, holding.size(): 1 >>>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor Queued: 1 jobs to >>>>>> new Block >>>>>> 2014-06-08 16:37:51,014-0500 INFO BlockQueueProcessor index: 0, last: >>>>>> 0, ii: 1, holding.size(): 1 >>>>>> 2014-06-08 16:37:51,014-0500 INFO Block Starting block: workers=1, >>>>>> walltime=600.000s >>>>>> 2014-06-08 16:37:51,016-0500 INFO RemoteLogHandler BLOCK_REQUESTED >>>>>> id=0608-3704500-000000, cores=1, coresPerWorker=1, walltime=600 >>>>>> 2014-06-08 16:37:51,016-0500 INFO RequestHandler Handler(tag: 2, RLOG) >>>>>> unregistering (send) >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Queuing block >>>>>> Block 0608-3704500-000000 (1x600.000s) for submission >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockQueueProcessor Added 1 jobs to >>>>>> new blocks >>>>>> 2014-06-08 16:37:51,018-0500 INFO BlockTaskSubmitter Submitting block >>>>>> Block 0608-3704500-000000 (1x600.000s) >>>>>> 2014-06-08 16:37:51,018-0500 INFO ExecutionTaskHandler provider=local >>>>>> 2014-06-08 16:37:51,023-0500 INFO Block Block task status changed: >>>>>> Submitting >>>>>> 2014-06-08 16:37:51,023-0500 INFO JobSubmissionTaskHandler Submit: in: >>>>>> / command: /usr/bin/perl >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>>> 2014-06-08 16:37:51,024-0500 INFO Block Block task status changed: >>>>>> Submitted >>>>>> 2014-06-08 16:37:51,027-0500 INFO Block Block task status changed: Active >>>>>> 2014-06-08 16:37:51,027-0500 INFO RemoteLogHandler BLOCK_ACTIVE >>>>>> id=0608-3704500-000000 >>>>>> 2014-06-08 16:37:51,027-0500 INFO RequestHandler Handler(tag: 3, RLOG) >>>>>> unregistering (send) 
>>>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:37:51,681-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 28583112 >>>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:38:21,683-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29067208 >>>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker Submitted:1 >>>>>> 2014-06-08 16:38:51,686-0500 INFO RuntimeStats$ProgressTicker HeapMax: >>>>>> 954466304, CrtHeap: 253624320, UsedHeap: 29551304 >>>>>> 2014-06-08 16:38:57,113-0500 INFO Block Block task status changed: >>>>>> Failed Job failed with an exit code of 110 >>>>>> 2014-06-08 16:38:57,115-0500 INFO Block Failed task spec: Job: >>>>>> executable: /usr/bin/perl >>>>>> arguments: >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl >>>>>> http://127.0.0.1:50003 0608-3704500-000000 NOLOGGING >>>>>> stdout: null >>>>>> stderr: null >>>>>> directory: / >>>>>> batch: false >>>>>> redirected: false >>>>>> attributes: >>>>>> hostcount=1,count=1,jobspernode=1,corespernode=1,maxwalltime=10 >>>>>> env: WORKER_LOGGING_LEVEL=NONE >>>>>> >>>>>> 2014-06-08 16:38:57,115-0500 INFO Block Worker task failed: >>>>>> Failed to connect: Connection timed out at >>>>>> /home/wilde/.globus/coasters/cscript2445623341660096310.pl line 1101. >>>>>> >>>>>> >>>>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From lulzanonym at gmail.com Mon Jun 9 05:07:11 2014 From: lulzanonym at gmail.com (Walid Braham) Date: Mon, 9 Jun 2014 12:07:11 +0200 Subject: [Swift-devel] Subscription Message-ID: lulzanonym at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tga at uchicago.edu Wed Jun 11 07:33:23 2014 From: tga at uchicago.edu (Tim Armstrong) Date: Wed, 11 Jun 2014 14:33:23 +0200 Subject: [Swift-devel] Swift-T Github mirror moved Message-ID: <53984C93.3080003@uchicago.edu> I moved the Swift/T github mirror from my personal account to the swift organisation. You can find it at https://github.com/swift-lang/swift-t . Cheers, Tim From yadunand at uchicago.edu Wed Jun 11 11:41:58 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 11:41:58 -0500 Subject: [Swift-devel] SVN down ? Message-ID: <539886D6.6000602@uchicago.edu> Hi, Since about an hour back I'm not able to access svn.ci.uchicago.edu. The online repo browser wouldn't load as well. Whom do I contact ? [yadunand at midway001 tests]$ svn up Updating '.': svn: E000110: Unable to connect to a repository at URL 'https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.95' svn: E000110: Error running context: Connection timed out Thanks, Yadu From davidkelly at uchicago.edu Wed Jun 11 11:53:59 2014 From: davidkelly at uchicago.edu (David Kelly) Date: Wed, 11 Jun 2014 11:53:59 -0500 Subject: [Swift-devel] SVN down ? In-Reply-To: <539886D6.6000602@uchicago.edu> References: <539886D6.6000602@uchicago.edu> Message-ID: CI support maintains the svn server On Wed, Jun 11, 2014 at 11:41 AM, Yadu Nand Babuji wrote: > Hi, > > Since about an hour back I'm not able to access svn.ci.uchicago.edu. The > online repo browser wouldn't load as well. 
> Whom do I contact ? > > [yadunand at midway001 tests]$ svn up > Updating '.': > svn: E000110: Unable to connect to a repository at URL > 'https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.95' > svn: E000110: Error running context: Connection timed out > > Thanks, > Yadu > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadunand at uchicago.edu Wed Jun 11 14:46:25 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 14:46:25 -0500 Subject: [Swift-devel] Reducing swift log size Message-ID: <5398B211.8060403@uchicago.edu> Hi, I'm running a proxy app for a user with 6000 tasks each taking a few milliseconds, and the log sizes are unusually large. When the tasks are set to take 20s, the total log size reaches ~7Gb. I tried setting -minimal.logging and -reduced.logging but still see debug lines in the log like: 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: block=0611-1507050-000000 host=midway461 id=10 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: block=0611-1507050-000000 id=10 Do we need DEBUG lines such as the ones listed above ? Is it reasonable to have these set by default to WARN ? Secondly, setting -minimal.logging did not turn off these DEBUG lines and I had to set the following log4j.properties from DEBUG to WARN to remove most of the offending lines: log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN With 6001 tasks, each taking 2 ms or so: Swift without any changes to logging -> 440879 lines and 51Mb Swift with -minimal.logging -> 83350 lines and 9.5Mb Swift with -minimal.logging and -> 7625 lines and 791Kb Cpu, Block log4j properties set Thanks, Yadu From hategan at mcs.anl.gov Wed Jun 11 14:54:57 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 12:54:57 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398B211.8060403@uchicago.edu> References: <5398B211.8060403@uchicago.edu> Message-ID: <1402516497.26970.3.camel@echo> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > Hi, > > I'm running a proxy app for a user with 6000 tasks each taking a few > milliseconds, and the log sizes are unusually large. When the tasks are > set to take 20s, the total log size reaches ~7Gb. 7GB? Wow. I'd like to see that. Can you upload and post link? > > I tried setting -minimal.logging and -reduced.logging but still see > debug lines in the log like: > 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > block=0611-1507050-000000 host=midway461 id=10 > 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > block=0611-1507050-000000 id=10 > > Do we need DEBUG lines such as the ones listed above ? Is it reasonable > to have these set by default to WARN ? It is until there's a problem and then people ask for the opposite. We should evaluate whether this belongs in reduced logging or not. But does that really account for most of the 7G? > Secondly, setting -minimal.logging did not turn off these DEBUG lines > and I had to set the following log4j.properties from DEBUG to WARN to > remove most of the offending lines: That sounds like a bug. 
> > log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > > With 6001 tasks, each taking 2 ms or so: > Swift without any changes to logging -> 440879 lines and 51Mb > Swift with -minimal.logging -> 83350 lines and 9.5Mb > Swift with -minimal.logging and -> 7625 lines and 791Kb > Cpu, Block log4j properties set > > Thanks, > Yadu > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Wed Jun 11 17:57:22 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 17:57:22 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402516497.26970.3.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> Message-ID: <5398DED2.70808@uchicago.edu> Hi Mihael, I've got the logs for you. This time, i've run the 6001 tasks with 20s delay added, and was all run with swift-0.95-RC6 (from our website) : Normal run -> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( 6.1gb ) With minimal logging -> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( 7.5 gb ) With minimal logging -> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log ( 845.4 kb ) & log4j properties modified The run with minimal logging ran for ~15mins while the first run took ~12mins. That *might* explain why the one with minimal logging is larger. Thanks, Yadu On 06/11/2014 02:54 PM, Mihael Hategan wrote: > On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >> Hi, >> >> I'm running a proxy app for a user with 6000 tasks each taking a few >> milliseconds, and the log sizes are unusually large. When the tasks are >> set to take 20s, the total log size reaches ~7Gb. > 7GB? Wow. I'd like to see that. Can you upload and post link? > >> I tried setting -minimal.logging and -reduced.logging but still see >> debug lines in the log like: >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >> block=0611-1507050-000000 host=midway461 id=10 >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >> block=0611-1507050-000000 id=10 >> >> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >> to have these set by default to WARN ? > It is until there's a problem and then people ask for the opposite. We > should evaluate whether this belongs in reduced logging or not. But does > that really account for most of the 7G? > >> Secondly, setting -minimal.logging did not turn off these DEBUG lines >> and I had to set the following log4j.properties from DEBUG to WARN to >> remove most of the offending lines: > That sounds like a bug. 
> >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >> >> With 6001 tasks, each taking 2 ms or so: >> Swift without any changes to logging -> 440879 lines and 51Mb >> Swift with -minimal.logging -> 83350 lines and 9.5Mb >> Swift with -minimal.logging and -> 7625 lines and 791Kb >> Cpu, Block log4j properties set >> >> Thanks, >> Yadu >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Wed Jun 11 19:03:06 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 17:03:06 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398DED2.70808@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> Message-ID: <1402531386.29962.1.camel@echo> Sorry, I should have mentioned this, but can you please gzip them? It's a bit much otherwise. Mihael On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > Hi Mihael, > > I've got the logs for you. > > This time, i've run the 6001 tasks with 20s delay added, and was all run > with swift-0.95-RC6 (from our website) : > > Normal run -> > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > 6.1gb ) > With minimal logging -> > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > 7.5 gb ) > With minimal logging -> > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > ( 845.4 kb ) > & log4j properties modified > > The run with minimal logging ran for ~15mins while the first run took > ~12mins. That *might* explain why the one > with minimal logging is larger. > > Thanks, > Yadu > > On 06/11/2014 02:54 PM, Mihael Hategan wrote: > > On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >> Hi, > >> > >> I'm running a proxy app for a user with 6000 tasks each taking a few > >> milliseconds, and the log sizes are unusually large. When the tasks are > >> set to take 20s, the total log size reaches ~7Gb. > > 7GB? Wow. I'd like to see that. Can you upload and post link? > > > >> I tried setting -minimal.logging and -reduced.logging but still see > >> debug lines in the log like: > >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >> block=0611-1507050-000000 host=midway461 id=10 > >> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >> block=0611-1507050-000000 id=10 > >> > >> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >> to have these set by default to WARN ? > > It is until there's a problem and then people ask for the opposite. We > > should evaluate whether this belongs in reduced logging or not. But does > > that really account for most of the 7G? > > > >> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >> and I had to set the following log4j.properties from DEBUG to WARN to > >> remove most of the offending lines: > > That sounds like a bug. 
> > > >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >> > >> With 6001 tasks, each taking 2 ms or so: > >> Swift without any changes to logging -> 440879 lines and 51Mb > >> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >> Swift with -minimal.logging and -> 7625 lines and 791Kb > >> Cpu, Block log4j properties set > >> > >> Thanks, > >> Yadu > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From yadunand at uchicago.edu Wed Jun 11 19:45:56 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Wed, 11 Jun 2014 19:45:56 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402531386.29962.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> Message-ID: <5398F844.4050607@uchicago.edu> Okay, here you go: http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz -Yadu On 06/11/2014 07:03 PM, Mihael Hategan wrote: > Sorry, I should have mentioned this, but can you please gzip them? It's > a bit much otherwise. > > Mihael > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >> Hi Mihael, >> >> I've got the logs for you. >> >> This time, i've run the 6001 tasks with 20s delay added, and was all run >> with swift-0.95-RC6 (from our website) : >> >> Normal run -> >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >> 6.1gb ) >> With minimal logging -> >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >> 7.5 gb ) >> With minimal logging -> >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >> ( 845.4 kb ) >> & log4j properties modified >> >> The run with minimal logging ran for ~15mins while the first run took >> ~12mins. That *might* explain why the one >> with minimal logging is larger. >> >> Thanks, >> Yadu >> >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>> Hi, >>>> >>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>> set to take 20s, the total log size reaches ~7Gb. >>> 7GB? Wow. I'd like to see that. Can you upload and post link? >>> >>>> I tried setting -minimal.logging and -reduced.logging but still see >>>> debug lines in the log like: >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>> block=0611-1507050-000000 host=midway461 id=10 >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>> block=0611-1507050-000000 id=10 >>>> >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >>>> to have these set by default to WARN ? >>> It is until there's a problem and then people ask for the opposite. We >>> should evaluate whether this belongs in reduced logging or not. But does >>> that really account for most of the 7G? >>> >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>> remove most of the offending lines: >>> That sounds like a bug. 
>>> >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>> >>>> With 6001 tasks, each taking 2 ms or so: >>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>> Cpu, Block log4j properties set >>>> >>>> Thanks, >>>> Yadu >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Jun 12 01:45:39 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 Jun 2014 23:45:39 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5398F844.4050607@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> Message-ID: <1402555539.11363.1.camel@echo> Ok, the worker CPU stuff is indeed generating a lot of messages. But that isn't supposed to happen. It's supposed to do nothing if nothing happens. So I need to check what's going on. Mihael On Wed, 2014-06-11 at 19:45 -0500, Yadu Nand Babuji wrote: > Okay, here you go: > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz > > -Yadu > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > > Sorry, I should have mentioned this, but can you please gzip them? It's > > a bit much otherwise. > > > > Mihael > > > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > >> Hi Mihael, > >> > >> I've got the logs for you. > >> > >> This time, i've run the 6001 tasks with 20s delay added, and was all run > >> with swift-0.95-RC6 (from our website) : > >> > >> Normal run -> > >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > >> 6.1gb ) > >> With minimal logging -> > >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > >> 7.5 gb ) > >> With minimal logging -> > >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > >> ( 845.4 kb ) > >> & log4j properties modified > >> > >> The run with minimal logging ran for ~15mins while the first run took > >> ~12mins. That *might* explain why the one > >> with minimal logging is larger. > >> > >> Thanks, > >> Yadu > >> > >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >>>> Hi, > >>>> > >>>> I'm running a proxy app for a user with 6000 tasks each taking a few > >>>> milliseconds, and the log sizes are unusually large. When the tasks are > >>>> set to take 20s, the total log size reaches ~7Gb. > >>> 7GB? Wow. I'd like to see that. Can you upload and post link? > >>> > >>>> I tried setting -minimal.logging and -reduced.logging but still see > >>>> debug lines in the log like: > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >>>> block=0611-1507050-000000 host=midway461 id=10 > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >>>> block=0611-1507050-000000 id=10 > >>>> > >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >>>> to have these set by default to WARN ? 
> >>> It is until there's a problem and then people ask for the opposite. We > >>> should evaluate whether this belongs in reduced logging or not. But does > >>> that really account for most of the 7G? > >>> > >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >>>> and I had to set the following log4j.properties from DEBUG to WARN to > >>>> remove most of the offending lines: > >>> That sounds like a bug. > >>> > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >>>> > >>>> With 6001 tasks, each taking 2 ms or so: > >>>> Swift without any changes to logging -> 440879 lines and 51Mb > >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > >>>> Cpu, Block log4j properties set > >>>> > >>>> Thanks, > >>>> Yadu > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From hategan at mcs.anl.gov Thu Jun 12 02:22:47 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 00:22:47 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402555539.11363.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> Message-ID: <1402557767.11363.8.camel@echo> Allright, Can you do me the following favor? in coasters/src/......./Cpu.java, change line 183 from " Cpus sleeping: " + cpus); to " Cpus sleeping: " + cpus + ", qseq: " + lastseq + ", myseq: " + this.getLastSeq()); Then re-run and send me the log. You don't have to run the full thing. When you see the very frequent sleeping/requesting work craziness in the log, you can kill the run. I'm asking this because I have not seen this problem occurring, and it shouldn't be happening, but it clearly is and your version holds the key. Mihael On Wed, 2014-06-11 at 23:45 -0700, Mihael Hategan wrote: > Ok, the worker CPU stuff is indeed generating a lot of messages. But > that isn't supposed to happen. It's supposed to do nothing if nothing > happens. So I need to check what's going on. > > Mihael > > On Wed, 2014-06-11 at 19:45 -0500, Yadu Nand Babuji wrote: > > Okay, here you go: > > http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal.tar.gz > > http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging.tar.gz > > http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j.tar.gz > > > > -Yadu > > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > > > Sorry, I should have mentioned this, but can you please gzip them? It's > > > a bit much otherwise. > > > > > > Mihael > > > > > > On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > > >> Hi Mihael, > > >> > > >> I've got the logs for you. 
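Spelling out the requested change: only the two logging expressions below come from Mihael's message; the class around them is just a stand-in so the before/after lines compile, not the actual Coasters source.

    public class SeqDebugSketch {
        private int cpus = 3;         // stand-ins for the real Cpu state
        private long lastseq = 41;    // job queue sequence number
        private long getLastSeq() { return 42; }  // this worker's snapshot

        void report() {
            // before (around line 183 of Cpu.java, per the email):
            System.out.println(" Cpus sleeping: " + cpus);
            // after, with both sequence numbers added for debugging:
            System.out.println(" Cpus sleeping: " + cpus
                    + ", qseq: " + lastseq + ", myseq: " + this.getLastSeq());
        }

        public static void main(String[] args) {
            new SeqDebugSketch().report();
        }
    }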
> > >> > > >> This time, i've run the 6001 tasks with 20s delay added, and was all run > > >> with swift-0.95-RC6 (from our website) : > > >> > > >> Normal run -> > > >> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > > >> 6.1gb ) > > >> With minimal logging -> > > >> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > > >> 7.5 gb ) > > >> With minimal logging -> > > >> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > > >> ( 845.4 kb ) > > >> & log4j properties modified > > >> > > >> The run with minimal logging ran for ~15mins while the first run took > > >> ~12mins. That *might* explain why the one > > >> with minimal logging is larger. > > >> > > >> Thanks, > > >> Yadu > > >> > > >> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > > >>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > > >>>> Hi, > > >>>> > > >>>> I'm running a proxy app for a user with 6000 tasks each taking a few > > >>>> milliseconds, and the log sizes are unusually large. When the tasks are > > >>>> set to take 20s, the total log size reaches ~7Gb. > > >>> 7GB? Wow. I'd like to see that. Can you upload and post link? > > >>> > > >>>> I tried setting -minimal.logging and -reduced.logging but still see > > >>>> debug lines in the log like: > > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > > >>>> block=0611-1507050-000000 host=midway461 id=10 > > >>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > > >>>> block=0611-1507050-000000 id=10 > > >>>> > > >>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > > >>>> to have these set by default to WARN ? > > >>> It is until there's a problem and then people ask for the opposite. We > > >>> should evaluate whether this belongs in reduced logging or not. But does > > >>> that really account for most of the 7G? > > >>> > > >>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > > >>>> and I had to set the following log4j.properties from DEBUG to WARN to > > >>>> remove most of the offending lines: > > >>> That sounds like a bug. 
> > >>> > > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > > >>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > > >>>> > > >>>> With 6001 tasks, each taking 2 ms or so: > > >>>> Swift without any changes to logging -> 440879 lines and 51Mb > > >>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > > >>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > > >>>> Cpu, Block log4j properties set > > >>>> > > >>>> Thanks, > > >>>> Yadu > > >>>> _______________________________________________ > > >>>> Swift-devel mailing list > > >>>> Swift-devel at ci.uchicago.edu > > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Thu Jun 12 12:11:12 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Thu, 12 Jun 2014 12:11:12 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402557767.11363.8.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> Message-ID: <5399DF30.6040708@uchicago.edu> Hi Mihael, Here's the package I'm running: http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz (It has Cpu.java modified, as well as a debugging line in libexec/swift-int-staging.k) I shut down the run once the logs were past 4gb; here's the link to the log : http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz Thanks, -Yadu On 06/11/2014 07:03 PM, Mihael Hategan wrote: >>>> Sorry, I should have mentioned this, but can you please gzip them? It's >>>> a bit much otherwise. >>>> >>>> Mihael >>>> >>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >>>>> Hi Mihael, >>>>> >>>>> I've got the logs for you. >>>>> >>>>> This time, i've run the 6001 tasks with 20s delay added, and was all run >>>>> with swift-0.95-RC6 (from our website) : >>>>> >>>>> Normal run -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >>>>> 6.1gb ) >>>>> With minimal logging -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >>>>> 7.5 gb ) >>>>> With minimal logging -> >>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >>>>> ( 845.4 kb ) >>>>> & log4j properties modified >>>>> >>>>> The run with minimal logging ran for ~15mins while the first run took >>>>> ~12mins. That *might* explain why the one >>>>> with minimal logging is larger. >>>>> >>>>> Thanks, >>>>> Yadu >>>>> >>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>>>>> set to take 20s, the total log size reaches ~7Gb. >>>>>> 7GB? Wow. I'd like to see that. Can you upload and post link?
>>>>>> >>>>>>> I tried setting -minimal.logging and -reduced.logging but still see >>>>>>> debug lines in the log like: >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>>>>> block=0611-1507050-000000 host=midway461 id=10 >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>>>>> block=0611-1507050-000000 id=10 >>>>>>> >>>>>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable >>>>>>> to have these set by default to WARN ? >>>>>> It is until there's a problem and then people ask for the opposite. We >>>>>> should evaluate whether this belongs in reduced logging or not. But does >>>>>> that really account for most of the 7G? >>>>>> >>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>>>>> remove most of the offending lines: >>>>>> That sounds like a bug. >>>>>> >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>>>>> >>>>>>> With 6001 tasks, each taking 2 ms or so: >>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>>>>> Cpu, Block log4j properties set >>>>>>> >>>>>>> Thanks, >>>>>>> Yadu >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Jun 12 12:58:22 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 10:58:22 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5399DF30.6040708@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> Message-ID: <1402595902.15226.6.camel@echo> Ah! Well, so the trick with two coaster services on localhost doesn't really work well unless you use "ssh:", and this is a good example why. In your case you can avoid it easily if you change your first pool to use the local provider instead of the coaster provider, since you don't really need coasters to run locally. Mihael On Thu, 2014-06-12 at 12:11 -0500, Yadu Nand Babuji wrote: > Hi Mihael, > > Here's the package I'm running: > http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz > (It has Cpu.java modified, as well as as a debugging line in > libexec/swift-int-staging.k) > > I shutdown the run once the logs were past 4gb, here's the link to the > log : > http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz > > Thanks, > -Yadu > > On 06/11/2014 07:03 PM, Mihael Hategan wrote: > >>>> Sorry, I should have mentioned this, but can you please gzip them? It's > >>>> a bit much otherwise. > >>>> > >>>> Mihael > >>>> > >>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: > >>>>> Hi Mihael, > >>>>> > >>>>> I've got the logs for you. 
> >>>>> > >>>>> This time, i've run the 6001 tasks with 20s delay added, and was all run > >>>>> with swift-0.95-RC6 (from our website) : > >>>>> > >>>>> Normal run -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( > >>>>> 6.1gb ) > >>>>> With minimal logging -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( > >>>>> 7.5 gb ) > >>>>> With minimal logging -> > >>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log > >>>>> ( 845.4 kb ) > >>>>> & log4j properties modified > >>>>> > >>>>> The run with minimal logging ran for ~15mins while the first run took > >>>>> ~12mins. That *might* explain why the one > >>>>> with minimal logging is larger. > >>>>> > >>>>> Thanks, > >>>>> Yadu > >>>>> > >>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: > >>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few > >>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are > >>>>>>> set to take 20s, the total log size reaches ~7Gb. > >>>>>> 7GB? Wow. I'd like to see that. Can you upload and post link? > >>>>>> > >>>>>>> I tried setting -minimal.logging and -reduced.logging but still see > >>>>>>> debug lines in the log like: > >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: > >>>>>>> block=0611-1507050-000000 host=midway461 id=10 > >>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: > >>>>>>> block=0611-1507050-000000 id=10 > >>>>>>> > >>>>>>> Do we need DEBUG lines such as the ones listed above ? Is it reasonable > >>>>>>> to have these set by default to WARN ? > >>>>>> It is until there's a problem and then people ask for the opposite. We > >>>>>> should evaluate whether this belongs in reduced logging or not. But does > >>>>>> that really account for most of the 7G? > >>>>>> > >>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines > >>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to > >>>>>>> remove most of the offending lines: > >>>>>> That sounds like a bug. 
> >>>>>> > >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN > >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN > >>>>>>> > >>>>>>> With 6001 tasks, each taking 2 ms or so: > >>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb > >>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb > >>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb > >>>>>>> Cpu, Block log4j properties set > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Yadu > >>>>>>> _______________________________________________ > >>>>>>> Swift-devel mailing list > >>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From wilde at anl.gov Thu Jun 12 13:02:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Thu, 12 Jun 2014 13:02:00 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402595902.15226.6.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> Message-ID: <5399EB18.5050104@anl.gov> In general we're moving to running coasters in all configurations (in part to reduce the number of configurations to explain and test). Yadu's also looking at using provider staging shared-filesystem mode to avoid un-necessary staging for local filesystems. Can you explain the connection between this and the excessive logging? Can that be fixed rather than resorting to an alternate provider? - Mike On 6/12/14, 12:58 PM, Mihael Hategan wrote: > Ah! > > Well, so the trick with two coaster services on localhost doesn't really > work well unless you use "ssh:", and this is a good example why. > > In your case you can avoid it easily if you change your first pool to > use the local provider instead of the coaster provider, since you don't > really need coasters to run locally. > > Mihael > > > On Thu, 2014-06-12 at 12:11 -0500, Yadu Nand Babuji wrote: >> Hi Mihael, >> >> Here's the package I'm running: >> http://users.rcc.uchicago.edu/~yadunand/swift-0.95-modded.tar.gz >> (It has Cpu.java modified, as well as as a debugging line in >> libexec/swift-int-staging.k) >> >> I shutdown the run once the logs were past 4gb, here's the link to the >> log : >> http://users.rcc.uchicago.edu/~yadunand/run013_with_mods_to_Cpujava.tar.gz >> >> Thanks, >> -Yadu >> >> On 06/11/2014 07:03 PM, Mihael Hategan wrote: >>>>>> Sorry, I should have mentioned this, but can you please gzip them? It's >>>>>> a bit much otherwise. >>>>>> >>>>>> Mihael >>>>>> >>>>>> On Wed, 2014-06-11 at 17:57 -0500, Yadu Nand Babuji wrote: >>>>>>> Hi Mihael, >>>>>>> >>>>>>> I've got the logs for you. 
>>>>>>> >>>>>>> This time, I've run the 6001 tasks with 20s delay added, and it was all run >>>>>>> with swift-0.95-RC6 (from our website): >>>>>>> >>>>>>> Normal run -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run010_swift_normal/run010.log ( >>>>>>> 6.1 GB ) >>>>>>> With minimal logging -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run011_minimal_logging/run011.log ( >>>>>>> 7.5 GB ) >>>>>>> With minimal logging -> >>>>>>> http://users.rcc.uchicago.edu/~yadunand/run012_minimal_plus_log4j/run012.log >>>>>>> ( 845.4 KB ) >>>>>>> & log4j properties modified >>>>>>> >>>>>>> The run with minimal logging ran for ~15 mins while the first run took >>>>>>> ~12 mins. That *might* explain why the one >>>>>>> with minimal logging is larger. >>>>>>> >>>>>>> Thanks, >>>>>>> Yadu >>>>>>> >>>>>>> On 06/11/2014 02:54 PM, Mihael Hategan wrote: >>>>>>>> On Wed, 2014-06-11 at 14:46 -0500, Yadu Nand Babuji wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I'm running a proxy app for a user with 6000 tasks each taking a few >>>>>>>>> milliseconds, and the log sizes are unusually large. When the tasks are >>>>>>>>> set to take 20s, the total log size reaches ~7 GB. >>>>>>>> 7GB? Wow. I'd like to see that. Can you upload it and post a link? >>>>>>>> >>>>>>>>> I tried setting -minimal.logging and -reduced.logging but still see >>>>>>>>> debug lines in the log like: >>>>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu worker started: >>>>>>>>> block=0611-1507050-000000 host=midway461 id=10 >>>>>>>>> 2014-06-11 19:15:06,961+0000 DEBUG Cpu ready for work: >>>>>>>>> block=0611-1507050-000000 id=10 >>>>>>>>> >>>>>>>>> Do we need DEBUG lines such as the ones listed above? Is it reasonable >>>>>>>>> to have these set by default to WARN? >>>>>>>> It is until there's a problem and then people ask for the opposite. We >>>>>>>> should evaluate whether this belongs in reduced logging or not. But does >>>>>>>> that really account for most of the 7 GB? >>>>>>>> >>>>>>>>> Secondly, setting -minimal.logging did not turn off these DEBUG lines >>>>>>>>> and I had to set the following log4j.properties from DEBUG to WARN to >>>>>>>>> remove most of the offending lines: >>>>>>>> That sounds like a bug.
>>>>>>>> >>>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=WARN >>>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Block=WARN >>>>>>>>> >>>>>>>>> With 6001 tasks, each taking 2 ms or so: >>>>>>>>> Swift without any changes to logging -> 440879 lines and 51Mb >>>>>>>>> Swift with -minimal.logging -> 83350 lines and 9.5Mb >>>>>>>>> Swift with -minimal.logging and -> 7625 lines and 791Kb >>>>>>>>> Cpu, Block log4j properties set >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Yadu >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Thu Jun 12 13:36:44 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 11:36:44 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <5399EB18.5050104@anl.gov> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> Message-ID: <1402598204.15763.13.camel@echo> On Thu, 2014-06-12 at 13:02 -0500, Michael Wilde wrote: > In general we're moving to running coasters in all configurations (in > part to reduce the number of configurations to explain and test). Right. Although we could default to the local provider for local things. > > Yadu's also looking at using provider staging shared-filesystem mode to > avoid unnecessary staging for local filesystems. > > Can you explain the connection between this and the excessive logging? > Can that be fixed rather than resorting to an alternate provider? Local coaster services run in the same JVM. So static variables are the same in multiple instances of local coaster services. The code was written with the assumption that there would be one service per JVM, a scenario that we didn't think we would deviate from a few years ago. The job-to-worker-node submission scheme is made up of a thread that looks at queued jobs and matches them with free workers. This runs in a loop that polls both the job queue and the worker queue. It is, however, possible for workers to be available that cannot fit any of the queued jobs due to walltime constraints. So you don't want to loop constantly in that case. The good news is that if a worker cannot run a queued job now due to time constraints, it will never be able to. So unless a new job with a smaller walltime comes in, you can safely assume that you don't need to bother waking up said worker. This is achieved using a sequence number. The job queue keeps one and changes it monotonically when new jobs come in. Sleeping workers take a snapshot of that and are only awakened if it differs from the one in the job queue (i.e. new jobs came in since we last figured that this worker cannot run any of the already queued jobs). The problem is that there are two job queues, one for each coaster service. But the code only looks at one static instance of them when checking whether a worker should be awakened. So the worker gets a low sequence number from the right job queue, but then it checks it against the other job queue, which has a higher sequence number. So it gets awakened. Then it gets put to sleep because it has nothing to run really. Anyway, there are two things that should be fixed there: the static variables should be removed, and this polling scheme should be made threadless. Mihael
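A minimal sketch of the wake-up scheme described above, with invented class and field names; the real logic lives in the coaster job manager classes (Block, Cpu) and is not reproduced here:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch; names are invented for illustration.
    class Job {}

    class JobQueue {
        private final List<Job> jobs = new ArrayList<Job>();
        private int seq; // monotonic: bumped every time a new job arrives

        synchronized void enqueue(Job job) {
            jobs.add(job);
            seq++;
            notifyAll(); // wake sleeping workers so they re-check the queue
        }

        synchronized int snapshot() {
            return seq;
        }

        // A worker that can run none of the queued jobs parks here until at
        // least one new job has arrived since it took its snapshot.
        synchronized void awaitNewJobs(int snapshot) throws InterruptedException {
            while (seq == snapshot) {
                wait();
            }
        }
    }

    class Worker {
        // The bug: a static field means all coaster services in one JVM share
        // a single JobQueue reference, so a worker can take its snapshot from
        // its own service's queue but compare it against the other service's
        // queue, and be woken repeatedly with nothing to run.
        static JobQueue queue; // should be a per-service instance field
    }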
From yadunand at uchicago.edu Thu Jun 12 16:50:51 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Thu, 12 Jun 2014 16:50:51 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402598204.15763.13.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> Message-ID: <539A20BB.7050705@uchicago.edu> Just an update. I ran the same configs as earlier, 6001 tasks, each taking 20s with swift-0.95-RC6, set just the local pool to use ssh-cl:local, and the logs are only 42Mb. This is without minimal or reduced logging set. -Yadu On 06/12/2014 01:36 PM, Mihael Hategan wrote: > On Thu, 2014-06-12 at 13:02 -0500, Michael Wilde wrote: >> In general we're moving to running coasters in all configurations (in >> part to reduce the number of configurations to explain and test). > Right. Although we could default to the local provider for local things. > >> Yadu's also looking at using provider staging shared-filesystem mode to >> avoid unnecessary staging for local filesystems. >> >> Can you explain the connection between this and the excessive logging? >> Can that be fixed rather than resorting to an alternate provider? > Local coaster services run in the same JVM. So static variables are the > same in multiple instances of local coaster services. The code was > written with the assumption that there would be one service per JVM, a > scenario that we didn't think we would deviate from a few years ago. > > The job-to-worker-node submission scheme is made up of a thread that > looks at queued jobs and matches them with free workers. This runs in a > loop that polls both the job queue and the worker queue. It is, however, > possible for workers to be available that cannot fit any of the queued > jobs due to walltime constraints. So you don't want to loop constantly > in that case. > > The good news is that if a worker cannot run a queued job now due to > time constraints, it will never be able to. So unless a new job with a > smaller walltime comes in, you can safely assume that you don't need to > bother waking up said worker. > > This is achieved using a sequence number. The job queue keeps one and > changes it monotonically when new jobs come in. Sleeping workers take a > snapshot of that and are only awakened if it differs from the one in the > job queue (i.e. new jobs came in since we last figured that this worker > cannot run any of the already queued jobs). > > The problem is that there are two job queues, one for each coaster > service. But the code only looks at one static instance of them when > checking whether a worker should be awakened.
So the worker gets a low > sequence number from the right job queue, but then it checks it against > the other job queue, which has a higher sequence number. So it gets > awakened. Then it gets put to sleep because it has nothing to run really. > > Anyway, there are two things that should be fixed there: the static > variables should be removed, and this polling scheme should be made threadless. > > Mihael > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Thu Jun 12 17:09:18 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 Jun 2014 15:09:18 -0700 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <539A20BB.7050705@uchicago.edu> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> <539A20BB.7050705@uchicago.edu> Message-ID: <1402610958.19138.1.camel@echo> Thanks for the update! On Thu, 2014-06-12 at 16:50 -0500, Yadu Nand Babuji wrote: > Just an update. > > I ran the same configs as earlier, 6001 tasks, each taking 20s with > swift-0.95-RC6 > and set just the local pool to use ssh-cl:local You really really really don't need coasters to run stuff on localhost. Mihael From yadudoc1729 at gmail.com Thu Jun 12 18:29:13 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 12 Jun 2014 18:29:13 -0500 Subject: [Swift-devel] Reducing swift log size In-Reply-To: <1402610958.19138.1.camel@echo> References: <5398B211.8060403@uchicago.edu> <1402516497.26970.3.camel@echo> <5398DED2.70808@uchicago.edu> <1402531386.29962.1.camel@echo> <5398F844.4050607@uchicago.edu> <1402555539.11363.1.camel@echo> <1402557767.11363.8.camel@echo> <5399DF30.6040708@uchicago.edu> <1402595902.15226.6.camel@echo> <5399EB18.5050104@anl.gov> <1402598204.15763.13.camel@echo> <539A20BB.7050705@uchicago.edu> <1402610958.19138.1.camel@echo> Message-ID: Okay, in the (off-list) mail to Greg, I've given tested configs for running locally using the local provider: [sites.xml pool definition whose XML tags were stripped by the archiver; the surviving values are 10000, 00:20:00, 3600, file, and /tmp/{env.USER}/swiftwork] And it works pretty well! On Thu, Jun 12, 2014 at 5:09 PM, Mihael Hategan wrote: > Thanks for the update! > > On Thu, 2014-06-12 at 16:50 -0500, Yadu Nand Babuji wrote: > > Just an update. > > > > I ran the same configs as earlier, 6001 tasks, each taking 20s with > > swift-0.95-RC6 > > and set just the local pool to use ssh-cl:local > > You really really really don't need coasters to run stuff on localhost. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL:
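The pool definition in the message above lost its XML markup in the archive; only the values survived. A plausible reconstruction, assuming the surviving values map, in order, onto the usual 0.95 sites.xml entries initialScore, maxwalltime, maxtime, stagingMethod, and workdirectory (the element and key names here are guesses; only the values come from the message):

    <pool handle="localhost">
      <execution provider="local"/>
      <profile namespace="karajan" key="initialScore">10000</profile>
      <profile namespace="globus" key="maxwalltime">00:20:00</profile>
      <profile namespace="globus" key="maxtime">3600</profile>
      <profile namespace="swift" key="stagingMethod">file</profile>
      <workdirectory>/tmp/{env.USER}/swiftwork</workdirectory>
    </pool>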
From hategan at mcs.anl.gov Sun Jun 22 19:40:51 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 22 Jun 2014 17:40:51 -0700 Subject: [Swift-devel] FQNs use in Swift Message-ID: <1403484051.20517.17.camel@echo> Hi, What are your general feelings toward namespaces in various places in swift (e.g. tc.data, sites.xml)? Do you like them? Think they are necessary? Would like to see them gone? Mihael From wilde at anl.gov Mon Jun 23 08:41:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 08:41:00 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403484051.20517.17.camel@echo> References: <1403484051.20517.17.camel@echo> Message-ID: <53A82E6C.8000808@anl.gov> I don't think they matter in tc.data and sites.xml, since with the new config mechanism in 0.95 these files should seldom be visible to users. I think namespaces might be more important in the language itself, to support a richer package model for script libraries. But neither is high priority at the moment. I feel we should leave the namespaces in tc and sites as-is for now. - Mike On 6/22/14, 7:40 PM, Mihael Hategan wrote: > Hi, > > What are your general feelings toward namespaces in various places in > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > necessary? Would like to see them gone? > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Mon Jun 23 10:14:56 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jun 2014 08:14:56 -0700 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <53A82E6C.8000808@anl.gov> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> Message-ID: <1403536496.31463.12.camel@echo> The reason I'm asking is because I'm trying to fix the various coaster configuration problems: - the need to have a "pilot" job to set jobsPerNode - the inability to change settings on a persistent service after the first run - the problems with multiple sites on the same host The way we do things now is to pass these settings through site/app/dynamic profiles which all get mashed into task attributes (though some selection happens based on namespace). It's hard to efficiently check if settings are different between tasks, since you have to look at all attribute values and compare, for each task. My plan was to make a cleaner separation. There should be site attributes (such as jobThrottle), provider attributes (e.g. slots), and job attributes (maxwalltime). Each would go into the corresponding XML node. So jobThrottle would be a child of <pool>, slots would be a child of <execution>, and walltime would be a child of <app>, which would now be defined in sites.xml instead of tc.data. This eliminates the need for namespaces as a (poor - the name "karajan" does not belong in sites.xml) indicator of what should go where. But the more important thing is that once an execution provider for a site is defined, you know that the settings for that are not going to change. So you can use the actual instance to distinguish between different settings. This makes it much easier to support multiple coaster configurations. And yes, David's configuration system makes this less relevant from a user's perspective, but that's just part of it. So this makes FQNs an annoyance in sites.xml, so the only place where they remain is for app names. But then we don't use them there. We name things "cat", not "system::cat", and I have heard nobody so far trying to use the latter. That's why I asked, but wanted to make it short. Mihael On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: > I don't think they matter in tc.data and sites.xml, since with the new > config mechanism in 0.95 these files should seldom be visible to users. > > I think namespaces might be more important in the language itself, to > support a richer package model for script libraries. > > But neither is high priority at the moment. I feel we should leave the > namespaces in tc and sites as-is for now. > > - Mike > > On 6/22/14, 7:40 PM, Mihael Hategan wrote: > > Hi, > > > > What are your general feelings toward namespaces in various places in > > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > > necessary? Would like to see them gone? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >
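To make the proposed separation concrete, a sketch of how a pool entry might look under this plan; the element names are inferred from the description in the message above, not taken from committed code:

    <pool handle="mycluster">
      <!-- site attribute: child of <pool> -->
      <jobThrottle>4</jobThrottle>
      <execution provider="coaster" jobmanager="local:pbs">
        <!-- provider attribute: child of <execution> -->
        <slots>20</slots>
      </execution>
      <!-- app entries move from tc.data into sites.xml -->
      <app name="cat">
        <!-- job attribute: child of <app> -->
        <maxwalltime>00:05:00</maxwalltime>
      </app>
    </pool>

In such a layout, each setting's owner is fixed by its position in the tree, which is what removes the need for namespace prefixes like "karajan" or "globus".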
Mihael On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: > I don't think they matter in tc.data and sites.xml, since with the new > config mechanism in 0.95 these files should seldom be visible to users. > > I think namespaces might more important in the language itself, to > support a richer package model for script libraries. > > But neither is high priority at the moment. I feel we should leave the > namespaces in tc and sites as-is for now. > > - Mike > > On 6/22/14, 7:40 PM, Mihael Hategan wrote: > > Hi, > > > > What are your general feelings toward namespaces in various places in > > swift (e.g. tc.data, sites.xml)? Do you like them? Think they are > > necessary? Would like to see them gone? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From wilde at anl.gov Mon Jun 23 11:00:00 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 11:00:00 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403536496.31463.12.camel@echo> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> Message-ID: <53A84F00.3040808@anl.gov> OK, I see what you want to do here, and why. What you're proposing will make the internals cleaner, and would be a chance to harmonize the user-visible and internal property names. If we do this now, in trunk, presumably 0.96 will have the new names. So that would put a stake in the ground for conversion or all users to the new config mechanism. What should we do for backwards compatibility? My inclination would be to provide a tool to convert sites.xml + tc.data into the new config format, and urge users to convert. Whats the development time needed for this? Will it make code maintenance and enhancement easier? Currently, finding property values (eg, within provider code) has been a surprisingly large obstacle to provider enhancement and support. If this fixes that problem (which also requires developer documentation) it will be worthwhile, if its affordable. - Mike On 6/23/14, 10:14 AM, Mihael Hategan wrote: > The reason I'm asking is because I'm trying to fix the various coaster > configuration problems: > - the need to have a "pilot" job to set jobsPerNode > - the inability to change settings on a persistent service after the > first run > - the problems with multiple sites on the same host > > The way we do things now is to pass these settings through > site/app/dynamic profiles which all get mashed into task attributes > (though some selection happens based on namespace). It's hard to > efficiently check if settings are different between tasks, since you > have to look at all attribute values and compare, for each task. > > My plan was to make a cleaner separation. There should be site > attributes (such as jobThrottle), provider attributes (e.g. slots), and > job attributes (maxwalltime). Each would go into the corresponding XML > node. So jobThrottle would be a child of , slots would be a child > of , and walltime would be a child of , which > would now be defined in sites.xml instead of tc.data. > > This eliminates the need for namespaces as a (poor - the name "karajan" > does not belong in sites.xml) indicator of what should go where. But the > more important thing is that once an execution provider for a site is > defined, you know that the settings for that are not going to change. 
So > you can use the actual instance to distinguish between different > settings. This makes it much easier to support multiple coaster > configurations. > > And yes, David's configuration system makes this less relevant from a > user's perspective, but that's just part of it. > > So this makes FQNs an annoyance in sites.xml, so the only place where > they remain is for app names. But then we don't use them there. We name > things "cat", not "system::cat", and I have heard nobody so far trying > to use the latter. That's why I asked, but wanted to make it short. > > Mihael > > On Mon, 2014-06-23 at 08:41 -0500, Michael Wilde wrote: >> I don't think they matter in tc.data and sites.xml, since with the new >> config mechanism in 0.95 these files should seldom be visible to users. >> >> I think namespaces might be more important in the language itself, to >> support a richer package model for script libraries. >> >> But neither is high priority at the moment. I feel we should leave the >> namespaces in tc and sites as-is for now. >> >> - Mike >> >> On 6/22/14, 7:40 PM, Mihael Hategan wrote: >>> Hi, >>> >>> What are your general feelings toward namespaces in various places in >>> swift (e.g. tc.data, sites.xml)? Do you like them? Think they are >>> necessary? Would like to see them gone? >>> >>> Mihael >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From hategan at mcs.anl.gov Mon Jun 23 12:10:10 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jun 2014 10:10:10 -0700 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <53A84F00.3040808@anl.gov> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> <53A84F00.3040808@anl.gov> Message-ID: <1403543410.32109.11.camel@echo> On Mon, 2014-06-23 at 11:00 -0500, Michael Wilde wrote: > OK, I see what you want to do here, and why. > > What you're proposing will make the internals cleaner, and would be a > chance to harmonize the user-visible and internal property names. > > If we do this now, in trunk, presumably 0.96 will have the new names. > So that would put a stake in the ground for conversion of all users to > the new config mechanism. > > What should we do for backwards compatibility? I was asking myself the same question. Initially, I wanted to allow both old and new configurations, and translate internally, but I believe that would make the code messy for what is essentially a one-time operation for a limited number of users (due to the new config mechanism)... > My inclination would be > to provide a tool to convert sites.xml + tc.data into the new config > format, and urge users to convert. ... so I reasoned that we could do just what you say above. > > What's the development time needed for this? Small-ish. I did most of it yesterday. I was under the impression that I was fixing the coaster stuff, but it led me here. Let me stress again, the coaster stuff really needs fixing. This is, to me, acceptable collateral damage. > > Will it make code maintenance and enhancement easier? That's my take on it, although that code isn't touched much. > > Currently, finding property values (e.g., within provider code) has been a > surprisingly large obstacle to provider enhancement and support. If this > fixes that problem (which also requires developer documentation) it will > be worthwhile, if it's affordable. I was thinking about that too. It is related in that provider properties are now separate from other task properties, so it would be easier to add some API to get a list of what each provider supports and to use that to validate sites files without having to keep a separate account of provider properties. Mihael PS: While we're at it, jobThrottle and initialScore are being "replaced" with maxParallelJobs and initialParallelJobs.
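A sketch of the kind of provider-attribute API suggested above; the interface and names are invented for illustration and nothing like this exists in the tree:

    import java.util.Set;

    // Hypothetical: a provider advertises the attribute names it accepts, so a
    // sites file can be validated without a separate, hand-maintained list.
    interface ProviderAttributes {
        Set<String> supportedAttributes(); // e.g. "slots", "maxtime"
    }

    class SitesFileChecker {
        static void check(String attribute, ProviderAttributes provider) {
            if (!provider.supportedAttributes().contains(attribute)) {
                throw new IllegalArgumentException(
                    "attribute '" + attribute + "' is not supported by this provider");
            }
        }
    }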
From wilde at anl.gov Mon Jun 23 12:29:21 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 23 Jun 2014 12:29:21 -0500 Subject: [Swift-devel] FQNs use in Swift In-Reply-To: <1403543410.32109.11.camel@echo> References: <1403484051.20517.17.camel@echo> <53A82E6C.8000808@anl.gov> <1403536496.31463.12.camel@echo> <53A84F00.3040808@anl.gov> <1403543410.32109.11.camel@echo> Message-ID: <53A863F1.9040008@anl.gov> This all sounds good, so best to keep going. But regarding maxParallelJobs and initialParallelJobs, let's keep the names in sync with 0.95. There we used the term "task" to indicate a Swift function invocation (usually an app task) and "job" to mean a site resource manager job. - Mike On 6/23/14, 12:10 PM, Mihael Hategan wrote: > On Mon, 2014-06-23 at 11:00 -0500, Michael Wilde wrote: >> OK, I see what you want to do here, and why. >> >> What you're proposing will make the internals cleaner, and would be a >> chance to harmonize the user-visible and internal property names. >> >> If we do this now, in trunk, presumably 0.96 will have the new names. >> So that would put a stake in the ground for conversion of all users to >> the new config mechanism. >> >> What should we do for backwards compatibility? > I was asking myself the same question. Initially, I wanted to allow both > old and new configurations, and translate internally, but I believe that > would make the code messy for what is essentially a one-time operation > for a limited number of users (due to the new config mechanism)... > >> My inclination would be >> to provide a tool to convert sites.xml + tc.data into the new config >> format, and urge users to convert. > ... so I reasoned that we could do just what you say above. > >> What's the development time needed for this? > Small-ish. I did most of it yesterday. I was under the impression that I > was fixing the coaster stuff, but it led me here. Let me stress again, > the coaster stuff really needs fixing. This is, to me, acceptable > collateral damage. > >> Will it make code maintenance and enhancement easier? > That's my take on it, although that code isn't touched much. > >> Currently, finding property values (e.g., within provider code) has been a >> surprisingly large obstacle to provider enhancement and support. If this >> fixes that problem (which also requires developer documentation) it will >> be worthwhile, if it's affordable. > I was thinking about that too. It is related in that provider properties > are now separate from other task properties, so it would be easier to > add some API to get a list of what each provider supports and to use > that to validate sites files without having to keep a separate account > of provider properties. > > Mihael > > PS: While we're at it, jobThrottle and initialScore are being "replaced" > with maxParallelJobs and initialParallelJobs.
> -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago From yadunand at uchicago.edu Fri Jun 27 10:47:40 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 10:47:40 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] Message-ID: <53AD921C.6050807@uchicago.edu> Hi Everyone, Please try the Google Compute Engine setup and tutorial, I've linked below. This will require credit card information and will bill you approximately $0.065/Hr with the default config of 5 micro instances. Cloud setup online doc: https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine Git repo to clone: https://github.com/yadudoc/swift-on-cloud.git Once setup is done, do "connect headnode" and you can run the swift-cloud-tutorial under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the headnode instance. Feedback would be much appreciated. PS: Remember to shut down instances using the "dissolve" command. Thanks! Yadu From hategan at mcs.anl.gov Fri Jun 27 10:54:42 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 Jun 2014 08:54:42 -0700 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: <1403884482.29142.2.camel@echo> Nice! One suggestion is that we probably should encourage dropping the @ in front of functions. Mihael On Fri, 2014-06-27 at 10:47 -0500, Yadu Nand Babuji wrote: > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances.
>> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From foster at anl.gov Fri Jun 27 11:11:18 2014 From: foster at anl.gov (Foster, Ian T.) Date: Fri, 27 Jun 2014 16:11:18 +0000 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: Very nice. I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: > > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances. > > Cloud setup online doc: > https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > > Git repo to clone: > https://github.com/yadudoc/swift-on-cloud.git > > Once setup is done, do "connect headnode" and you can run the > swift-cloud-tutorial > under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > headnode instance. > > Feedback would be much appreciated. > > PS: Remember to shut-down instances using the "dissolve" command. > > > Thanks! > Yadu > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Fri Jun 27 12:00:52 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 12:00:52 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> Message-ID: <53ADA344.6000807@uchicago.edu> Thanks! Google beats AWS in both Pricing and performance. Once you are past the initial run of setup.sh from the compute-engine setup, starting a 20 node cluster takes on less than a 1min. The initial run copies over images which takes time. AWS GCE Micro instance $0.020/Hr $0.013/Hr [1][2] Networking perf 135mbits/s 692mbits/s [3] Boot speed 1min+ < 30s [4] [1] https://developers.google.com/compute/pricing [2] http://aws.amazon.com/ec2/pricing/ [3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ [4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ Thanks -Yadu On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > Very nice. > > I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > >> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >> >> Hi Everyone, >> >> Please try the Google Compute Engine setup and tutorial, I've linked below. >> This will require credit card information and will bill you >> approximately $0.065/Hr >> with the default config of 5 micro instances. 
>> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From foster at anl.gov Fri Jun 27 11:11:18 2014 From: foster at anl.gov (Foster, Ian T.) Date: Fri, 27 Jun 2014 16:11:18 +0000 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53AD921C.6050807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> Message-ID: Very nice. I am curious as to the reason for running on Google. No objection to it, I just don't have experience with it -- only Google > On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: > > Hi Everyone, > > Please try the Google Compute Engine setup and tutorial, I've linked below. > This will require credit card information and will bill you > approximately $0.065/Hr > with the default config of 5 micro instances. > > Cloud setup online doc: > https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > > Git repo to clone: > https://github.com/yadudoc/swift-on-cloud.git > > Once setup is done, do "connect headnode" and you can run the > swift-cloud-tutorial > under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > headnode instance. > > Feedback would be much appreciated. > > PS: Remember to shut-down instances using the "dissolve" command. > > > Thanks! > Yadu > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadunand at uchicago.edu Fri Jun 27 12:00:52 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Fri, 27 Jun 2014 12:00:52 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> Message-ID: <53ADA344.6000807@uchicago.edu> Thanks! Google beats AWS in both pricing and performance. Once you are past the initial run of setup.sh from the compute-engine setup, starting a 20-node cluster takes less than a minute. The initial run copies over images, which takes time.

                      AWS           GCE
    Micro instance    $0.020/Hr     $0.013/Hr    [1][2]
    Networking perf   135mbits/s    692mbits/s   [3]
    Boot speed        1min+         < 30s        [4]

[1] https://developers.google.com/compute/pricing
[2] http://aws.amazon.com/ec2/pricing/
[3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/
[4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/

Thanks -Yadu On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > Very nice. > > I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google > >> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >> >> Hi Everyone, >> >> Please try the Google Compute Engine setup and tutorial, I've linked below. >> This will require credit card information and will bill you >> approximately $0.065/Hr >> with the default config of 5 micro instances. >> >> Cloud setup online doc: >> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >> >> Git repo to clone: >> https://github.com/yadudoc/swift-on-cloud.git >> >> Once setup is done, do "connect headnode" and you can run the >> swift-cloud-tutorial >> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >> headnode instance. >> >> Feedback would be much appreciated. >> >> PS: Remember to shut-down instances using the "dissolve" command. >> >> >> Thanks! >> Yadu >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From foster at anl.gov Fri Jun 27 12:01:50 2014 From: foster at anl.gov (Ian Foster) Date: Fri, 27 Jun 2014 12:01:50 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: <53ADA344.6000807@uchicago.edu> References: <53AD921C.6050807@uchicago.edu> <53ADA344.6000807@uchicago.edu> Message-ID: Nice. This would be a great use case for linking credit card info into Globus Nexus authentication. (Once we get to that.) On Jun 27, 2014, at 12:00 PM, Yadu Nand Babuji wrote: > Thanks! > > Google beats AWS in both Pricing and performance. Once you are past the initial run of setup.sh from > the compute-engine setup, starting a 20 node cluster takes on less than a 1min. The initial run copies > over images which takes time. > > AWS GCE > Micro instance $0.020/Hr > $0.013/Hr [1][2] > Networking perf 135mbits/s 692mbits/s [3] > Boot speed 1min+ < 30s [4] > > [1] https://developers.google.com/compute/pricing > [2] http://aws.amazon.com/ec2/pricing/ > [3] https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ > [4] http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ > > Thanks > -Yadu > > On 06/27/2014 11:11 AM, Foster, Ian T. wrote: >> Very nice. >> >> I am curious as to the reason for running on google. No objection to it, I just don't have experience with it -- only google >> >>> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" wrote: >>> >>> Hi Everyone, >>> >>> Please try the Google Compute Engine setup and tutorial, I've linked below. >>> This will require credit card information and will bill you >>> approximately $0.065/Hr >>> with the default config of 5 micro instances. >>> >>> Cloud setup online doc: >>> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine >>> >>> Git repo to clone: >>> https://github.com/yadudoc/swift-on-cloud.git >>> >>> Once setup is done, do "connect headnode" and you can run the >>> swift-cloud-tutorial >>> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the >>> headnode instance. >>> >>> Feedback would be much appreciated. >>> >>> PS: Remember to shut-down instances using the "dissolve" command. >>> >>> >>> Thanks! >>> Yadu >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From ketan at mcs.anl.gov Fri Jun 27 23:47:55 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Fri, 27 Jun 2014 23:47:55 -0500 Subject: [Swift-devel] Swift on Google compute engine [Request for comments] In-Reply-To: References: <53AD921C.6050807@uchicago.edu> <53ADA344.6000807@uchicago.edu> Message-ID: Google is adding new services to GCE by the day. Recently they announced a new programming model (as a replacement of MapReduce) called ...
Cloud Dataflow. And some more services: http://www.datacenterknowledge.com/archives/2014/06/25/google-dumps-mapreduce-favor-new-hyper-scale-analytics-system -- Ketan On Fri, Jun 27, 2014 at 12:01 PM, Ian Foster wrote: > Nice. This would be a great use case for linking credit card info into > Globus Nexus authentication. (Once we get to that.) > > > On Jun 27, 2014, at 12:00 PM, Yadu Nand Babuji > wrote: > > > Thanks! > > > > Google beats AWS in both Pricing and performance. Once you are past the > initial run of setup.sh from > > the compute-engine setup, starting a 20 node cluster takes on less than > a 1min. The initial run copies > > over images which takes time. > > > > AWS GCE > > Micro instance $0.020/Hr > $0.013/Hr [1][2] > > Networking perf 135mbits/s 692mbits/s [3] > > Boot speed 1min+ < 30s [4] > > > > [1] https://developers.google.com/compute/pricing > > [2] http://aws.amazon.com/ec2/pricing/ > > [3] > https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/ > > [4] > http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/ > > > > Thanks > > -Yadu > > > > On 06/27/2014 11:11 AM, Foster, Ian T. wrote: > >> Very nice. > >> > >> I am curious as to the reason for running on google. No objection to > it, I just don't have experience with it -- only google > >> > >>> On Jun 27, 2014, at 10:47 AM, "Yadu Nand Babuji" < > yadunand at uchicago.edu> wrote: > >>> > >>> Hi Everyone, > >>> > >>> Please try the Google Compute Engine setup and tutorial, I've linked > below. > >>> This will require credit card information and will bill you > >>> approximately $0.065/Hr > >>> with the default config of 5 micro instances. > >>> > >>> Cloud setup online doc: > >>> https://github.com/yadudoc/swift-on-cloud/tree/master/compute-engine > >>> > >>> Git repo to clone: > >>> https://github.com/yadudoc/swift-on-cloud.git > >>> > >>> Once setup is done, do "connect headnode" and you can run the > >>> swift-cloud-tutorial > >>> under your $HOME/swift-on-cloud/swift-cloud-tutorial folder on the > >>> headnode instance. > >>> > >>> Feedback would be much appreciated. > >>> > >>> PS: Remember to shut-down instances using the "dissolve" command. > >>> > >>> > >>> Thanks! > >>> Yadu > >>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: