[Swift-user] Persistent Coasters & 0.95 RC6

Matthew Shaxted Matthew.Shaxted at som.com
Thu Jun 19 10:54:27 CDT 2014


Another strange behavior I am noticing in 0.95…

I would like to run 12 jobs on each of my coaster nodes. I have set tasksPerWorker=12 in the swift.properties file, but Swift does not seem to respect it: only 2 tasks are staged in at a time.

Additionally, the syntax in section 4.8 of the user guide seems inconsistent: when I try to use the jobsPerNode flag, Swift says it does not exist. http://swift-lang.org/guides/trunk/userguide/userguide.html

Thanks for any help.

Matthew

swift.properties file

site=local,persistent-coasters

use.provider.staging=true
use.wrapper.staging=false
lazy.errors=false
provider.staging.pin.swiftfiles=false
wrapperlog.always.transfer=false

site.persistent-coasters {
   jobmanager=coaster-persistent:local:local:http://localhost:50200
   taskWalltime=00:15:00
   workerManager=passive
   workdir=/tmp/swiftwork
   filesystem=local
   tasksPerWorker=12
}

site.local {
   jobmanager=local
   initialScore=10000
   filesystem=local
   workdir=/home/som/EnergyPlus/shanghai/work
}

app.persistent-coasters.epCoast=/home/som/EnergyPlus/RunAndReduceEP.sh
app.local.epLocal=$PWD/RunAndReduceEP.sh



MATTHEW SHAXTED

SKIDMORE, OWINGS & MERRILL LLP
224 South Michigan Ave.
Chicago, IL 60604
TEL: 312.360.4368
FAX: 312.360.4545
matthew.shaxted at som.com

WWW.SOM.COM <http://www.som.com/>

The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.

From: Matthew Shaxted
Sent: Wednesday, June 18, 2014 2:56 PM
To: 'Yadu Nand'
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Persistent Coasters & 0.95 RC6

Wouldn’t you know it, it works great with these settings…

Thanks very much Yadu. It seems I needed the correct workerManager and port number defined.

Matthew


From: Yadu Nand [mailto:yadudoc1729 at gmail.com]
Sent: Wednesday, June 18, 2014 2:40 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Persistent Coasters & 0.95 RC6

Hi Matthew,

With SERVICE_PORT set to 50200 in your coaster-service.conf, the sites.xml file that start-coaster-service
generates should not have a different port number. Could you check the log file start-coaster-service.log for a line
like this: Started coaster service: http://127.0.1.1:50200
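
That check could be scripted roughly as follows. This is only a minimal sketch: the function names are hypothetical, and it assumes the log line and the jobmanager URL have exactly the forms quoted in this thread. It takes the port from the last "Started coaster service" line and compares it with the port in the coaster-persistent jobmanager setting.

```python
import re

# Assumed formats, taken from the examples quoted in this thread:
#   log line:   Started coaster service: http://127.0.1.1:50200
#   properties: jobmanager=coaster-persistent:local:local:http://localhost:50200
STARTED_RE = re.compile(r"Started coaster service:\s*https?://[^:\s]+:(\d+)")
JOBMANAGER_RE = re.compile(
    r"^\s*jobmanager\s*=\s*coaster-persistent:\S*?https?://[^:\s]+:(\d+)",
    re.IGNORECASE | re.MULTILINE,
)


def service_port(log_text):
    """Return the port from the last 'Started coaster service' line, or None."""
    ports = STARTED_RE.findall(log_text)
    return int(ports[-1]) if ports else None


def configured_port(properties_text):
    """Return the port from the first coaster-persistent jobmanager URL, or None."""
    match = JOBMANAGER_RE.search(properties_text)
    return int(match.group(1)) if match else None


def ports_match(log_text, properties_text):
    """True only if both ports were found and they are equal."""
    started = service_port(log_text)
    return started is not None and started == configured_port(properties_text)


if __name__ == "__main__":
    # Both files are assumed to be in the current directory.
    with open("start-coaster-service.log") as log_file:
        log_text = log_file.read()
    with open("swift.properties") as props_file:
        props_text = props_file.read()
    print("service port:   ", service_port(log_text))
    print("configured port:", configured_port(props_text))
    print("match:          ", ports_match(log_text, props_text))
```

If the two ports differ, or either one is reported as None, the service and swift.properties are probably not pointing at the same endpoint.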

Once you are sure that the coaster service is indeed using the ports you specified, please try the following swift.properties
file:

site=local,persistent-coasters
use.provider.staging=true
execution.retries=2

site.persistent-coasters {
   jobmanager=coaster-persistent:local:local:http://localhost:50200
   taskWalltime=00:15:00
   workerManager=passive
   workdir=/tmp/swiftwork
   filesystem=local
}

site.local1 {
   jobmanager=local
   initialScore=10000
   filesystem=local
   workdir=/tmp/swiftwork
}

app.persistent-coasters.echoCoast=/bin/echo

I made a minor modification to your test.swift to get a log file:

type file;

app (file o, file e) echo (string s) {
   echoCoast s stderr=@e stdout=@o;
}

file out <"stdout">;
file err <"stderr">;

(out,err) = echo("HELLO WORLD");



I just tested this on my machine with one remote worker. Please let me know if this works. If not, please send the
start-coaster-service.log (found in the same folder as setup.sh) along with the entire run directory.

Thanks!
Yadu

On Wed, Jun 18, 2014 at 1:59 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
And here is a tar of the run001.log directory…



From: Matthew Shaxted
Sent: Wednesday, June 18, 2014 12:36 PM
To: 'Yadu Nand'
Cc: swift-user at ci.uchicago.edu
Subject: RE: [Swift-user] Persistent Coasters & 0.95 RC6

Hi Yadu, yes it is attached.

Thanks

From: Yadu Nand [mailto:yadudoc1729 at gmail.com]
Sent: Wednesday, June 18, 2014 12:31 PM
To: Matthew Shaxted
Cc: swift-user at ci.uchicago.edu
Subject: Re: [Swift-user] Persistent Coasters & 0.95 RC6

Hi Matthew,

Could you mail us a tarball of your runNNN directory?

Thanks,
Yadu

On Wed, Jun 18, 2014 at 12:20 PM, Matthew Shaxted <Matthew.Shaxted at som.com> wrote:
Hi, I am trying to upgrade my 0.94 persistent-coasters scripts to 0.95, and seem to be experiencing some difficulty when trying to execute coaster jobs. I can run these test scripts locally with no problems.

My swift.properties file is shown below. I can successfully run start-coaster-service using a hosts.txt file. I notice that this process creates its own sites.xml file, but my swift.properties file should match it: I am manually setting the coaster service port to match the one in the swift.properties file.

When I try to run the test.swift script below, I get the error shown after it.

Any idea what may be causing this issue?


swift.properties file

sitedir.keep=true
wrapperlog.always.transfer=true

site.local {
   tasksPerWorker=8
   taskWalltime=00:15:00
   maxJobs=1
   workdir=/home/som/EnergyPlus/test/work
   filesystem=local
}
app.local.echoLocal=/bin/echo

site.persistent-coasters {
   jobManager=coaster-persistent:local:local:http://localhost:50200
   tasksPerWorker=8
   taskWalltime=00:15:00
   maxJobs=1
   workdir=/home/som/EnergyPlus/test/work
   filesystem=local
}
app.persistent-coasters.echoCoast=/bin/echo

site=local,persistent-coasters




test.swift file

app echo (string s) {
   echoCoast s;
}

echo("HELLO WORLD");




test.swift Error

from the run.log file
swift NO_STATUS_FILE jobid=echoCoast-5135s9sl - Error file missing
2014-06-18 12:13:25,214-0500 DEBUG swift APPLICATION_EXCEPTION jobid=echoCoast-5135s9sl - Application exception: Job failed with and exit code of 127

Command line output
Swift 0.95 RC6 swift-r7900 cog-r3908
RunID: run003
Progress: Wed, 18 Jun 2014 12:13:24-0500

Execution failed:
Exception in echoCoast:
    Arguments: [HELLO WORLD]
    Host: persistent-coasters
    Directory: test-run003/jobs/5/echoCoast-5135s9sl
        exception @ swift-int.k, line: 530
Caused by: Job failed with and exit code of 127
org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 127 (exit code: 127)
        at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
        at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
        at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
        at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
        at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)

Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 127
org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 127 (exit code: 127)
        at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
        at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
        at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
        at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
        at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
(exit code: 127)






