[Swift-user] swift on midway: apps and modules

David Kelly davidk at ci.uchicago.edu
Thu Nov 29 17:04:25 CST 2012


Hi Neil,

This could be an environment issue. Are the modules loaded automatically from your .bashrc when you log in? If not, could you give that a try?
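
One way to have the modules loaded at login is a guarded block in ~/.bashrc. This is only a minimal sketch: the module names cnvgrib and wgrib2 are assumptions inferred from the /software/... paths in applist, and load_app_modules is a hypothetical helper name. Running "module avail" should show the actual names on Midway.

```shell
# Hypothetical ~/.bashrc fragment. The module names passed at the bottom
# (cnvgrib, wgrib2) are assumed, not confirmed for Midway.
load_app_modules() {
    # Only call "module" in shells where the Modules system has defined it.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    for m in "$@"; do
        module load "$m" || echo "could not load module: $m" >&2
    done
}

# Never let a missing module abort the login shell.
load_app_modules cnvgrib wgrib2 || true
```

The guard matters because .bashrc may also be read by non-interactive shells where the module command has not been set up.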

If that doesn't work, try editing sites.xml and changing

<workdirectory>/scratch/local/swift</workdirectory>
to
<workdirectory>/project/joshuaelliott/narr</workdirectory>
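
The change above could also be scripted rather than made by hand. The following is a sketch that assumes sites.xml keeps each <workdirectory> element on a single line; set_workdirectory is a hypothetical helper name, and the original file is kept as sites.xml.bak.

```shell
# Replace the contents of every <workdirectory> element in a sites.xml file.
# Assumes the element sits on one line and that the new path contains
# no "|" or "&" characters (they are special in the sed expression).
set_workdirectory() {
    sites_file=$1
    new_dir=$2
    # Keep a backup, then rewrite the file from the backup.
    cp "$sites_file" "$sites_file.bak" &&
    sed "s|<workdirectory>[^<]*</workdirectory>|<workdirectory>$new_dir</workdirectory>|" \
        "$sites_file.bak" > "$sites_file"
}

# Usage, from the directory that holds sites.xml:
# set_workdirectory sites.xml /project/joshuaelliott/narr
```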

Then try the run again and let me know when it's finished. That will just make some extra debugging information available that might better explain why it's failing.
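
In the meantime, the exit code 127 in the quoted log can be checked directly. A shell returns 127 when it cannot find the command, and the dynamic loader also exits with 127 when a required shared library is missing. The sketch below tests for both cases; check_app is a hypothetical helper name, and the result is only meaningful when it is run in the same environment the Swift workers get (for example, on a compute node).

```shell
# Report the likely reason an executable would exit with status 127.
check_app() {
    app=$1
    if [ ! -x "$app" ]; then
        echo "missing or not executable: $app"
        return 1
    fi
    # ldd lists the shared libraries an executable needs; "not found"
    # marks the ones the loader cannot resolve in this environment.
    if command -v ldd >/dev/null 2>&1 &&
       ldd "$app" 2>/dev/null | grep -q 'not found'; then
        echo "unresolved shared libraries: $app"
        return 2
    fi
    echo "ok: $app"
}

# Paths as listed in applist in the quoted message:
# check_app /software/cnvgrib-1.4-el6-x86_64/bin/cnvgrib
# check_app /software/wgrib2-0.1-el6-x86_64/bin/wgrib2
```

If the second case shows up only when the modules are not loaded, that would point at the worker environment rather than at the applist entries.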

Thanks,
David

----- Original Message -----
> From: "Neil Best" <nbest at ci.uchicago.edu>
> To: swift-user at ci.uchicago.edu
> Sent: Thursday, November 29, 2012 4:29:05 PM
> Subject: [Swift-user] swift on midway: apps and modules
> Running on Midway, it looks like there is a problem with defining apps
> that come from software modules.
> 
> [nbest at midway-login1 narr]$ pwd
> /project/joshuaelliott/narr
> [nbest at midway-login1 narr]$ runswift narr.swift
> ++ swift -config config -tc.file applist -sites.file sites.xml narr.swift
> Warning: Parameter grb, on line 12, shadows variable of same name on line 5
> Swift trunk swift-r6083 cog-r3522
> 
> RunID: 20121129-2115-mos5x5ad
> Progress: time: Thu, 29 Nov 2012 21:15:40 +0000
> Progress: time: Thu, 29 Nov 2012 21:15:57 +0000 Initializing:6
> Progress: time: Thu, 29 Nov 2012 21:15:58 +0000 Selecting site:1022
> Initializing site shared directory:1 Stage in:1
> Progress: time: Thu, 29 Nov 2012 21:15:59 +0000 Selecting site:623
> Stage in:399 Submitting:2
> Progress: time: Thu, 29 Nov 2012 21:16:00 +0000 Selecting site:623
> Stage in:388 Submitted:13
> Progress: time: Thu, 29 Nov 2012 21:16:01 +0000 Selecting site:623
> Stage in:371 Submitting:1 Submitted:29
> Progress: time: Thu, 29 Nov 2012 21:16:02 +0000 Selecting site:623
> Stage in:359 Submitting:1 Failed but can retry:41
> Progress: time: Thu, 29 Nov 2012 21:16:03 +0000 Selecting site:623
> Stage in:341 Active:1 Failed but can retry:59
> Progress: time: Thu, 29 Nov 2012 21:16:04 +0000 Selecting site:623
> Stage in:322 Submitting:1 Failed but can retry:78
> Progress: time: Thu, 29 Nov 2012 21:16:05 +0000 Selecting site:623
> Stage in:302 Active:1 Failed but can retry:98
> . . .
> Progress: time: Thu, 29 Nov 2012 21:18:16 +0000 Selecting site:623
> Stage in:14 Submitting:1 Failed but can retry:386
> Progress: time: Thu, 29 Nov 2012 21:18:17 +0000 Selecting site:623
> Stage in:10 Submitting:1 Failed but can retry:390
> Progress: time: Thu, 29 Nov 2012 21:18:19 +0000 Selecting site:623
> Stage in:5 Submitting:1 Failed but can retry:395
> Progress: time: Thu, 29 Nov 2012 21:18:20 +0000 Selecting site:622
> Stage in:2 Failed:3 Failed but can retry:397
> Execution failed:
> Exception in cnvgrib:
> Arguments: [-g12, -nv,
> data/grb/197901/narr-a_221_19790116_0600_000.grb,
> data/grb2/197901/narr-a_221_19790116_0600_000.grb2]
> Host: cluster
> Directory: narr-20121129-2115-mos5x5ad/jobs/r/cnvgrib-ricszn1l
> Caused by:
> Job failed with and exit code of 127
> org.globus.cog.abstraction.impl.common.execution.JobException: Job
> failed with and exit code of 127 (exit code: 127)
> at
> org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:38)
> at
> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:90)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:502)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.stepNIO(AbstractStreamKarajanChannel.java:238)
> at
> org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> at
> org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> (exit code: 127)
> cnvgrib, narr.swift, line 17
> [nbest at midway-login1 narr]$ cat applist
> cluster cnvgrib /software/cnvgrib-1.4-el6-x86_64/bin/cnvgrib null null null
> cluster wgrib2 /software/wgrib2-0.1-el6-x86_64/bin/wgrib2 null null null
> 
> My app definition looks like this:
> 
> app (file grb2) cnvgrib (file grb) {
> cnvgrib "-g12" "-nv" @grb @grb2;
> }
> 
> I don't see the directory referenced in the "Exception" stanza. Where
> should that be?
> 
> Does this have anything to do with the fact that I am calling an
> executable from Modules, and therefore the Swift workers have bad
> environments?
