[Swift-devel] Re: Swift hanging in complex iterate script

Mihael Hategan hategan at mcs.anl.gov
Thu Sep 16 13:50:00 CDT 2010


I can't tell what's causing the problem, but it's generally a good
idea to run jstack -l against the Swift JVM when you get a hang, so
the thread dump shows where each thread is blocked.
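
Capturing that dump can be sketched roughly as below. This is only an illustration, assuming the JDK tools jps and jstack are on the PATH and that it runs as the same user as the hung JVM. The helper names (parse_jps, jstack_command, dump_threads, dump_all_jvms) and the output file name are hypothetical, not part of Swift. The -l flag asks jstack for the long listing, which adds lock information to each thread's stack.

```python
import subprocess
from typing import Dict, List


def parse_jps(output: str) -> Dict[int, str]:
    """Parse `jps -l` output into {pid: main class or jar path}."""
    procs: Dict[int, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and anything that does not start with a numeric pid.
        if not parts or not parts[0].isdigit():
            continue
        procs[int(parts[0])] = parts[1].strip() if len(parts) > 1 else ""
    return procs


def jstack_command(pid: int) -> List[str]:
    """Build the jstack invocation; -l adds lock details to the dump."""
    return ["jstack", "-l", str(pid)]


def dump_threads(pid: int, out_path: str) -> None:
    """Write the thread dump of one JVM to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(jstack_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)


def dump_all_jvms() -> None:
    """Dump every JVM that jps can see (assumption: few JVMs are running)."""
    listing = subprocess.run(["jps", "-l"], capture_output=True,
                             text=True, check=True).stdout
    for pid, name in parse_jps(listing).items():
        if name.endswith("Jps"):  # skip the jps process itself
            continue
        dump_threads(pid, f"jstack-{pid}.txt")  # hypothetical file name
```

Calling dump_all_jvms() while the script is hung leaves one jstack-<pid>.txt file per running JVM. Taking two dumps a few seconds apart makes it easier to tell a thread that is truly stuck from one that is merely slow.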

Mihael

On Thu, 2010-09-16 at 12:14 -0600, wilde at mcs.anl.gov wrote:
> Mihael,
> 
> I've developed a Swift script that loops using iterate, reading requests to process an R function from a named pipe (fifo), calling R, and replying "done" on a response fifo.
> 
> This has been working very well, but I just hit a case where the script hangs.
> 
> I exercise it using a small battery of R tests. I was manually restarting the test battery (which does hundreds of R calls in 30 seconds or so) when it hung in the middle of the test suite.
> 
> As far as I can tell, it hung after receiving a work request and mapping the files for that request, but before calling the app() function that invokes R.
> 
> The log is in ~wilde/rserver-20100916-1159-y94hftt0.log
> (I will try to post the script and all related files, but it looks like bridled may have just gone down for patches.)
> 
> Look for these trace lines in the log, which are issued at the start of every R request:
> 
> line 37342:
> 
> 2010-09-16 12:03:24,212-0500 INFO  vdl:execute END_SUCCESS thread=0-1-86-4 tr=bash
> 2010-09-16 12:03:24,213-0500 INFO  apply STARTCOMPOUND thread=0-1-87-2 name=apply
> 2010-09-16 12:03:24,215-0500 WARN  trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.233
> 2010-09-16 12:03:24,215-0500 INFO  SetFieldValue Set: done=false
> 
> The END_SUCCESS is the completion of the last app() in the prior iterate pass, which signals the response ("done") fifo using a shell script.
> 
> The trace says it's starting to process the next R request, #233 (randomly assigned).
> 
> After mapping 20 files (for 5 R datasets containing 2 R evaluation requests each), it just hangs, and all I see in the log after that point is coaster heartbeats.
> 
> The last request prior to this hanging request is in the log at line 37137:
> 
> 2010-09-16 12:03:24,060-0500 INFO  vdl:execute END_SUCCESS thread=0-1-85-4 tr=bash
> 2010-09-16 12:03:24,062-0500 INFO  apply STARTCOMPOUND thread=0-1-86-2 name=apply
> 2010-09-16 12:03:24,062-0500 WARN  trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.174
> 2010-09-16 12:03:24,062-0500 INFO  SetFieldValue Set: done=false
> 
> R request #174 (and all prior ones) completed fine, and should illustrate the normal processing sequence.
> 
> Any ideas on what to look for regarding the cause of the hang?
> 
> I will try to reproduce it and try to get a karajan status trace using swift stdin.
> 
> - Mike
> 




