[Swift-devel] Re: Swift hanging in complex iterate script
Mihael Hategan
hategan at mcs.anl.gov
Thu Sep 16 13:50:00 CDT 2010
I can't tell what's causing the problem, but it may generally be a good
idea to do a jstack -l when you get a hang.
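
A minimal sketch of capturing such a dump from a script, assuming the JDK's jstack is on the PATH and you already know the PID of the hung Swift JVM (jps, also shipped with the JDK, can list candidates). The helper names and the output file naming are illustrative assumptions, not part of Swift:

```python
import subprocess
from datetime import datetime


def build_jstack_command(pid):
    """Return the argv list for a thread dump with lock details (-l)."""
    # int() rejects anything that is not a plain process id.
    return ["jstack", "-l", str(int(pid))]


def dump_threads(pid, out_path=None):
    """Run `jstack -l <pid>` and save its output to a file.

    The default file name (jstack-<pid>-<timestamp>.txt) is a
    hypothetical convention chosen for this sketch.
    """
    if out_path is None:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        out_path = "jstack-%s-%s.txt" % (pid, stamp)
    result = subprocess.run(
        build_jstack_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep any jstack error text in the same file
        text=True,
    )
    with open(out_path, "w") as f:
        f.write(result.stdout)
    return out_path
```

Calling dump_threads(12345) while the script is hung writes the dump to a file that can be attached to a reply; taking two dumps a few seconds apart makes it easier to see which threads are not moving.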
Mihael
On Thu, 2010-09-16 at 12:14 -0600, wilde at mcs.anl.gov wrote:
> Mihael,
>
> I've developed a Swift script that loops using iterate, reading requests to process an R function from a named pipe (fifo), calling R, and replying "done" on a response fifo.
>
> This has been working very well, but I just hit a case where the script hangs.
>
> I exercise it using a small battery of R tests. I was manually restarting the test battery (which does hundreds of R calls in 30 seconds or so) when it hung in the middle of the test suite.
>
> As far as I can tell, it hung after receiving a work request and mapping the files for that request, but before calling the app() function that invokes R.
>
> The log is in ~wilde/rserver-20100916-1159-y94hftt0.log
> (I will try to post the script and all related files, but it looks like bridled may have just gone down for patches.)
>
> Look for these trace lines in the log, which are issued at the start of every R request:
>
> line 37342:
>
> 2010-09-16 12:03:24,212-0500 INFO vdl:execute END_SUCCESS thread=0-1-86-4 tr=bash
> 2010-09-16 12:03:24,213-0500 INFO apply STARTCOMPOUND thread=0-1-87-2 name=apply
> 2010-09-16 12:03:24,215-0500 WARN trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.233
> 2010-09-16 12:03:24,215-0500 INFO SetFieldValue Set: done=false
>
> The END_SUCCESS is the completion of the last app() in the prior iterate pass, which signals the response ("done") fifo using a shell script.
>
> The trace says it's starting to process the next R request, #233 (randomly assigned). After mapping 20 files (for 5 R datasets containing 2 R evaluation requests each), it just hangs, and all I see in the log after that point is coaster heartbeats.
>
> The last request prior to this hanging request is in the log at line 37137:
>
> 2010-09-16 12:03:24,060-0500 INFO vdl:execute END_SUCCESS thread=0-1-85-4 tr=bash
> 2010-09-16 12:03:24,062-0500 INFO apply STARTCOMPOUND thread=0-1-86-2 name=apply
> 2010-09-16 12:03:24,062-0500 WARN trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.174
> 2010-09-16 12:03:24,062-0500 INFO SetFieldValue Set: done=false
>
> R request #174 (and all prior ones) completed fine, and should illustrate the normal processing sequence.
>
> Any ideas on what to look for regarding the cause of the hang?
>
> I will try to reproduce it and try to get a karajan status trace using swift stdin.
>
> - Mike
>
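
One way to locate the last request that started before a hang is to scan the log for the "rserver: got dir" trace lines quoted above. This is only a sketch: it assumes every request start is logged in exactly the format shown in the quoted excerpts, and the function names are made up for illustration.

```python
import re

# Matches the request-start trace lines shown in the quoted log excerpts:
# a timestamp, then "WARN trace SwiftScript trace: rserver: got dir, <path>".
TRACE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"WARN trace SwiftScript trace: rserver: got dir, (\S+)"
)


def request_starts(lines):
    """Yield (line_number, timestamp, run_dir) for each request-start trace."""
    for number, line in enumerate(lines, start=1):
        match = TRACE_RE.search(line)
        if match:
            yield number, match.group(1), match.group(2)


def last_request(lines):
    """Return the final request-start entry, or None if there is none."""
    last = None
    for entry in request_starts(lines):
        last = entry
    return last
```

Passing an open log file to last_request() returns the line number, timestamp, and run directory of the final request; everything logged after that line number belongs to the request that never completed.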