[Swift-devel] Swift hanging in complex iterate script

wilde at mcs.anl.gov wilde at mcs.anl.gov
Thu Sep 16 13:14:11 CDT 2010


Mihael,

I've developed a Swift script that loops using iterate, reading requests to process an R function from a named pipe (fifo), calling R, and replying "done" on a response fifo.
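
For readers unfamiliar with this request/response pattern, here is a minimal Python sketch of the same loop shape. It is not the Swift script itself. The function name serve_requests, the handle callback (which stands in for the R invocation), and the one-request-per-line format are all assumptions made for illustration.

```python
def serve_requests(request_fifo, response_fifo, handle, max_requests=None):
    """Read one request per line from request_fifo, call handle(request),
    then write "done" to response_fifo. Stops after max_requests if set.

    Hypothetical sketch: handle() stands in for the call into R.
    """
    served = 0
    while max_requests is None or served < max_requests:
        # Opening a FIFO for reading blocks until some client opens it for writing.
        with open(request_fifo) as requests:
            for line in requests:
                request = line.strip()
                if not request:
                    continue  # ignore blank lines
                handle(request)
                # Opening for writing blocks until the client opens the reply FIFO.
                with open(response_fifo, "w") as reply:
                    reply.write("done\n")
                served += 1
                if max_requests is not None and served >= max_requests:
                    return served
    return served
```

Both FIFOs would be created beforehand (for example with os.mkfifo), and a client writes a request line and then reads the reply FIFO until it sees "done".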

This has been working very well, but I just hit a case where the script hangs.

I exercise it using a small battery of R tests. I was manually restarting the test battery (which does hundreds of R calls in 30 seconds or so) when it hung in the middle of the test suite.

As far as I can tell, it hung after receiving a work request and mapping the files for that request, but it never called the app() function that invokes R.

The log is in ~wilde/rserver-20100916-1159-y94hftt0.log
(I will try to post the script and all related files, but it looks like bridled may have just gone down for patches.)

Look for these trace lines in the log, which are issued at the start of every R request:

line 37342:

2010-09-16 12:03:24,212-0500 INFO  vdl:execute END_SUCCESS thread=0-1-86-4 tr=bash
2010-09-16 12:03:24,213-0500 INFO  apply STARTCOMPOUND thread=0-1-87-2 name=apply
2010-09-16 12:03:24,215-0500 WARN  trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.233
2010-09-16 12:03:24,215-0500 INFO  SetFieldValue Set: done=false

The END_SUCCESS is the completion of the last app() in the prior iterate pass, which signals the response ("done") fifo using a shell script.

The trace says it's starting to process the next R request, #233 (randomly assigned).

After mapping 20 files (for 5 R datasets containing 2 R evaluation requests each), it just hangs, and all I see in the log after that point is coaster heartbeats.

The last request prior to this hanging request is in the log at line 37137:

2010-09-16 12:03:24,060-0500 INFO  vdl:execute END_SUCCESS thread=0-1-85-4 tr=bash
2010-09-16 12:03:24,062-0500 INFO  apply STARTCOMPOUND thread=0-1-86-2 name=apply
2010-09-16 12:03:24,062-0500 WARN  trace SwiftScript trace: rserver: got dir, /autonfs/home/wilde/SwiftR/SwiftR.run.174
2010-09-16 12:03:24,062-0500 INFO  SetFieldValue Set: done=false

R request #174 (and all prior ones) completed fine, and should illustrate the normal processing sequence.
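
One way to locate a request like this in a large log is to scan for the "rserver: got dir" trace lines and flag any that are never followed by a vdl:execute END_SUCCESS. The sketch below assumes the log format shown in the excerpts above and treats any END_SUCCESS after a trace as completing that request, which is a simplification. The function name find_stalled_requests is hypothetical.

```python
import re

# Patterns taken from the log excerpts in this message.
TRACE_RE = re.compile(r"SwiftScript trace: rserver: got dir, (\S+)")
END_RE = re.compile(r"vdl:execute END_SUCCESS")

def find_stalled_requests(lines):
    """Return (line_number, run_dir) for each request trace that is not
    followed by a vdl:execute END_SUCCESS before the next trace or end of log."""
    stalled = []
    pending = None  # (line_number, run_dir) of the request still awaiting completion
    for number, line in enumerate(lines, start=1):
        match = TRACE_RE.search(line)
        if match:
            if pending is not None:
                stalled.append(pending)
            pending = (number, match.group(1))
        elif pending is not None and END_RE.search(line):
            pending = None  # assumption: any END_SUCCESS completes the open request
    if pending is not None:
        stalled.append(pending)
    return stalled
```

It can be run as find_stalled_requests(open(path)) on the log file. On a log like this one it should report only the final request, since every earlier trace is followed by an END_SUCCESS.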

Any ideas on what to look for regarding the cause of the hang?

I will try to reproduce it and try to get a karajan status trace using swift stdin.

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



