[Swift-user] Swift hangup error

Michael Wilde wilde at mcs.anl.gov
Tue Aug 28 10:43:15 CDT 2012


I think there is a Swift bug that occasionally causes Swift to hang during termination processing. So if you have a script that runs many Swift scripts serially, you may need to work around this for the moment, e.g. by grepping for the "Final status" message and killing the Swift process from within your multiple-swift loop.
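
As a rough sketch of that workaround (not something Swift ships): the helper below runs a command with its output sent to a temporary log, and if the process is still alive some seconds after "Final status" shows up in the log, it kills it. The name run_with_watchdog and the grace-period argument are made up for illustration. It also assumes that the PID it launched is the process that hangs; if the swift launcher starts Java as a child process, the kill may need to target that child instead.

```shell
# Hypothetical helper: run a command, and kill it if it is still alive
# GRACE seconds after a "Final status" line appears in its output.
# Usage: run_with_watchdog GRACE command [args...]
run_with_watchdog() {
    local grace="$1"; shift
    local log
    log="$(mktemp)"

    # Start the command in the background, capturing stdout and stderr.
    "$@" > "$log" 2>&1 &
    local pid=$!

    while kill -0 "$pid" 2>/dev/null; do
        if grep -q "Final status" "$log"; then
            # The run reported completion; allow GRACE seconds for a clean exit.
            local waited=0
            while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
                sleep 1
                waited=$((waited + 1))
            done
            if kill -0 "$pid" 2>/dev/null; then
                echo "watchdog: process $pid still running, killing it" >&2
                kill "$pid"
            fi
            break
        fi
        sleep 1
    done

    # Collect the exit status, then replay the captured output.
    wait "$pid" 2>/dev/null
    local status=$?
    cat "$log"
    rm -f "$log"
    return "$status"
}
```

Inside the multiple-swift loop this would be called as something like "run_with_watchdog 60 swift yourscript.swift" (the script name and the 60 seconds are placeholders). The output is only printed once the command has ended, so a caller that needs live output would have to tail the log instead.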

The next time this happens, please do this:

First locate the PID of the swift Java process (using ps -u $USER -H)
Then capture a Java stack trace of that JVM
  (using jstack -l thatJavaPID >& jstack.out)
Then send the jstack.out file to swift-support.
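
If it helps, the same steps can be wrapped in two small shell functions. This is only a sketch: the function names and the JSTACK override variable are my own invention, and the redirection is written as "> file 2>&1", which also works in shells that do not accept ">&".

```shell
# List this user's processes whose command line mentions java, with PIDs.
# The [j] in the pattern keeps grep from matching its own command line.
list_java_procs() {
    ps -u "$USER" -o pid,args | grep '[j]ava'
}

# Dump the thread stacks of one JVM to a file (default: jstack.out).
# JSTACK is a hypothetical override for picking a specific jstack binary.
# Usage: capture_jstack PID [outfile]
capture_jstack() {
    local pid="$1"
    local out="${2:-jstack.out}"
    "${JSTACK:-jstack}" -l "$pid" > "$out" 2>&1
}
```

Run list_java_procs first, pick the PID of the Swift JVM from its output, and pass that PID to capture_jstack while the process is still hung.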

I can't find this in bugzilla so I'm (re)filing it as bug 821.

Thanks!

- Mike


----- Original Message -----
> From: "Jonathan Margoliash" <jmargolpeople at gmail.com>
> To: "Michael Wilde" <wilde at mcs.anl.gov>, swift-user at ci.uchicago.edu, "Swift Language" <davidk at ci.uchicago.edu>,
> "Professor E. Yan" <eyan at anl.gov>
> Sent: Tuesday, August 28, 2012 10:26:51 AM
> Subject: Swift hangup error
> Hello all,
> 
> 
> I recently left a program running overnight, and came back in the
> morning to find that it had successfully executed a swift script 14
> times, but on the 15th time the swift script would not return.
> Specifically, the terminal output after a swift script ends is
> normally:
> 
> -----
> Final status: Tue, 28 Aug 2012 07:55:46 -0500 Finished successfully:210
> Swift trunk swift-r5819 cog-r3424
> 
> RunID: 20120828-0705-4v6epf52
> 
> Returned from swift to evolve.m
> -----
> 
> (where the last line of text is generated by my code). However, when I
> came back, the last line of output was:
> 
> ----
> Final status: Tue, 28 Aug 2012 08:59:45 -0500 Finished successfully:210
> 
> ----
> 
> and, given the time stamp on that line, it had been hanging there for
> an hour. Since the two lines that usually follow the Final status line
> (Swift trunk and RunID) had not been printed, I think it is clear
> that swift did not return control to my program. Why would this be the
> case? It is very improbable but possible that I ran out of space on
> one of the computers that swift was distributing jobs to, but I am
> positive that I did not run out of space on the computer which was
> distributing the jobs, and in any event I would hope that swift would
> crash more nicely if that were the case. Thanks,
> 
> Jonathan

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



