[Swift-devel] Java hangs on new rcc hardware

Michael Wilde wilde at mcs.anl.gov
Mon Jul 9 08:19:55 CDT 2012


Java is behaving strangely for me on the new RCC "midway" cluster. The symptom is that the JVM seems to go into a tight CPU loop across several (3 or more) cores.

I first see this in the polling loop in the local scheduler provider, which calls Thread.sleep(); the call seems never to return. But each time I suspend and resume the JVM (^Z, fg, ^Z, bg), it progresses further. Doing this twice lets the JVM successfully complete the Swift script it's running (which tests a single PBS job).
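A minimal standalone probe might help separate a JVM or kernel timer problem from a bug in the provider's polling loop. This is only a sketch: the class name SleepProbe, the 100 ms interval and the 10x overshoot threshold are hypothetical choices, not values taken from the local provider. It times repeated Thread.sleep() calls and reports any that take far longer than requested:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical probe: does Thread.sleep() return roughly on time on this node? */
public class SleepProbe {
    // Sleeps for requestedMillis and returns the actual elapsed wall time in ms.
    static long timedSleep(long requestedMillis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(requestedMillis);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long requested = 100;  // assumed poll interval, not the provider's real value
        int iterations = 50;
        for (int i = 0; i < iterations; i++) {
            long elapsed = timedSleep(requested);
            // Flag any sleep that overshoots by more than 10x the requested time.
            if (elapsed > requested * 10) {
                System.out.printf("iteration %d: sleep(%d) took %d ms%n", i, requested, elapsed);
            }
        }
        System.out.println("done");
    }
}
```

Running it with "java SleepProbe" on a midway node and on a known-good machine would show whether plain sleeps stall there, independent of Swift.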

I see what appears to be similar behavior in the Swift build. The ant redist target hangs somewhere around the point where Swift compiles the ANTLR output; a similar suspend-resume sequence then lets it continue.

I saw this first with the Java 1.7 installed on midway, then with the latest JDK 1.6, and also with what I think is a more recent (possibly the latest) JDK 1.7.

I'm still debugging, but any help or suggestions would be most welcome.

Thanks,

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



