[Swift-user] exception @ swift-int.k, line: 511, Caused by: Block task failed: Connection to worker lost

Jonathan Ozik jozik at uchicago.edu
Tue Jul 29 20:56:28 CDT 2014


Hi all,

I’m getting sporadic errors in the jobs that I’m running on Blues. The stdout includes exceptions like:
	exception @ swift-int.k, line: 511
Caused by: Block task failed: Connection to worker lost
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
	at org.globus.cog.coaster.channels.NIOSender.write(NIOSender.java:168)
	at org.globus.cog.coaster.channels.NIOSender.run(NIOSender.java:133)

These seem to occur at different points in the submitted jobs. Let me know if there’s a log file that you’d like to look at.

In earlier attempts I was getting these warnings followed by broken pipe errors:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000a0000000, 704643072, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages

Apparently that’s a known precursor of crashes on Java 7, as described in the 7u51 release notes (http://www.oracle.com/technetwork/java/javase/7u51-relnotes-2085002.html):
Area: hotspot/gc
Synopsis: Crashes due to failure to allocate large pages.

On Linux, failures when allocating large pages can lead to crashes. When running JDK 7u51 or later versions, the issue can be recognized in two ways:

	• Before the crash happens one or more lines similar to this will have been printed to the log:
		os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages
	• If a hs_err file is generated it will contain a line similar to this:
		Large page allocation failures have occurred 3 times

The problem can be avoided by running with large page support turned off, for example by passing the "-XX:-UseLargePages" option to the java binary.

See 8007074 (not public).
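
The two symptoms described in the release notes can be checked for mechanically. Below is a minimal sketch, assuming the job's stdout or an hs_err file is available as lines of text; the class name LargePageLogScan and its method are hypothetical, and the patterns are modeled only on the sample messages quoted above, so other JVM versions may word them differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LargePageLogScan {

    // Warning printed before a crash, per the sample line in the release notes.
    static final Pattern COMMIT_FAILED =
        Pattern.compile("os::commit_memory\\(.*\\) failed");

    // Line found in an hs_err file, per the release notes.
    static final Pattern HS_ERR_COUNT =
        Pattern.compile("Large page allocation failures have occurred \\d+ times");

    /** Returns true if the line matches either of the two quoted symptoms. */
    static boolean indicatesLargePageFailure(String line) {
        return COMMIT_FAILED.matcher(line).find()
            || HS_ERR_COUNT.matcher(line).find();
    }

    public static void main(String[] args) {
        // Sample lines copied from the messages in this thread.
        List<String> lines = Arrays.asList(
            "os::commit_memory(0x00000000a0000000, 704643072, 2097152, 0) failed; "
                + "error='Cannot allocate memory' (errno=12)",
            "java.io.IOException: Broken pipe");
        for (String line : lines) {
            if (indicatesLargePageFailure(line)) {
                System.out.println("large page symptom: " + line);
            }
        }
    }
}
```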

So I added the -XX:-UseLargePages option to the invocations of the Java code that I’m responsible for. That seemed to get rid of the warning and the crashes for a while, but perhaps that was just a coincidence.
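
One way to tell whether the option actually took effect in a given JVM, rather than the improvement being a coincidence, is to ask that JVM for its UseLargePages value. The following is a minimal sketch, assuming a HotSpot JVM (Java 7 or later); the class name LargePagesCheck, its helper methods, and the example command line with model.jar are hypothetical and not part of Swift or of the original jobs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LargePagesCheck {

    static final String DISABLE_LARGE_PAGES = "-XX:-UseLargePages";

    /**
     * Returns a copy of a java command line with the flag inserted right
     * after the launcher (element 0), unless it is already present.
     */
    static List<String> withLargePagesDisabled(List<String> command) {
        if (command.isEmpty()) {
            throw new IllegalArgumentException("command must start with the java launcher");
        }
        List<String> result = new ArrayList<>(command);
        if (!result.contains(DISABLE_LARGE_PAGES)) {
            result.add(1, DISABLE_LARGE_PAGES);
        }
        return result;
    }

    /** Reports the UseLargePages setting of the JVM running this code. */
    static String useLargePagesValue() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("UseLargePages").getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseLargePages = " + useLargePagesValue());
        // Hypothetical command line, shown only to illustrate flag placement.
        List<String> cmd = withLargePagesDisabled(Arrays.asList("java", "-jar", "model.jar"));
        System.out.println(cmd);
    }
}
```

Started as `java -XX:-UseLargePages LargePagesCheck`, this reports the value the JVM is actually using. The flag only affects the JVM it is passed to, so JVMs launched by other components would have to be checked separately.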

Jonathan



