[Swift-user] exception @ swift-int.k, line: 511, Caused by: Block task failed: Connection to worker lost
Ozik, Jonathan
jozik at anl.gov
Thu Jul 31 10:41:39 CDT 2014
Thank you, Mike.
Regarding the location of the ER files: to reduce variables, the last few runs were done with 0.95-RC6.
Jonathan
On Jul 31, 2014, at 9:18 AM, Michael Wilde <wilde at anl.gov> wrote:
> I see this from PBS in your home dir:
>
> blues$ cat 583937.bmgt1.lcrc.anl.gov.ER
> Use of uninitialized value $s in concatenation (.) or string at
> /home/ozik/.globus/coasters/cscript4312030037430783094.pl line 2220.
> Use of uninitialized value $s in concatenation (.) or string at
> /home/ozik/.globus/coasters/cscript4312030037430783094.pl line 2220.
> blues$
>
> That looks to me like a Swift bug in worker.pl.
>
> We'll look into this angle.
>
> Also I'm curious why these files are not going into your run dir (but
> perhaps that's because you're running an older trunk release, not 0.95?
> Or that's a separate 0.95 bug).
>
> - Mike
>
> On 7/31/14, 9:13 AM, Michael Wilde wrote:
>> Some discussion and diagnosis of this incident has taken place off list.
>>
>> In a quick scan of the worker logs, I don't spot an obvious error that
>> would cause workers to exit.
>> Hopefully others on the Swift team can check those as well.
>>
>> Jonathan, do you have stdout/err files from the PBS scheduler on blues,
>> in your runNNN log dirs?
>>
>> If so, can you point us to them?
>>
>> Thanks,
>>
>> - Mike
>>
>> On 7/29/14, 8:56 PM, Jonathan Ozik wrote:
>>> Hi all,
>>>
>>> I’m getting spurious errors in the jobs that I’m running on Blues. The stdout includes exceptions like:
>>> exception @ swift-int.k, line: 511
>>> Caused by: Block task failed: Connection to worker lost
>>> java.io.IOException: Broken pipe
>>> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>> at org.globus.cog.coaster.channels.NIOSender.write(NIOSender.java:168)
>>> at org.globus.cog.coaster.channels.NIOSender.run(NIOSender.java:133)
>>>
>>> These seem to occur at different parts of the submitted jobs. Let me know if there’s a log file that you’d like to look at.
>>>
>>> In earlier attempts I was getting these warnings followed by broken pipe errors:
>>> Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000a0000000, 704643072, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages
>>>
>>> Apparently that’s a known precursor of crashes on Java 7 as described here (http://www.oracle.com/technetwork/java/javase/7u51-relnotes-2085002.html):
>>> Area: hotspot/gc
>>> Synopsis: Crashes due to failure to allocate large pages.
>>>
>>> On Linux, failures when allocating large pages can lead to crashes. When running JDK 7u51 or later versions, the issue can be recognized in two ways:
>>>
>>> • Before the crash happens one or more lines similar to this will have been printed to the log:
>>> os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed;
>>> error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages
>>> • If a hs_err file is generated it will contain a line similar to this:
>>> Large page allocation failures have occurred 3 times
>>> The problem can be avoided by running with large page support turned off, for example by passing the "-XX:-UseLargePages" option to the java binary.
>>>
>>> See 8007074 (not public).
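The two symptoms the release note describes (the "falling back to regular pages" warning in the log, and the failure count in an hs_err file) can be checked for mechanically. A minimal sketch, assuming the JVM's stdout/stderr and any hs_err file are available as lines of text; the function name scan_for_large_page_failures is hypothetical, and the patterns are copied from the release note text above:

```python
import re

# Patterns copied from the JDK 7u51 release note text quoted above.
LARGE_PAGE_FALLBACK = re.compile(r"Cannot allocate large pages, falling back to regular pages")
HS_ERR_COUNT = re.compile(r"Large page allocation failures have occurred (\d+) times")


def scan_for_large_page_failures(lines):
    """Return (warning_count, hs_err_failure_count) for the given log lines.

    warning_count counts lines containing the large-page fallback warning
    (this works whether the warning is printed on one line or wrapped onto two).
    hs_err_failure_count is the number reported by an hs_err file, or None.
    """
    warnings = 0
    hs_err_failures = None
    for line in lines:
        if LARGE_PAGE_FALLBACK.search(line):
            warnings += 1
        match = HS_ERR_COUNT.search(line)
        if match:
            hs_err_failures = int(match.group(1))
    return warnings, hs_err_failures


if __name__ == "__main__":
    sample = [
        "Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000a0000000, "
        "704643072, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); "
        "Cannot allocate large pages, falling back to regular pages",
        "Large page allocation failures have occurred 3 times",
    ]
    print(scan_for_large_page_failures(sample))  # (1, 3)
```

A nonzero result only says the precursor appeared; it does not by itself prove that a later "Broken pipe" was caused by it.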
>>>
>>> So I added the -XX:-UseLargePages option in the invocations of Java code that I was responsible for. That seemed to get rid of the warning and the crashes for a while, but perhaps that was just a coincidence.
>>>
>>> Jonathan
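Adding -XX:-UseLargePages to Java invocations, as described above, only works if the flag lands among the JVM options, i.e. before the main class or -jar. A minimal sketch, assuming the Java commands are launched from Python as argument lists; the helper name with_large_pages_disabled and the jar name model.jar are hypothetical:

```python
def with_large_pages_disabled(cmd):
    """Return a copy of a java argument list with large pages turned off.

    cmd[0] is assumed to be the java executable. Any existing
    -XX:+UseLargePages / -XX:-UseLargePages toggle is dropped so that a
    later +UseLargePages cannot override ours, then -XX:-UseLargePages is
    placed directly after the executable, ahead of the main class or -jar.
    """
    toggles = ("-XX:+UseLargePages", "-XX:-UseLargePages")
    rest = [arg for arg in cmd[1:] if arg not in toggles]
    return [cmd[0], "-XX:-UseLargePages"] + rest


if __name__ == "__main__":
    print(with_large_pages_disabled(["java", "-Xmx1g", "-jar", "model.jar"]))
    # ['java', '-XX:-UseLargePages', '-Xmx1g', '-jar', 'model.jar']
```

The resulting list can be handed to subprocess.run. Note that this only covers JVMs launched by your own code; the Swift/coaster JVM itself would need the flag set separately.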
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Mathematics and Computer Science Computation Institute
> Argonne National Laboratory The University of Chicago
>