[Swift-user] exception @ swift-int.k, line: 511, Caused by: Block task failed: Connection to worker lost

Yadu Nand yadudoc1729 at gmail.com
Thu Jul 31 13:02:18 CDT 2014


Hi Mike,

I checked Jonathan's folders, and it looks like the submit scripts and the
PBS submit, submit.stdout and submit.stderr files were correctly written
under the runNNN/scripts folder. His latest run used Swift-0.95-RC6, which
failed with the logs that you saw. There are also PBS*submit.stderr files
that report the same "uninitialized value $s in concatenation" error.

-Yadu





On Thu, Jul 31, 2014 at 9:18 AM, Michael Wilde <wilde at anl.gov> wrote:

> I see this from PBS in your home dir:
>
> blues$ cat 583937.bmgt1.lcrc.anl.gov.ER
> Use of uninitialized value $s in concatenation (.) or string at
> /home/ozik/.globus/coasters/cscript4312030037430783094.pl line 2220.
> Use of uninitialized value $s in concatenation (.) or string at
> /home/ozik/.globus/coasters/cscript4312030037430783094.pl line 2220.
> blues$
>
> That looks to me like a Swift bug in worker.pl
>
> We'll look into this angle.
>
> Also I'm curious why these files are not going into your run dir (but
> perhaps that's because you're running an older trunk release, not 0.95?
> Or that's a separate 0.95 bug).
>
> - Mike
>
> On 7/31/14, 9:13 AM, Michael Wilde wrote:
> > Some discussion and diagnosis of this incident has taken place off list.
> >
> > In a quick scan of the worker logs, I don't spot an obvious error that
> > would cause workers to exit.
> > Hopefully others on the Swift team can check those as well.
> >
> > Jonathan, do you have stdout/err files from the PBS scheduler on blues,
> > in your runNNN log dirs?
> >
> > If so, can you point us to them?
> >
> > Thanks,
> >
> > - Mike
> >
> > On 7/29/14, 8:56 PM, Jonathan Ozik wrote:
> >> Hi all,
> >>
> >> I’m getting spurious errors in the jobs that I’m running on Blues. The
> >> stdout includes exceptions like:
> >>      exception @ swift-int.k, line: 511
> >> Caused by: Block task failed: Connection to worker lost
> >> java.io.IOException: Broken pipe
> >>      at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> >>      at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> >>      at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> >>      at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> >>      at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
> >>      at org.globus.cog.coaster.channels.NIOSender.write(NIOSender.java:168)
> >>      at org.globus.cog.coaster.channels.NIOSender.run(NIOSender.java:133)
> >>
> >> These seem to occur at different points in the submitted jobs. Let me
> >> know if there’s a log file that you’d like to look at.
> >>
> >> In earlier attempts I was getting these warnings followed by broken
> >> pipe errors:
> >> Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000a0000000, 704643072, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages
> >>
> >> Apparently that’s a known precursor of crashes on Java 7, as described
> >> here (http://www.oracle.com/technetwork/java/javase/7u51-relnotes-2085002.html):
> >> Area: hotspot/gc
> >> Synopsis: Crashes due to failure to allocate large pages.
> >>
> >> On Linux, failures when allocating large pages can lead to crashes.
> >> When running JDK 7u51 or later versions, the issue can be recognized in
> >> two ways:
> >>
> >>      • Before the crash happens one or more lines similar to this will
> >>        have been printed to the log:
> >> os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages
> >>      • If a hs_err file is generated it will contain a line similar to
> >>        this:
> >> Large page allocation failures have occurred 3 times
> >> The problem can be avoided by running with large page support turned
> >> off, for example by passing the "-XX:-UseLargePages" option to the java
> >> binary.
> >>
> >> See 8007074 (not public).
> >>
> >> So I added the -XX:-UseLargePages option to the invocations of Java
> >> code that I was responsible for. That seemed to get rid of the warning
> >> and the crashes for a while, but perhaps that was just a coincidence.
> >>
> >> Jonathan
> >>
> >> _______________________________________________
> >> Swift-user mailing list
> >> Swift-user at ci.uchicago.edu
> >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Mathematics and Computer Science          Computation Institute
> Argonne National Laboratory               The University of Chicago
>
>
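One way to confirm that the -XX:-UseLargePages workaround mentioned in the thread actually reached a given JVM is to ask that JVM directly. The following is a minimal, hypothetical diagnostic sketch (the class name LargePagesCheck and its helper methods are made up for illustration and are not part of Swift or the coaster code). It assumes a HotSpot JVM, Java 7 or later, because it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean interface:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic class; not part of Swift or the coaster service.
// Assumes a HotSpot JVM (Java 7+), where HotSpotDiagnosticMXBean is available.
public class LargePagesCheck {

    /** Looks up a HotSpot -XX flag on the JVM this code is running in. */
    static VMOption hotspotFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    /** True if this JVM is currently configured to use large pages. */
    static boolean largePagesEnabled() {
        return Boolean.parseBoolean(hotspotFlag("UseLargePages").getValue());
    }

    public static void main(String[] args) {
        VMOption opt = hotspotFlag("UseLargePages");
        // An origin of VM_CREATION means the value was set when the JVM was
        // created, e.g. on the command line: java -XX:-UseLargePages LargePagesCheck
        System.out.println("UseLargePages=" + opt.getValue()
                + " (origin: " + opt.getOrigin() + ")");
    }
}
```

Started as "java -XX:-UseLargePages LargePagesCheck", this should report UseLargePages=false with origin VM_CREATION. It only inspects the JVM it runs in, so the same getVMOption lookup would have to be done inside each Java process whose flags are in question.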



-- 
Yadu Nand B

