[petsc-users] Application Error

Barry Smith bsmith at mcs.anl.gov
Tue Apr 28 14:04:44 CDT 2015


  Killed (signal 9)

  means that some process (generally external to the running process) has told the process to end. In HPC this is often because

1) the OS started running low on memory and so killed the process that was using much of the memory, or

2) the batch system killed the process because it hit some limit that has been set by the batch system (such as running too long).

   My guess is that it is an "out of memory" issue and you are simply using more memory than is available. So to run a problem of the size you want, you need to use more nodes on your system. It is not likely a "bug" in MPI or elsewhere. One way to check is to print the per-process memory usage; see the sketch below.
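
   A minimal sketch of such a check (assuming you can add a few lines to your main program; the setup/solve in the middle is whatever your code already does) could look like this:

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscLogDouble mem;
    PetscMPIInt    rank;

    PetscInitialize(&argc, &argv, NULL, NULL);
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    /* ... your usual setup and solve would go here ... */

    /* report the resident set size of this process, in bytes */
    PetscMemoryGetCurrentUsage(&mem);
    PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] memory in use: %g MB\n",
                            rank, mem/(1024.0*1024.0));
    PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);

    PetscFinalize();
    return 0;
  }

   Run this for the small case and the large case and compare: if the per-process usage multiplied by the number of processes per node approaches the memory available on a node, that is your problem.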

  Barry

> On Apr 28, 2015, at 9:49 AM, Sharp Stone <thronesf at gmail.com> wrote:
> 
> Dear All,
> 
> I'm using PETSc for parallel computation, but an error has been confusing me recently. When I run a small-scale computation (the computational task is small), the code runs smoothly. However, when I use a much larger parallel computation domain/task, I always get the following error (also in the attachment):
> 
> [proxy:0:0 at node01] HYD_pmcd_pmip_control_cmd_cb (../../../../source/mpich-3.1.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:0 at node01] HYDT_dmxu_poll_wait_for_event (../../../../source/mpich-3.1.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:0 at node01] main (../../../../source/mpich-3.1.1/src/pm/hydra/pm/pmiserv/pmip.c:206): demux engine error waiting for event
> 
> 
> I don't know what has gone wrong. Is this because of a bug in MPI?
> Thank you in advance for any ideas and suggestions!
> 
> -- 
> Best regards,
> 
> Feng
> <out.o367446><out.e367446>


