Process interrupted error

Barry Smith bsmith at mcs.anl.gov
Fri Apr 13 11:24:12 CDT 2007


  A program that sometimes runs in the debugger but fails outside it
is often a sign of memory corruption. Run with -malloc_debug and put
a CHKMEMQ; directly before you create the scatter context to see
if that gives you any useful information.
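
For readers unfamiliar with the mechanism: roughly speaking, PETSc's debugging malloc (turned on by -malloc_debug) pads each allocation with known marker values, and CHKMEMQ checks all live allocations at the point where it appears. A checkpoint placed just before the scatter creation therefore tells you whether something had already overwritten memory by then. The C sketch below is a hypothetical, standalone illustration of that guard-value idea only. The names guarded_malloc, check_memory and guarded_free_all, the guard size and the fixed block table are all invented here; this is not PETSc's implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard-byte allocator illustrating the idea behind
   -malloc_debug + CHKMEMQ. NOT PETSc's actual implementation. */

#define GUARD_SIZE 8      /* bytes of padding before and after user data */
#define GUARD_BYTE 0xAB   /* marker value written into the padding */
#define MAX_BLOCKS 64     /* fixed-size table of tracked allocations */

typedef struct {
  unsigned char *raw; /* start of the underlying allocation */
  size_t size;        /* bytes requested by the caller */
} Block;

static Block blocks[MAX_BLOCKS];
static int nblocks = 0;

/* Allocate size bytes surrounded by guard regions; returns NULL on failure. */
void *guarded_malloc(size_t size) {
  if (nblocks == MAX_BLOCKS) return NULL;
  unsigned char *raw = malloc(size + 2 * GUARD_SIZE);
  if (!raw) return NULL;
  memset(raw, GUARD_BYTE, GUARD_SIZE);                    /* leading guard */
  memset(raw + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE); /* trailing guard */
  blocks[nblocks].raw = raw;
  blocks[nblocks].size = size;
  nblocks++;
  return raw + GUARD_SIZE;
}

/* Checkpoint in the spirit of CHKMEMQ: report every live block whose
   guards were overwritten and return how many such blocks were found. */
int check_memory(const char *where) {
  int bad = 0;
  for (int i = 0; i < nblocks; i++) {
    unsigned char *raw = blocks[i].raw;
    size_t size = blocks[i].size;
    for (size_t k = 0; k < GUARD_SIZE; k++) {
      if (raw[k] != GUARD_BYTE || raw[GUARD_SIZE + size + k] != GUARD_BYTE) {
        fprintf(stderr, "%s: block %d (%zu bytes) corrupted\n", where, i, size);
        bad++;
        break; /* one report per block is enough */
      }
    }
  }
  return bad;
}

/* Release every tracked block. */
void guarded_free_all(void) {
  for (int i = 0; i < nblocks; i++) free(blocks[i].raw);
  nblocks = 0;
}
```

Calling check_memory at successive points in a program plays the same role as moving a CHKMEMQ; around: the first checkpoint that reports corruption brackets where the bad write happened. A write that jumps past the guard region entirely would not be caught by a scheme like this.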

  Please respond to petsc-maint and not petsc-users.

   Barry


On Thu, 12 Apr 2007, Knut Erik Teigen wrote:

> On Wed, 2007-04-11 at 06:55 -0500, Barry Smith wrote:
> >   Something is sending an interrupt signal to all the processes
> > except one. We've seen this happen where the "batch or node
> > scheduler does this to kill a long running job".
> > 
> >   Does it happen even if the parallel vector is real short?
> 
> Yes, even with a vector of size 1, this happens.
> > 
> >   Does the machine seem to hang at that point or does the sigint
> > come immediately?
> It hangs for a short period, and then displays the sigint error.
> When running with 2 processes, it also hangs a bit, but then continues
> with the correct results.
> > 
> >   Can you use the runtime option -start_in_debugger or totalview
> > to catch the signal?
> Something really weird happens here. The first time I run the program
> with -start_in_debugger noxterm,idb, it finishes correctly, no matter
> the number of processes or problem size. The second time, however, I get
> the sigint errors.
> When I run with idb I get:
> (idb) Program exited normally.
> (idb) Program received signal SIGINT
> 
> With gdb I get:
> 0xffffe410 in __kernel_vsyscall ()
> and then the program hangs.
> 
> However I get this message in the beginning with both:
> [3]PETSC ERROR: PETSC: Attaching gdb to ./out of pid 30549 on ivt0415
> 
> -Knut Erik-
> 
> > 
> >    Barry
> > 
> > On Wed, 11 Apr 2007, Knut Erik Teigen wrote:
> > 
> > > Hello,
> > > 
> > > I get the following error when trying to copy a solution vector to
> > > process zero using VecScatterCreateToZero:
> > > 
> > > [0] VecScatterCreate(): Special case: processor zero gets entire parallel vector, rest get none
> > > forrtl: error (69): process interrupted (SIGINT)
> > > forrtl: error (69): process interrupted (SIGINT)
> > > 
> > > When running with one or two processes, the code runs fine, but with
> > > three or more, the above error occurs, with one "process interrupted"
> > > error for every process except one. Could someone help me figure out
> > > what's wrong?
> > > 
> > > Regards,
> > > Knut Erik Teigen
> > > 




More information about the petsc-users mailing list