Process interrupted error
Barry Smith
bsmith at mcs.anl.gov
Fri Apr 13 11:24:12 CDT 2007
Code that runs correctly in the debugger some of the time but fails
outside the debugger is often a sign of memory corruption. Run with
-malloc_debug and put a CHKMEMQ; directly before the call that creates
the scatter context, then see if that gives you any useful information.
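
To illustrate why a memory check placed directly before a suspect call helps narrow things down, here is a minimal C sketch of guard-byte checking. It is a toy under stated assumptions and not PETSc's implementation: GuardedBlock, guarded_alloc, guarded_data, guarded_check, guarded_free and first_bad_checkpoint are hypothetical names invented for this example, and PETSc's own checks may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy allocator for illustration only (not PETSc code). */
#define GUARD_SIZE 8
#define GUARD_BYTE 0xAB

typedef struct {
    unsigned char *raw;  /* start of the underlying malloc'd region */
    size_t size;         /* number of user-visible bytes */
} GuardedBlock;

/* Allocate `size` user bytes with a guard region on each side.
   Returns 0 on success, -1 if malloc fails. */
int guarded_alloc(GuardedBlock *b, size_t size) {
    b->raw = malloc(size + 2 * GUARD_SIZE);
    if (b->raw == NULL) return -1;
    b->size = size;
    memset(b->raw, GUARD_BYTE, GUARD_SIZE);                      /* leading guard */
    memset(b->raw + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);  /* trailing guard */
    return 0;
}

/* Pointer to the user-visible bytes, just past the leading guard. */
unsigned char *guarded_data(const GuardedBlock *b) {
    return b->raw + GUARD_SIZE;
}

/* Returns 1 if both guard regions are intact, 0 if either was overwritten. */
int guarded_check(const GuardedBlock *b) {
    assert(b != NULL && b->raw != NULL);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (b->raw[i] != GUARD_BYTE) return 0;
        if (b->raw[GUARD_SIZE + b->size + i] != GUARD_BYTE) return 0;
    }
    return 1;
}

void guarded_free(GuardedBlock *b) {
    free(b->raw);
    b->raw = NULL;
    b->size = 0;
}

/* Runs two steps with a check after each one and returns the index of the
   first check that fails (-1 if none fails, -2 if allocation fails). */
int first_bad_checkpoint(void) {
    GuardedBlock b;
    int bad = -1;
    if (guarded_alloc(&b, 4) != 0) return -2;
    unsigned char *v = guarded_data(&b);

    for (size_t i = 0; i < 4; i++) v[i] = 1;     /* step 0: in-bounds writes */
    if (bad < 0 && !guarded_check(&b)) bad = 0;  /* checkpoint 0 */

    v[4] = 1;  /* step 1: off-by-one write lands in the trailing guard */
    if (bad < 0 && !guarded_check(&b)) bad = 1;  /* checkpoint 1 */

    guarded_free(&b);
    return bad;
}
```

In this sketch the check after step 0 passes and the check after step 1 fails, so the bad write is pinned to step 1. Placing a check directly before the scatter creation serves the same purpose: if it already reports a problem there, the corruption happened earlier in the code and the scatter call is only where it shows up.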
Please respond to petsc-maint and not petsc-users.
Barry
On Thu, 12 Apr 2007, Knut Erik Teigen wrote:
> On Wed, 2007-04-11 at 06:55 -0500, Barry Smith wrote:
> > Something is sending an interrupt signal to all the processes
> > except one. We've seen this happen where the "batch or node
> > scheduler does this to kill a long running job".
> >
> > Does it happen even if the parallel vector is real short?
>
> Yes, even with a vector of size 1, this happens.
> >
> > Does the machine seem to hang at that point or does the sigint
> > come immediately?
> It hangs for a short period, and then displays the sigint error.
> When running with 2 processes, it also hangs a bit, but then continues
> with the correct results.
> >
> > Can you use the runtime option -start_in_debugger or totalview
> > to catch the signal?
> Something really weird happens here. The first time I run the program
> with -start_in_debugger noxterm,idb , it finishes correctly, no matter
> the number of processes or problem size. The second time, however, I get
> the sigint errors.
> When I run with idb I get:
> (idb) Program exited normally.
> (idb) Program received signal SIGINT
>
> With gdb I get:
> 0xffffe410 in __kernel_vsyscall ()
> and then the program hangs.
>
> However I get this message in the beginning with both:
> [3]PETSC ERROR: PETSC: Attaching gdb to ./out of pid 30549 on ivt0415
>
> -Knut Erik-
>
> >
> > Barry
> >
> > On Wed, 11 Apr 2007, Knut Erik Teigen wrote:
> >
> > > Hello,
> > >
> > > I get the following error when trying to copy a solution vector to
> > > process zero using VecScatterCreateToZero:
> > >
> > > [0] VecScatterCreate(): Special case: processor zero gets entire
> > > parallel vector, rest get none
> > > forrtl: error (69): process interrupted (SIGINT)
> > > forrtl: error (69): process interrupted (SIGINT)
> > >
> > > When running with one or two processes, the code runs fine, but with
> > > three or more, the above error occurs, with one "process interrupted"
> > > error for each process minus one. Could someone help me figure out
> > > what's wrong?
> > >
> > > Regards,
> > > Knut Erik Teigen
> > >
> > >
> > >
> > >
> >
> >
>
>