bombing out writing large scratch files

Matthew Knepley knepley at gmail.com
Sat May 27 19:30:52 CDT 2006


What is the error? It always shows the error when it cannot pop up the window.
It sounds like a problem with the batch environment being different from the
interactive node. Computer centers are the worst-run things in the world.

  Matt

On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>
> I can't seem to get the debugger to pop up on my screen.
>
> When I'm logged into the cluster I'm working on, I can
> type xterm &, and an xterm pops up on my display. So I know
> I can get something from the remote cluster.
>
> Now, when I try this using PETSc, I'm getting the following error
> message, for example:
>
> ------------------------------------------------------------------------
> [17]PETSC ERROR: PETSC: Attaching gdb to
> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display
> 24.5.142.138:0.0 on
> machine compute-0-23.local
> ------------------------------------------------------------------------
>
> I'm using this in my command file:
>
> source ~/.bashrc
> time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
>           /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
>           -start_in_debugger \
>           -debugger_node 1 \
>           -display 24.5.142.138:0.0 \
>           -em_ksp_type bcgs \
>           -em_sub_pc_type ilu \
>           -em_sub_pc_factor_levels 8 \
>           -em_sub_pc_factor_fill 4 \
>           -em_sub_pc_factor_reuse_ordering \
>           -em_sub_pc_factor_reuse_fill \
>           -em_sub_pc_factor_mat_ordering_type rcm \
>           -divh_ksp_type cr \
>           -divh_sub_pc_type icc \
>           -ppc_sub_pc_type ilu \
> << EOF
> ...
>
>
> Randy
>
>
> Matthew Knepley wrote:
> > 1) Make sure ssh is forwarding X (-Y I think)
> >
> > 2) -start_in_debugger
> >
> > 3) -display <your machine>:0.0
> >
> > should do it.
> >
> >    Matt
> >
> > On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
> >
> >     This is a stupid question, but how do I start in the debugger if I'm
> >     running on a cluster half-way around the world and I'm working on that
> >     cluster via ssh?
> >
> >     Randy
> >
> >
> >     Matthew Knepley wrote:
> >      > The best thing to do here is get a stack trace from the debugger.
> >      > From the description, it is hard to tell what statement is trying to
> >      > access which illegal memory.
> >      >
> >      >    Matt
> >      >
> >      > On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
> >      >
> >      >     In my PETSc based modeling code, I write out intermediate results
> >      >     to a scratch file, and then read them back later. This has worked
> >      >     fine up until today, when for a large model, this seems to be
> >      >     causing my program to crash with errors like:
> >      >
> >      >     ------------------------------------------------------------------------
> >      >     [9]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
> >      >     Violation, probably memory access out of range
> >      >
> >      >
> >      >     I've tracked down the offending code to:
> >      >
> >      >                IF (rank == 0) THEN
> >      >                  irec=(iper-1)*2+ipol
> >      >                  write(7,rec=irec) (xvec(i),i=1,np)
> >      >                END IF
> >      >
> >      >     It writes out xvec for the first record, but then on the second
> >      >     record my program is crashing.
> >      >
> >      >     The record length (from an inquire statement) is recl = 22626552.
> >      >
> >      >     The size of the scratch file when my program crashes is 98M.
> >      >
> >      >     PETSc is compiled using the Intel compilers (v9.0 for Fortran),
> >      >     and the user's manual says that you can have record lengths of
> >      >     up to 2 billion bytes.
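> >      >
> >      >     (For reference, the write pattern boils down to something like the
> >      >     standalone sketch below -- the array size, type, and file name are
> >      >     made up, and recl is taken from inquire(iolength=...) so it comes
> >      >     out in whatever units the compiler expects; as far as I know the
> >      >     Intel compiler counts recl for unformatted direct access in 4-byte
> >      >     words unless -assume byterecl is given:)
> >      >
> >      >           program write_check
> >      >             implicit none
> >      >             integer, parameter :: np = 1000     ! made-up size
> >      >             double precision   :: xvec(np)      ! assumed type
> >      >             integer            :: i, irecl
> >      >             xvec = 1.0d0
> >      >             ! record length in this compiler's own recl units
> >      >             inquire(iolength=irecl) (xvec(i),i=1,np)
> >      >             open(unit=7, file='check.tmp', access='direct', &
> >      >                  form='unformatted', recl=irecl)
> >      >             write(7,rec=1) (xvec(i),i=1,np)     ! first record
> >      >             write(7,rec=2) (xvec(i),i=1,np)     ! second record
> >      >             close(7, status='delete')
> >      >           end program write_check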
> >      >
> >      >     I'm kind of stuck as to what might be the cause. Any ideas from anyone
> >      >     would be greatly appreciated.
> >      >
> >      >     Randy Mackie
> >      >
> >      >     ps. I've tried both the optimized and debugging versions of the PETSc
> >      >     libraries, with the same result.
> >      >
> >      >
> >      >     --
> >      >     Randall Mackie
> >      >     GSY-USA, Inc.
> >      >     PMB# 643
> >      >     2261 Market St.,
> >      >     San Francisco, CA 94114-1600
> >      >     Tel (415) 469-8649
> >      >     Fax (415) 469-5044
> >      >
> >      >     California Registered Geophysicist
> >      >     License No. GP 1034
> >      >
> >      >
> >      >
> >      >
> >      > --
> >      > "Failure has a thousand explanations. Success doesn't need one"
> >     -- Sir
> >      > Alec Guiness
> >
> >     --
> >     Randall Mackie
> >     GSY-USA, Inc.
> >     PMB# 643
> >     2261 Market St.,
> >     San Francisco, CA 94114-1600
> >     Tel (415) 469-8649
> >     Fax (415) 469-5044
> >
> >     California Registered Geophysicist
> >     License No. GP 1034
> >
> >
> >
> >
> > --
> > "Failure has a thousand explanations. Success doesn't need one" -- Sir
> > Alec Guinness
>
> --
> Randall Mackie
> GSY-USA, Inc.
> PMB# 643
> 2261 Market St.,
> San Francisco, CA 94114-1600
> Tel (415) 469-8649
> Fax (415) 469-5044
>
> California Registered Geophysicist
> License No. GP 1034
>
>


-- 
"Failure has a thousand explanations. Success doesn't need one" -- Sir Alec
Guinness