bombing out writing large scratch files

Barry Smith bsmith at mcs.anl.gov
Sat May 27 19:19:16 CDT 2006


   That's the correct message; it is trying to do the right
thing. My guess is that the cluster node doesn't have a
path back to your display; this can happen, for example, if
the MPI jobs are not started with ssh X forwarding enabled.

   What happens when you run
mpirun -np 1 -nolocal -machinefile machines xterm -display 24.5.142.138:0.0
Does it open an xterm on your system?

   BTW: it is -debugger_nodes 1 not -debugger_node 1

How about dropping -nolocal and using -debugger_nodes 0?
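
For example, the launch in your command file might then look something
like this (same executable and options as before, just without -nolocal
and with the corrected option name; the remaining options are unchanged):

time /opt/mpich/intel/bin/mpirun -np 20 -machinefile machines \
        /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
        -start_in_debugger \
        -debugger_nodes 0 \
        -display 24.5.142.138:0.0 \
        ...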

    Barry


On Sat, 27 May 2006, Randall Mackie wrote:

> I can't seem to get the debugger to pop up on my screen.
>
> When I'm logged into the cluster I'm working on, I can
> type xterm &, and an xterm pops up on my display. So I know
> I can get something from the remote cluster.
>
> Now, when I try this using PETSc, I'm getting the following error
> message, for example:
>
> ------------------------------------------------------------------------
> [17]PETSC ERROR: PETSC: Attaching gdb to 
> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display 
> 24.5.142.138:0.0 on machine compute-0-23.local
> ------------------------------------------------------------------------
>
> I'm using this in my command file:
>
> source ~/.bashrc
> time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
>         /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
>         -start_in_debugger \
>         -debugger_node 1 \
>         -display 24.5.142.138:0.0 \
>         -em_ksp_type bcgs \
>         -em_sub_pc_type ilu \
>         -em_sub_pc_factor_levels 8 \
>         -em_sub_pc_factor_fill 4 \
>         -em_sub_pc_factor_reuse_ordering \
>         -em_sub_pc_factor_reuse_fill \
>         -em_sub_pc_factor_mat_ordering_type rcm \
>         -divh_ksp_type cr \
>         -divh_sub_pc_type icc \
>         -ppc_sub_pc_type ilu \
> << EOF
> ...
>
>
> Randy
>
>
> Matthew Knepley wrote:
>> 1) Make sure ssh is forwarding X (-Y I think)
>> 
>> 2) -start_in_debugger
>> 
>> 3) -display <your machine>:0.0
>> 
>> should do it.
>>
>>    Matt
>> 
>> On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>>
>>     This is a stupid question, but how do I start in the debugger if I'm
>>     running
>>     on a cluster half-way around the world and I'm working on that cluster
>>     via ssh?
>>
>>     Randy
>> 
>>
>>     Matthew Knepley wrote:
>>      > The best thing to do here is get a stack trace from the debugger.
>>      > From the description, it is hard to tell what statement is
>>      > trying to access which illegal memory.
>>      >
>>      >    Matt
>>      >
>>      > On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>>      >
>>      >     In my PETSc based modeling code, I write out intermediate
>>      >     results to a scratch file, and then read them back later.
>>      >     This has worked fine up until today, when for a large model,
>>      >     this seems to be causing my program to crash with errors like:
>>      >
>>      >
>>      >     ------------------------------------------------------------------------
>>      >     [9]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>      >     Violation, probably memory access out of range
>>      >
>>      >
>>      >     I've tracked down the offending code to:
>>      >
>>      >                IF (rank == 0) THEN
>>      >                  irec=(iper-1)*2+ipol
>>      >                  write(7,rec=irec) (xvec(i),i=1,np)
>>      >                END IF
>>      >
>>      >     It writes out xvec for the first record, but then on the second
>>      >     record my program is crashing.
>>      >
>>      >     The record length (from an inquire statement) is recl 22626552.
>>      >
>>      >     The size of the scratch file when my program crashes is 98M.
>>      >
>>      >     PETSc is compiled using the Intel compilers (v9.0 for Fortran),
>>      >     and the user's manual says that you can have record lengths of
>>      >     up to 2 billion bytes.
>>      >
>>      >     I'm kind of stuck as to what might be the cause. Any ideas
>>      >     from anyone would be greatly appreciated.
>>      >
>>      >     Randy Mackie
>>      >
>>      >     ps. I've tried both the optimized and debugging versions of
>>      >     the PETSc libraries, with the same result.
>>      >
>>      >
>>      >     --
>>      >     Randall Mackie
>>      >     GSY-USA, Inc.
>>      >     PMB# 643
>>      >     2261 Market St.,
>>      >     San Francisco, CA 94114-1600
>>      >     Tel (415) 469-8649
>>      >     Fax (415) 469-5044
>>      >
>>      >     California Registered Geophysicist
>>      >     License No. GP 1034
>>      >
>>      >
>>      >
>>      >
>>      > --
>>      > "Failure has a thousand explanations. Success doesn't need one"
>>     -- Sir
>>      > Alec Guiness
>>
>>     --
>>     Randall Mackie
>>     GSY-USA, Inc.
>>     PMB# 643
>>     2261 Market St.,
>>     San Francisco, CA 94114-1600
>>     Tel (415) 469-8649
>>     Fax (415) 469-5044
>>
>>     California Registered Geophysicist
>>     License No. GP 1034
>> 
>> 
>> 
>> 
>> -- 
>> "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec 
>> Guiness
>
>



