bombing out writing large scratch files
Randall Mackie
randy at geosystem.us
Sat May 27 19:38:56 CDT 2006
This is frustrating. Now I'm getting this message:
[randy at cluster Delta_May06]$ ./cmd_inv_petsc_3_3
Warning: No xauth data; using fake authentication data for X11 forwarding.
[the same X11 warning was printed 10 times in total]
------------------------------------------------------------------------
Petsc Release Version 2.3.1, Patch 13, Wed May 10 11:08:35 CDT 2006
BK revision: balay at asterix.mcs.anl.gov|ChangeSet|20060510160640|13832
See docs/changes/index.html for recent updates.
See docs/faq.html for hints about trouble shooting.
See docs/index.html for manual pages.
------------------------------------------------------------------------
/home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc on a linux-gnu named cluster.geo.hpc by randy Sun May 28 02:26:04 2006
Libraries linked from /home/randy/SPARSE/petsc-2.3.1-p13/lib/linux-gnu
Configure run at Sun May 28 00:21:24 2006
Configure options --with-fortran --with-fortran-kernels=generic --with-blas-lapack-dir=/opt/intel/mkl72cluster/lib/32
--with-scalar-type=complex --with-debugging=1 --with-mpi-dir=/opt/mpich/intel --with-shared=0
------------------------------------------------------------------------
[0]PETSC ERROR: PETSC: Attaching gdb to /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of
pid 4861 on display 24.5.142.138:0.0 on machine cluster.geo.hpc
I noticed, however, that in /opt/mpich/intel/bin, there are other mpirun commands,
like mpirun_dbg.gdb, so I'll give that a try and see if that helps.
Randy
Barry Smith wrote:
>
> That's the correct message; it is trying to do the right
> thing. My guess is that the cluster node doesn't have a
> path back to your display, for example if the MPI jobs
> are not started up with ssh X forwarding enabled.
>
> What happens when you run
> mpirun -np 1 -nolocal -machinefile machines xterm -display
> 24.5.142.138:0.0
> Does it open an xterm on your system?
>
> BTW: it is -debugger_nodes 1, not -debugger_node 1.
>
> How about not using -nolocal and instead using -debugger_nodes 0?
>
> Barry
>
>
> On Sat, 27 May 2006, Randall Mackie wrote:
>
>> I can't seem to get the debugger to pop up on my screen.
>>
>> When I'm logged into the cluster I'm working on, I can
>> type xterm &, and an xterm pops up on my display. So I know
>> I can get something from the remote cluster.
>>
>> Now, when I try this using PETSc, I'm getting the following error
>> message, for example:
>>
>> ------------------------------------------------------------------------
>> [17]PETSC ERROR: PETSC: Attaching gdb to
>> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display
>> 24.5.142.138:0.0 on machine compute-0-23.local
>> ------------------------------------------------------------------------
>>
>> I'm using this in my command file:
>>
>> source ~/.bashrc
>> time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
>> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
>> -start_in_debugger \
>> -debugger_node 1 \
>> -display 24.5.142.138:0.0 \
>> -em_ksp_type bcgs \
>> -em_sub_pc_type ilu \
>> -em_sub_pc_factor_levels 8 \
>> -em_sub_pc_factor_fill 4 \
>> -em_sub_pc_factor_reuse_ordering \
>> -em_sub_pc_factor_reuse_fill \
>> -em_sub_pc_factor_mat_ordering_type rcm \
>> -divh_ksp_type cr \
>> -divh_sub_pc_type icc \
>> -ppc_sub_pc_type ilu \
>> << EOF
>> ...
>>
>>
>> Randy
>>
>>
>> Matthew Knepley wrote:
>>> 1) Make sure ssh is forwarding X (-Y I think)
>>>
>>> 2) -start_in_debugger
>>>
>>> 3) -display <your machine>:0.0
>>>
>>> should do it.
>>>
>>> Matt
>>>
>>> On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>>>
>>> This is a stupid question, but how do I start in the debugger if
>>> I'm running on a cluster half-way around the world and I'm working
>>> on that cluster via ssh?
>>>
>>> Randy
>>>
>>>
>>> Matthew Knepley wrote:
>>> > The best thing to do here is get a stack trace from the debugger.
>>> > From the description, it is hard to tell which statement is
>>> > trying to access illegal memory.
>>> >
>>> > Matt
>>> >
>>> > On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>>> >
>>> > In my PETSc-based modeling code, I write out intermediate results
>>> > to a scratch file, and then read them back later. This has worked
>>> > fine up until today, when for a large model it seems to be causing
>>> > my program to crash with errors like:
>>> >
>>> > ------------------------------------------------------------------------
>>> > [9]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>> > Violation, probably memory access out of range
>>> >
>>> > I've tracked down the offending code to:
>>> >
>>> > IF (rank == 0) THEN
>>> > irec=(iper-1)*2+ipol
>>> > write(7,rec=irec) (xvec(i),i=1,np)
>>> > END IF
>>> >
>>> > It writes out xvec for the first record, but then on the second
>>> > record my program is crashing.
>>> >
>>> > The record length (from an inquire statement) is recl = 22626552.
>>> >
>>> > The size of the scratch file when my program crashes is 98M.
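>>> >
>>> > As a sanity check (sketched tentatively, since the surrounding
>>> > declarations and the open statement aren't shown here), the length
>>> > of that I/O list can be compared against the record length unit 7
>>> > was actually opened with; both INQUIREs report in the same units,
>>> > so a mismatch would show up directly. lrec_need and lrec_open are
>>> > new integer locals introduced only for this check:
>>> >
>>> >    IF (rank == 0) THEN
>>> >       ! length the I/O list needs vs. the recl unit 7 was given
>>> >       inquire(iolength=lrec_need) (xvec(i),i=1,np)
>>> >       inquire(unit=7, recl=lrec_open)
>>> >       if (lrec_need > lrec_open) then
>>> >          write(*,*) 'record too small: I/O list needs ', lrec_need
>>> >          write(*,*) 'but unit 7 was opened with recl ', lrec_open
>>> >       end if
>>> >    END IF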
>>> >
>>> > PETSc is compiled using the Intel compilers (v9.0 for Fortran),
>>> > and the user's manual says that you can have record lengths of
>>> > up to 2 billion bytes.
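>>> >
>>> > A tentative side note on those record lengths: ifort by default
>>> > counts RECL for unformatted direct-access files in 4-byte units
>>> > rather than bytes, unless the code is built with -assume byterecl,
>>> > so a hand-computed byte count and the "2 billion bytes" limit may
>>> > not be in the same units. Letting the compiler compute the record
>>> > length from the I/O list itself sidesteps the question. A minimal,
>>> > self-contained sketch; the array size np and the open specifiers
>>> > (scratch, unformatted, direct) are assumptions for illustration:
>>> >
>>> >    program recl_check
>>> >       implicit none
>>> >       integer, parameter :: np = 1000000   ! hypothetical size
>>> >       complex(kind(1.0d0)), allocatable :: xvec(:)
>>> >       integer :: i, lrec, ios
>>> >
>>> >       allocate(xvec(np))
>>> >       xvec = (0.0d0, 0.0d0)
>>> >
>>> >       ! record length in whatever units this compiler's OPEN expects
>>> >       inquire(iolength=lrec) (xvec(i), i=1, np)
>>> >       write(*,*) 'record length in RECL units: ', lrec
>>> >
>>> >       open(7, status='scratch', access='direct', form='unformatted', recl=lrec)
>>> >
>>> >       write(7, rec=1, iostat=ios) (xvec(i), i=1, np)
>>> >       write(*,*) 'write of record 1 returned iostat = ', ios
>>> >
>>> >       close(7)
>>> >    end program recl_check
>>> >
>>> > The same inquire(iolength=...) call can be applied to the real
>>> > xvec before the actual open.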
>>> >
>>> > I'm kind of stuck as to what might be the cause. Any ideas from
>>> > anyone would be greatly appreciated.
>>> >
>>> > Randy Mackie
>>> >
>>> > ps. I've tried both the optimized and debugging versions of the
>>> > PETSc libraries, with the same result.
>>> >
>>> >
>>>
>>
>>
>
--
Randall Mackie
GSY-USA, Inc.
PMB# 643
2261 Market St.,
San Francisco, CA 94114-1600
Tel (415) 469-8649
Fax (415) 469-5044
California Registered Geophysicist
License No. GP 1034