bombing out writing large scratch files
Barry Smith
bsmith at mcs.anl.gov
Sat May 27 19:19:16 CDT 2006
That's the correct message; it is trying to do the right
thing. My guess is that the cluster node doesn't have a
path back to your display; this can happen if, for example,
the MPI jobs are not started with ssh with X forwarding enabled.
What happens when you run
   mpirun -np 1 -nolocal -machinefile machines xterm -display 24.5.142.138:0.0
Does it open an xterm on your system?
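If it does not, first check that plain ssh X forwarding works before
involving mpirun at all; a minimal check, with a hypothetical hostname:

   ssh -Y randy@cluster.example.org   # -Y requests trusted X11 forwarding
   echo $DISPLAY                      # should be set, e.g. localhost:10.0
   xterm &                            # should pop up on your local screen

If that xterm appears but the mpirun one does not, then it is the
compute nodes, not the login node, that cannot reach your display.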
BTW: the option is -debugger_nodes 1, not -debugger_node 1.
Also, how about dropping -nolocal and using -debugger_nodes 0?
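That is, something like this sketch of your command file, with just
those two changes and everything else as you had it:

time /opt/mpich/intel/bin/mpirun -np 20 -machinefile machines \
     /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
     -start_in_debugger \
     -debugger_nodes 0 \
     -display 24.5.142.138:0.0 \
     ...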
Barry
On Sat, 27 May 2006, Randall Mackie wrote:
> I can't seem to get the debugger to pop up on my screen.
>
> When I'm logged into the cluster I'm working on, I can type
> xterm &, and an xterm pops up on my display, so I know X
> connections from the remote cluster can reach my screen.
>
> Now, when I try this using PETSc, I'm getting the following error
> message, for example:
>
> ------------------------------------------------------------------------
> [17]PETSC ERROR: PETSC: Attaching gdb to
> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc of pid 3628 on display
> 24.5.142.138:0.0 on machine compute-0-23.local
> ------------------------------------------------------------------------
>
> I'm using this in my command file:
>
> source ~/.bashrc
> time /opt/mpich/intel/bin/mpirun -np 20 -nolocal -machinefile machines \
> /home/randy/d3inv/PETSC_V3.3/d3inv_3_3_petsc \
> -start_in_debugger \
> -debugger_node 1 \
> -display 24.5.142.138:0.0 \
> -em_ksp_type bcgs \
> -em_sub_pc_type ilu \
> -em_sub_pc_factor_levels 8 \
> -em_sub_pc_factor_fill 4 \
> -em_sub_pc_factor_reuse_ordering \
> -em_sub_pc_factor_reuse_fill \
> -em_sub_pc_factor_mat_ordering_type rcm \
> -divh_ksp_type cr \
> -divh_sub_pc_type icc \
> -ppc_sub_pc_type ilu \
> << EOF
> ...
>
>
> Randy
>
>
> Matthew Knepley wrote:
>> 1) Make sure ssh is forwarding X (-Y I think)
>>
>> 2) -start_in_debugger
>>
>> 3) -display <your machine>:0.0
>>
>> should do it.
>>
>> Matt
>>
>> On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>>
>> This is a stupid question, but how do I start in the debugger if
>> I'm running on a cluster half-way around the world and I'm working
>> on that cluster via ssh?
>>
>> Randy
>>
>>
>> Matthew Knepley wrote:
>> > The best thing to do here is get a stack trace from the debugger.
>> > From the description, it is hard to tell what statement is trying
>> > to access which illegal memory.
>> >
>> > Matt
>> >
>> > On 5/27/06, Randall Mackie <randy at geosystem.us> wrote:
>> >
>> > In my PETSc-based modeling code, I write out intermediate results
>> > to a scratch file, and then read them back later. This has worked
>> > fine up until today, when for a large model, this seems to be
>> > causing my program to crash with errors like:
>> >
>> >
>> > ------------------------------------------------------------------------
>> > [9]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>> > Violation, probably memory access out of range
>> >
>> >
>> > I've tracked down the offending code to:
>> >
>> > IF (rank == 0) THEN
>> > irec=(iper-1)*2+ipol
>> > write(7,rec=irec) (xvec(i),i=1,np)
>> > END IF
>> >
>> > It writes out xvec for the first record, but my program crashes
>> > when writing the second record.
>> >
>> > The record length (from an INQUIRE statement) is recl = 22626552.
>> >
>> > The size of the scratch file when my program crashes is 98M.
>> >
>> > PETSc is compiled using the Intel compilers (v9.0 for Fortran),
>> > and the user's manual says that you can have record lengths of
>> > up to 2 billion bytes.
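One consideration: with the Intel Fortran compiler, recl= for
unformatted direct-access files is measured in 4-byte units by default,
not in bytes, unless the code is compiled with -assume byterecl, so it
is worth checking which convention that 22626552 is in. Below is a
minimal, self-contained sketch of a direct-access scratch write with
explicit error checking; the array length is a made-up placeholder, the
unit number and record numbering follow the snippet above, and the
iostat handling is the addition:

   program scratch_demo
     implicit none
     integer, parameter :: np = 1000000   ! hypothetical array length
     integer :: ios, irec, i, iper, ipol, rl
     real :: xvec(np)

     ! inquire(iolength=...) returns the recl needed to hold xvec in
     ! whatever units this compiler uses (ifort defaults to 4-byte
     ! units unless compiled with -assume byterecl).
     inquire(iolength=rl) xvec
     open(unit=7, status='scratch', access='direct', &
          form='unformatted', recl=rl, iostat=ios)
     if (ios /= 0) stop 'open failed'

     xvec = 0.0
     iper = 1
     ipol = 2
     irec = (iper-1)*2 + ipol

     ! An explicit iostat check reports an I/O-layer failure directly,
     ! rather than letting it surface later as something worse.
     write(7, rec=irec, iostat=ios) (xvec(i), i=1,np)
     if (ios /= 0) print *, 'write failed, iostat = ', ios

     close(7)
   end program scratch_demo

If iostat comes back nonzero on the second record, the failure is in
the Fortran I/O layer (record size or file-size limits); if it still
segfaults with no iostat error, the bad access is more likely in memory
(xvec or np) than in the file itself.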
>> >
>> > I'm kind of stuck as to what might be the cause. Any ideas from
>> > anyone would be greatly appreciated.
>> >
>> > Randy Mackie
>> >
>> > PS: I've tried both the optimized and debugging versions of the
>> > PETSc libraries, with the same result.
>> >
>> >
>> > --
>> > Randall Mackie
>> > GSY-USA, Inc.
>> > PMB# 643
>> > 2261 Market St.,
>> > San Francisco, CA 94114-1600
>> > Tel (415) 469-8649
>> > Fax (415) 469-5044
>> >
>> > California Registered Geophysicist
>> > License No. GP 1034
>> >
>> >
>> >
>> >
>> > --
>> > "Failure has a thousand explanations. Success doesn't need one"
>> -- Sir
>> > Alec Guiness
>>
>> --
>> Randall Mackie
>> GSY-USA, Inc.
>> PMB# 643
>> 2261 Market St.,
>> San Francisco, CA 94114-1600
>> Tel (415) 469-8649
>> Fax (415) 469-5044
>>
>> California Registered Geophysicist
>> License No. GP 1034
>>
>>
>>
>>
>> --
>> "Failure has a thousand explanations. Success doesn't need one" -- Sir Alec
>> Guiness
>
>