bombing out writing large scratch files

Barry Smith bsmith at mcs.anl.gov
Sat May 27 21:53:54 CDT 2006


mpirun -np 2 valgrind --tool=memcheck executable
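
That is, mpirun launches valgrind, and valgrind in turn launches the program, so every rank runs under memcheck. A minimal sketch of a wrapper script for that command follows. It assumes a valgrind release that expands %p in --log-file to the process ID, which gives one log per rank. NP, EXE and the log file name are placeholders, not anything taken from Randy's setup.

```shell
#!/bin/sh
# Sketch: run each MPI rank under valgrind, one log file per process.
# NP, EXE and the log file name are illustrative placeholders.
NP=${NP:-2}
EXE=${EXE:-./executable}

# mpirun starts valgrind, and valgrind starts the program on each rank.
# %p in --log-file is replaced by the process ID (assumes a valgrind
# version that supports this expansion; check `valgrind --help`).
CMD="mpirun -np $NP valgrind --tool=memcheck --log-file=valgrind.%p.log $EXE"
echo "$CMD"

# Run the command only when both tools and the executable are present.
if command -v mpirun >/dev/null 2>&1 \
   && command -v valgrind >/dev/null 2>&1 \
   && [ -x "$EXE" ]; then
    $CMD
fi
```

Separate logs keep the reports from different ranks from interleaving on the terminal, which makes it easier to see which process (rank 0 in the code quoted below) triggers the invalid access.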




On Sat, 27 May 2006, Randall Mackie wrote:

> If using valgrind, can you tell me how to do that with mpirun and
> a parallel petsc program?
>
> is it valgrind mpirun program, or mpirun valgrind program?
>
> Randy
>
>
> Barry Smith wrote:
>>
>>   Sometimes a subtle memory bug can lurk under the covers and
>> then appear in a big problem. You can try putting a CHKMEMQ
>> right before the if (rank == ) in the code and run the debug
>> version with -malloc_debug
>> You could also consider valgrind (valgrind.org).
>>
>>    Barry
>> 
>> On Sat, 27 May 2006, Randall Mackie wrote:
>> 
>>> xvec is a double precision complex vector that is dynamically allocated
>>> once np is known. I've printed out the np value and it is correct.
>>> This works on the first pass, but not the second.
>>> 
>>> This PETSc program has been working just fine for a couple years now,
>>> the only difference this time is the size of the model I'm working
>>> with, which is substantially larger than typical.
>>> 
>>> I'm going to try to run this in the debugger and see if I can get
>>> any more information.
>>> 
>>> Randy
>>> 
>>> 
>>> Barry Smith wrote:
>>>>
>>>>   Randy,
>>>>
>>>>     The only "PETSc" related reason for this is that
>>>> xvec(i), i=1,np is accessing out of range. What is xvec
>>>> and is it of length 1 to np?
>>>>
>>>>    Barry
>>>> 
>>>> 
>>>> On Sat, 27 May 2006, Randall Mackie wrote:
>>>> 
>>>>> In my PETSc based modeling code, I write out intermediate results to a scratch
>>>>> file, and then read them back later. This has worked fine up until today,
>>>>> when for a large model, this seems to be causing my program to crash with
>>>>> errors like:
>>>>> 
>>>>> ------------------------------------------------------------------------
>>>>> [9]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>> 
>>>>> 
>>>>> I've tracked down the offending code to:
>>>>>
>>>>>          IF (rank == 0) THEN
>>>>>            irec=(iper-1)*2+ipol
>>>>>            write(7,rec=irec) (xvec(i),i=1,np)
>>>>>          END IF
>>>>> 
>>>>> It writes out xvec for the first record, but then on the second
>>>>> record my program is crashing.
>>>>> 
>>>>> The record length (from an inquire statement) is  recl     22626552
>>>>> 
>>>>> The size of the scratch file when my program crashes is 98M.
>>>>> 
>>>>> PETSc is compiled using the intel compilers (v9.0 for fortran),
>>>>> and the users manual says that you can have record lengths of
>>>>> up to 2 billion bytes.
>>>>> 
>>>>> I'm kind of stuck as to what might be the cause. Any ideas from anyone
>>>>> would be greatly appreciated.
>>>>> 
>>>>> Randy Mackie
>>>>> 
>>>>> ps. I've tried both the optimized and debugging versions of the PETSc
>>>>> libraries, with the same result.
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
>



