[petsc-users] slepc eating all my ram

Jose E. Roman jroman at dsic.upv.es
Sun Jul 17 12:40:52 CDT 2016


Simon:
I have made a few optimizations to memory management in EPS. In your case, these changes will allocate one fewer vector (maybe two fewer). If you are using the repository version, just pull and try again. Otherwise, wait until slepc-3.7.2 is released (in a few days).
Jose


> On 16 Jul 2016, at 17:00, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> 
>  Send configure.log to petsc-maint at mcs.anl.gov
> 
> 
>> On Jul 16, 2016, at 8:40 AM, Simon Burton <simon at arrowtheory.com> wrote:
>> 
>> 
>> Hi again,
>> 
>> I found another machine with enough RAM to run this (I think).
>> 
>> Running into another problem now, with dgemv:
>> 
>> [0] EPSSetUp_Power(): Warning: parameter mpd ignored
>> [0] STSetUp(): Setting up new ST
>> Intel MKL ERROR: Parameter 6 was incorrect on entry to DGEMV .
>> [0] BV_SafeSqrt(): Zero norm, either the vector is zero or a semi-inner product is being used
>> 
>> 
>> I dug into this in gdb a bit:
>> 
>> 
>> Breakpoint 2, 0x00007ffff4f4cbd0 in dgemv_ ()
>>  from /usr/physics/ic15/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.so
>> (gdb) bt
>> #0  0x00007ffff4f4cbd0 in dgemv_ () from /usr/physics/ic15/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.so
>> #1  0x00007ffff5e14b4b in BVDotVec_BLAS_Private (bv=0x6ba6b0, n_=4294967296, k_=1, A=0x7fe7f23b3650, x=0x7fe7f23b3650, 
>>   y=0x75a3b0, mpi=PETSC_FALSE) at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvblas.c:274
>> #2  0x00007ffff5dcbd86 in BVDotVec_Svec (X=0x6ba6b0, y=0x74dbc0, m=0x75a3b0)
>>   at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/impls/svec/svec.c:150
>> #3  0x00007ffff5dffd58 in BVDotVec (X=0x6ba6b0, y=0x74dbc0, m=0x75a3b0)
>>   at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvglobal.c:191
>> #4  0x00007ffff5e1aad9 in BVOrthogonalizeCGS1 (bv=0x6ba6b0, j=0, v=0x0, H=0x75a3b0, onorm=0x7fffffffdc28, 
>>   norm=0x7fffffffdc20) at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvorthog.c:81
>> #5  0x00007ffff5e1c1bb in BVOrthogonalizeCGS (bv=0x6ba6b0, j=0, v=0x0, H=0x0, norm=0x7fffffffddb0, lindep=0x7fffffffddac)
>>   at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvorthog.c:214
>> #6  0x00007ffff5e1ddfd in BVOrthogonalizeColumn (bv=0x6ba6b0, j=0, H=0x0, norm=0x7fffffffddb0, lindep=0x7fffffffddac)
>>   at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvorthog.c:371
>> #7  0x00007ffff6050986 in EPSGetStartVector (eps=0x6a3ee0, i=0, breakdown=0x0)
>>   at /suphys/sburton/local/slepc-3.7.1/src/eps/interface/epssolve.c:758
>> #8  0x00007ffff5f52812 in EPSSolve_Power (eps=0x6a3ee0) at /suphys/sburton/local/slepc-3.7.1/src/eps/impls/power/power.c:103
>> #9  0x00007ffff6049b28 in EPSSolve (eps=0x6a3ee0) at /suphys/sburton/local/slepc-3.7.1/src/eps/interface/epssolve.c:101
>> #10 0x0000000000401430 in main ()
>> (gdb) up
>> #1  0x00007ffff5e14b4b in BVDotVec_BLAS_Private (bv=0x6ba6b0, n_=4294967296, k_=1, A=0x7fe7f23b3650, x=0x7fe7f23b3650, 
>>   y=0x75a3b0, mpi=PETSC_FALSE) at /suphys/sburton/local/slepc-3.7.1/src/sys/classes/bv/interface/bvblas.c:274
>> 274	    if (n) PetscStackCallBLAS("BLASgemv",BLASgemv_("C",&n,&k,&done,A,&n,x,&one,&zero,y,&one));
>> (gdb) print n
>> $1 = 4294967296
>> (gdb) print sizeof(n)
>> $2 = 8
>> (gdb) step
>> Intel MKL ERROR: Parameter 6 was incorrect on entry to DGEMV .
>> 
>> 
>> It looks to me like SLEPc is doing it right, but with error messages
>> like this, who knows. Debugging assembly is a bit beyond me.
>> 
>> Originally I built PETSc with --download-fblaslapack, but I don't think
>> it was working with 64-bit indices (?)
>> 
>> Maybe I should try another BLAS.
>> 
>> Simon.
>> 
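
One possible reading of the gdb session above, offered as a sketch rather than a confirmed diagnosis: the library name libmkl_intel_lp64.so suggests MKL's LP64 interface, which takes 32-bit integers, while n is 8 bytes wide and holds 4294967296 = 2^32. If the library reads only the low 32 bits of that argument (little-endian machine assumed), it sees 0 for both M and LDA. Argument 6 of DGEMV is LDA, and the reference BLAS requires LDA >= max(1, M), so LDA = 0 would be rejected with exactly this message. The helper names low32 and dgemv_lda_ok below are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Value a 32-bit-integer BLAS would read from a 64-bit argument on a
   little-endian machine: only the low 32 bits survive. (Assumption:
   the linked MKL is the LP64 interface, as its file name suggests.) */
int32_t low32(int64_t v)
{
    uint32_t bits = (uint32_t)v;      /* well-defined: reduces modulo 2^32 */
    int32_t out;
    memcpy(&out, &bits, sizeof out);  /* reinterpret the 32 bits as signed */
    return out;
}

/* Mirrors the reference-BLAS DGEMV check on LDA (argument 6):
   LDA must be at least max(1, M). Returns 1 if the value is accepted. */
int dgemv_lda_ok(int32_t m, int32_t lda)
{
    int32_t min_lda = (m > 1) ? m : 1;
    return lda >= min_lda;
}
```

With n = 4294967296, low32(n) is 0, so dgemv_lda_ok(low32(n), low32(n)) returns 0, which matches "Parameter 6 was incorrect". If this is what happened, a different 32-bit BLAS would hit the same limit; the local vector length would need to stay below 2^31, or the BLAS would need a 64-bit integer interface.
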
>> 
>> On Sat, 16 Jul 2016 07:17:44 +1000
>> Simon Burton <simon at arrowtheory.com> wrote:
>> 
>>> On Fri, 15 Jul 2016 19:53:31 +0200
>>> "Jose E. Roman" <jroman at dsic.upv.es> wrote:
>>> 
>>>> 
>>>> The default spectral transformation (STSHIFT) will allocate just one vector. At which exact point are you seeing that it allocates a bunch of vectors?
>>> 
>>> Yes I think you are right.
>>> I can get beyond STSetUp with the right settings.
>>> Now the solver runs out of memory inside EPSGetStartVector.
>>> 
>>>> 
>>>> Is this the unmodified ex3.c? Or did you change anything like EPSSetOperators(eps,A,B) ?
>>> 
>>> Good question. I didn't change much; let me try the original again.
>>> 
>>>> Do you get the same behaviour with the original ex3 with the same problem size?
>>> 
>>> Yes
>>> 
>>>> 
>>>> Do you have the same problem with a smaller problem? (half size, say)
>>> 
>>> Halving n gives a quarter of the dimension, so each vector is 8 GB.
>>> That works fine and uses 48 GB of RAM in total. Oh, I see that at one
>>> point during initialization it peaks at 56 GB.
>>> 
>>> So I guess it needs to keep 6 vectors in total.
>>> With the original problem size this becomes 192 GB, which is
>>> just a few GB too much to crunch. I guess I can still try it,
>>> but it doesn't feel good hitting the hard drive that much.
>>> 
>>> Thanks for the suggestions.
>>> 
>>> Simon.
> 
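
The memory estimate in the oldest quoted message can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: scalars are 8-byte real doubles, "GB" means 2^30 bytes, the full-size vector length is the 4294967296 shown in the gdb output, and the count of 6 live vectors is Simon's own estimate from the smaller run. The function names vector_gib and total_gib are hypothetical.

```c
#include <stdint.h>

/* Size in GiB of one vector with n double-precision entries. */
double vector_gib(int64_t n)
{
    return (double)n * (double)sizeof(double) / (1024.0 * 1024.0 * 1024.0);
}

/* Total GiB when nvecs vectors of length n are held at the same time. */
double total_gib(int64_t n, int nvecs)
{
    return (double)nvecs * vector_gib(n);
}
```

Under these assumptions, vector_gib(1LL << 30) is 8 and total_gib(1LL << 30, 6) is 48, matching the quarter-size run; total_gib(1LL << 32, 6) is 192, matching the full-size estimate. Allocating one fewer vector, as described at the top of this message, would bring the full-size figure to total_gib(1LL << 32, 5) = 160.
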
