[petsc-dev] Fwd: [petsc-users] Unexpected "Out of memory error" with SLEPC
Barry Smith
bsmith at mcs.anl.gov
Thu Jun 26 15:33:51 CDT 2014
I was thinking of catching the overflow BEFORE it hits the malloc. For example, in DMSetUp_DA_3D() we have the unattractive code
#if !defined(PETSC_USE_64BIT_INDICES)
if (((Petsc64bitInt) M)*((Petsc64bitInt) N)*((Petsc64bitInt) P)*((Petsc64bitInt) dof) > (Petsc64bitInt) PETSC_MPI_INT_MAX) SETERRQ3(comm,PETSC_ERR_INT_OVERFLOW,"Mesh of %D by %D by %D (dof) is too large for 32 bit indices",M,N,dof);
#endif
Maybe we could introduce some macros like
#if !defined(PETSC_USE_64BIT_INDICES)
#define PetscCheckOverflow2(M,N) if (((Petsc64bitInt) (M))*((Petsc64bitInt) (N)) > (Petsc64bitInt) PETSC_MPI_INT_MAX) SETERRQ2(comm,PETSC_ERR_INT_OVERFLOW,"Multiply of %D by %D is too large for 32 bit indices",M,N)
#else
#define PetscCheckOverflow2(M,N)
#endif
and then check large products when they are likely to occur.
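A sketch of how this might look at a call site such as DMSetUp_DA_3D() (assuming a 'comm' variable is in scope, since the macro expands to a SETERRQ2() that needs one):

   PetscCheckOverflow2(M,N);         /* M*N must fit in 32 bits */
   PetscCheckOverflow2(M*N,P);       /* safe to form M*N here: the check above passed */
   PetscCheckOverflow2(M*N*P,dof);   /* the full product is checked last */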
Barry
On Jun 26, 2014, at 3:25 PM, Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
> Hey,
>
> > It would be nice if we could automatically detect this issue more often.
>
> Indeed.
>
> We have
>
> #define PetscMalloc(a,b) ((a != 0) ? do_something : (*(b) = 0,0) )
>
> For our typical internal use cases, 'a' is of type PetscInt, so we could check for (a < 0) and throw a better error message. On the other hand, assuming 'a' to be signed might trigger tautology warnings for unsigned integers in user code, so it may be cleaner to hide this additional check in something like "PetscSignedMalloc()" and use it only for the internal routines where the argument to PetscMalloc() is a product of potentially large values. A quick grep of the source tree shows only a handful of such cases (typically just matrices), so this would let us catch most of these problems. Is this a viable approach?
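> A minimal sketch of what I have in mind ("PetscSignedMalloc" is a hypothetical name; since SETERRQ() expands to a return statement and cannot sit inside an expression, the sketch simply yields the error code and lets CHKERRQ() at the call site do the reporting):
>
> #define PetscSignedMalloc(a,b) \
>   (((a) < 0) ? PETSC_ERR_INT_OVERFLOW : PetscMalloc((a),(b)))
>
> /* usage at a call site where the byte count is computed in PetscInt
>    arithmetic and may have overflowed upstream:
>      PetscInt bytes = m*n*(PetscInt)sizeof(PetscScalar);
>      ierr = PetscSignedMalloc(bytes,&array);CHKERRQ(ierr);           */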
>
> Best regards,
> Karli
>
>
>
> >
>> Begin forwarded message:
>>
>>> From: Mathis Friesdorf <mathisfriesdorf at gmail.com>
>>> Subject: Re: [petsc-users] Unexpected "Out of memory error" with SLEPC
>>> Date: June 26, 2014 at 5:08:23 AM CDT
>>> To: Karl Rupp <rupp at iue.tuwien.ac.at>
>>> Cc: <petsc-users at mcs.anl.gov>, <slepc-maint at grycap.upv.es>
>>>
>>> Dear Karl,
>>>
>>> thanks a lot! This indeed worked. I recompiled PETSc with
>>>
>>> ./configure --with-64-bit-indices=1 \
>>>     PETSC_ARCH=arch-linux2-c-debug-int64
>>>
>>> and the error is gone. Again, thanks for your help!
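>>> A quick way to confirm that the rebuild picked up 64-bit indices (a
>>> minimal sketch, not from my actual code):
>>>
>>> #include <petscsys.h>
>>> int main(int argc,char **argv)
>>> {
>>>   PetscInitialize(&argc,&argv,NULL,NULL);
>>>   /* prints 8 with --with-64-bit-indices=1, 4 otherwise */
>>>   PetscPrintf(PETSC_COMM_WORLD,"sizeof(PetscInt) = %d\n",(int)sizeof(PetscInt));
>>>   PetscFinalize();
>>>   return 0;
>>> }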
>>>
>>> All the best, Mathis
>>>
>>>
>>> On Wed, Jun 25, 2014 at 12:42 PM, Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
>>>
>>> Hi Mathis,
>>>
>>> this looks very much like an integer overflow:
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#with-64-bit-indices
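>>>
>>> The number in the error message fits this picture: 18446744056529684480
>>> is 2^64 minus roughly 16 GiB, i.e. a size that went negative in 32-bit
>>> arithmetic and was then reinterpreted as an unsigned 64-bit value. A
>>> minimal illustration of the mechanism on common hardware (the factor 16
>>> is only an assumption about how many vectors the solver allocates at
>>> once):
>>>
>>> #include <stdint.h>
>>> #include <stdio.h>
>>> int main(void)
>>> {
>>>   int32_t n = 134217728;                         /* 2^27, the vector length for L=27 */
>>>   int32_t count = n * 16;                        /* overflows: wraps to -2147483648 */
>>>   printf("%zu\n",(size_t)((int64_t)count * 8));  /* ~1.8e19, close to the reported size */
>>>   return 0;
>>> }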
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>>> On 06/25/2014 12:31 PM, Mathis Friesdorf wrote:
>>>
>>> Dear all,
>>>
>>> after a very useful email exchange with Jed Brown quite a while ago, I
>>> was able to find the lowest eigenvalue of a large matrix which is
>>> constructed as a tensor product. Admittedly the solution is a bit
>>> hacked, but it is based on a matrix shell and Armadillo and is
>>> therefore reasonably fast. The program works well for smaller systems,
>>> but once the vectors reach a certain size, I get "out of memory"
>>> errors. I have tested the initialization of a vector of that size and
>>> multiplication by the matrix. This works fine and takes roughly 20 GB
>>> of memory. There are 256 GB available, so I see no reason why the EPS
>>> solvers should complain. Does anyone have an idea what goes wrong here?
>>> The error message is not very helpful and claims that an amount of
>>> memory is requested that is way beyond any reasonable number: "Memory
>>> requested 18446744056529684480."
>>>
>>> Thanks and all the best, Mathis Friesdorf
>>>
>>>
>>> Output of the Program:
>>> mathis at n180:~/localisation$ ./local_plus 27
>>> System Size: 27
>>> --------------------------------------------------------------------------
>>> [[30558,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>> Host: n180
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>> [0]PETSC ERROR: destroying unneeded objects.
>>> [0]PETSC ERROR: Memory allocated 3221286704 Memory used by process 3229827072
>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>>> [0]PETSC ERROR: Memory requested 18446744056529684480!
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: Petsc Release Version 3.4.4, Mar, 13, 2014
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: ./local_plus on a arch-linux2-cxx-debug named n180 by
>>> mathis Wed Jun 25 12:23:01 2014
>>> [0]PETSC ERROR: Libraries linked from /home/mathis/bin_nodebug/lib
>>> [0]PETSC ERROR: Configure run at Wed Jun 25 00:03:34 2014
>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/mathis/petsc-3.4.4
>>> --with-debugging=1 COPTFLAGS="-O3 -march=p4 -mtune=p4" --with-fortran=0
>>> -with-mpi=1 --with-mpi-dir=/usr/lib/openmpi --with-clanguage=cxx
>>> --prefix=/home/mathis/bin_nodebug
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: PetscMallocAlign() line 46 in /home/mathis/petsc-3.4.4/src/sys/memory/mal.c
>>> [0]PETSC ERROR: PetscTrMallocDefault() line 189 in /home/mathis/petsc-3.4.4/src/sys/memory/mtr.c
>>> [0]PETSC ERROR: VecDuplicateVecs_Contiguous() line 62 in src/vec/contiguous.c
>>> [0]PETSC ERROR: VecDuplicateVecs() line 589 in /home/mathis/petsc-3.4.4/src/vec/vec/interface/vector.c
>>> [0]PETSC ERROR: EPSAllocateSolution() line 51 in src/eps/interface/mem.c
>>> [0]PETSC ERROR: EPSSetUp_KrylovSchur() line 141 in src/eps/impls/krylov/krylovschur/krylovschur.c
>>> [0]PETSC ERROR: EPSSetUp() line 147 in src/eps/interface/setup.c
>>> [0]PETSC ERROR: EPSSolve() line 90 in src/eps/interface/solve.c
>>> [0]PETSC ERROR: main() line 48 in local_plus.cpp
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>> with errorcode 55.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>> --------------------------------------------------------------------------
>>> Output of make:
>>> mathis at n180:~/localisation$ make local_plus
>>>
>>> mpicxx -o local_plus.o -c -Wall -Wwrite-strings -Wno-strict-aliasing
>>> -Wno-unknown-pragmas -g -fPIC -I/home/mathis/armadillo-4.300.8/include
>>> -lblas -llapack -L/home/mathis/armadillo-4.300.8 -O3 -larmadillo
>>> -fomit-frame-pointer -I/home/mathis/bin_nodebug/include
>>> -I/home/mathis/bin_nodebug/include -I/usr/lib/openmpi/include
>>> -I/usr/lib/openmpi/include/openmpi -D__INSDIR__= -I/home/mathis/bin_nodebug
>>> -I/home/mathis/bin_nodebug//include -I/home/mathis/bin_nodebug/include
>>> local_plus.cpp
>>> local_plus.cpp:22:0: warning: "__FUNCT__" redefined [enabled by default]
>>> In file included from /home/mathis/bin_nodebug/include/petscvec.h:10:0,
>>> from local_plus.cpp:10:
>>> /home/mathis/bin_nodebug/include/petscviewer.h:386:0: note: this is the
>>> location of the previous definition
>>> g++ -o local_plus local_plus.o -Wl,-rpath,/home/mathis/bin_nodebug//lib
>>> -L/home/mathis/bin_nodebug//lib -lslepc
>>> -Wl,-rpath,/home/mathis/bin_nodebug/lib -L/home/mathis/bin_nodebug/lib
>>> -lpetsc -llapack -lblas -lpthread -Wl,-rpath,/usr/lib/openmpi/lib
>>> -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.7
>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu
>>> -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
>>> -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte
>>> -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
>>> /bin/rm -f local_plus.o
>>> Code:
>>> //Author: Mathis Friesdorf
>>> //MathisFriesdorf at gmail.com
>>>
>>> static char help[] = "1D chain\n";
>>>
>>> #include <iostream>
>>> #include <typeinfo>
>>> #include <armadillo>
>>> #include <petscsys.h>
>>> #include <petscvec.h>
>>> #include <petscmat.h>
>>> #include <slepcsys.h>
>>> #include <slepceps.h>
>>> #include <math.h>
>>> #include <assert.h>
>>>
>>> PetscErrorCode BathMult(Mat H, Vec x, Vec y);
>>> PetscInt L=30,d=2,dsys;
>>> PetscErrorCode ierr;
>>> arma::mat hint = "1.0 0 0 0.0; 0 -1.0 2.0 0; 0 2.0 -1.0 0; 0 0 0 1.0;";
>>>
>>> #define __FUNCT__ "main"
>>> int main(int argc, char **argv)
>>> {
>>>   Mat H;
>>>   EPS eps;
>>>   Vec xr,xi;
>>>   PetscScalar kr,ki;
>>>   PetscInt j, nconv;
>>>
>>>   L = strtol(argv[1],NULL,10);
>>>   dsys = pow(d,L);
>>>   printf("%s","System Size: ");
>>>   printf("%i",L);
>>>   printf("%s","\n");
>>>   SlepcInitialize(&argc,&argv,(char*)0,help);
>>>
>>>   MatCreateShell(PETSC_COMM_WORLD,dsys,dsys,dsys,dsys,NULL,&H);
>>>   MatShellSetOperation(H,MATOP_MULT,(void(*)())BathMult);
>>>   ierr = MatGetVecs(H,NULL,&xr); CHKERRQ(ierr);
>>>   ierr = MatGetVecs(H,NULL,&xi); CHKERRQ(ierr);
>>>
>>>   ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
>>>   ierr = EPSSetOperators(eps, H, NULL); CHKERRQ(ierr);
>>>   ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
>>>   ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL); CHKERRQ(ierr);
>>>   ierr = EPSSetFromOptions( eps ); CHKERRQ(ierr);
>>>   ierr = EPSSolve(eps); CHKERRQ(ierr);
>>>   ierr = EPSGetConverged(eps, &nconv); CHKERRQ(ierr);
>>>   for (j=0; j<1; j++) {
>>>     EPSGetEigenpair(eps, j, &kr, &ki, xr, xi);
>>>     printf("%s","Lowest Eigenvalue: ");
>>>     PetscPrintf(PETSC_COMM_WORLD,"%9F",kr);
>>>     PetscPrintf(PETSC_COMM_WORLD,"\n");
>>>   }
>>>   EPSDestroy(&eps);
>>>
>>>   ierr = SlepcFinalize();
>>>   return 0;
>>> }
>>> #undef __FUNCT__
>>>
>>> #define __FUNCT__ "BathMult"
>>> PetscErrorCode BathMult(Mat H, Vec x, Vec y)
>>> {
>>>   PetscInt l;
>>>   uint slice;
>>>   PetscScalar *arrayin,*arrayout;
>>>
>>>   VecGetArray(x,&arrayin);
>>>   VecGetArray(y,&arrayout);
>>>   arma::cube A = arma::cube(arrayin,1,1,pow(d,L),
>>>     /*copy_aux_mem*/false,/*strict*/true);
>>>   arma::mat result = arma::mat(arrayout,pow(d,L),1,
>>>     /*copy_aux_mem*/false,/*strict*/true);
>>>   for (l=0;l<L-1;l++){
>>>     A.reshape(pow(d,L-2-l),pow(d,2),pow(d,l));
>>>     result.reshape(pow(d,L-l),pow(d,l));
>>>     for (slice=0;slice<A.n_slices;slice++){
>>>       result.col(slice) += vectorise(A.slice(slice)*hint);
>>>     }
>>>   }
>>>   arrayin = A.memptr();
>>>   ierr = VecRestoreArray(x,&arrayin); CHKERRQ(ierr);
>>>   arrayout = result.memptr();
>>>   ierr = VecRestoreArray(y,&arrayout); CHKERRQ(ierr);
>>>   PetscFunctionReturn(0);
>>> }
>>> #undef __FUNCT__