From rongliang.chan at gmail.com  Sat Oct  1 09:23:13 2016
From: rongliang.chan at gmail.com (Rongliang Chen)
Date: Sat, 1 Oct 2016 22:23:13 +0800
Subject: [petsc-users] question about the BuildGradientReconstruction
Message-ID: <89874515-3bb1-7b7b-d40d-0540992f3f70@gmail.com>

Dear all,

I have a question about the gradient reconstruction for DMPlex FVM. Why are the ghost cells ignored during the gradient reconstruction in the function BuildGradientReconstruction_Internal? For a tetrahedral mesh, if a corner cell has three of its faces on the boundary of the computational domain, then only one neighboring cell can be used to reconstruct the gradient, which is deficient for the least-squares fit. I found that, in this situation, the accuracy of the gradient reconstructed by least squares is poor. Do you have any suggestions for dealing with this situation?

Best regards,
Rongliang

From bsmith at mcs.anl.gov  Sat Oct  1 10:59:44 2016
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 1 Oct 2016 10:59:44 -0500
Subject: [petsc-users] Solve KSP in parallel.
In-Reply-To:
References: <0002BCB5-855B-4A7A-A31D-3566CC6F80D7@mcs.anl.gov> <174EAC3B-DA31-4FEA-8321-FE7000E74D41@mcs.anl.gov> <5F3C5343-DF36-4121-ADF0-9D3224CC89D9@mcs.anl.gov>
Message-ID:

> On Sep 30, 2016, at 9:13 PM, Manuel Valera wrote:
>
> Hi Barry and all,
>
> I was successful in creating the parallel version to solve my big system; it is scaling accordingly, but I noticed the error norm increasing too. I don't know if this is because the output is duplicated or if it is really increasing. Is this expected?

   What do you mean by error norm? Do you have an exact solution you are comparing to? If so, you should scale the norm arising from this by 1/sqrt(nx*ny), where nx and ny are the number of grid points in the x and y directions. This scaling makes the norm correspond to the L2 norm of the error, which is what you want to measure.

   With this new scaling you can do convergence studies: for example, refine the grid once and see how much the error norm is reduced, then refine the grid again and you should see a similar reduction in the error norm.

   Barry

> > Thanks > > On Tue, Sep 27, 2016 at 4:07 PM, Barry Smith wrote: > > Yes, always use the binary file > > > On Sep 27, 2016, at 3:13 PM, Manuel Valera wrote: > > > Barry, thanks for your insight, > > > This standalone script must be translated into a much bigger model, which uses AIJ matrices to define the Laplacian in the form of the 3 usual arrays; the ASCII files in the script take the place of the arrays which are passed to the solving routine in the model. > > > So, can I use the approach you mention to create the MPIAIJ matrix from the PETSc binary file? Would this be a better solution than reading the three arrays directly? In the model, even the smallest matrix is 10^5 x 10^5 elements. > > > Thanks. > > > > > > On Tue, Sep 27, 2016 at 12:53 PM, Barry Smith wrote: > > > > Are you loading a matrix from an ASCII file? If so, don't do that. You should write a simple sequential PETSc program that reads in the ASCII file and saves the matrix as a PETSc binary file with MatView(). Then write your parallel code that reads in the binary file with MatLoad() and solves the system. You can read in the right-hand side from ASCII and save it in the binary file also. Trying to read an ASCII file in parallel and set it into a PETSc parallel matrix is just a totally thankless task that is unnecessary.
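Below is a minimal C sketch of the save-once/load-in-parallel workflow described above, assuming the matrix A has already been assembled from the ASCII data by the existing sequential reading code; the function names and the binary file name argument are illustrative placeholders, while PetscViewerBinaryOpen(), MatView(), MatLoad() and the FILE_MODE_* flags are the standard PETSc calls involved (written against the PETSc 3.7-era C interface).

#include <petscmat.h>

/* Step 1 (run once, sequentially): dump an already assembled matrix A
   to a PETSc binary file. */
PetscErrorCode save_matrix_binary(Mat A, const char *binfile)
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, binfile, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);   /* writes the whole matrix to disk */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  return 0;
}

/* Step 2 (parallel solver): every process opens the same binary file and
   MatLoad() distributes the rows across the communicator automatically. */
PetscErrorCode load_matrix_binary(const char *binfile, Mat *A)
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, A);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATMPIAIJ);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, binfile, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatLoad(*A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  return 0;
}

The right-hand side can be appended to the same binary file with VecView() after the MatView() call and read back in order with VecLoad() after MatLoad().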
> > > > Barry > > > > > On Sep 26, 2016, at 6:40 PM, Manuel Valera wrote: > > > > > > Ok, last output was from simulated multicores, in an actual cluster the errors are of the kind: > > > > > > [valera at cinci CSRMatrix]$ petsc -n 2 ./solvelinearmgPETSc > > > TrivSoln loaded, size: 4 / 4 > > > TrivSoln loaded, size: 4 / 4 > > > RHS loaded, size: 4 / 4 > > > RHS loaded, size: 4 / 4 > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: Argument out of range > > > [0]PETSC ERROR: Comm must be of size 1 > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [1]PETSC ERROR: Argument out of range > > > [1]PETSC ERROR: Comm must be of size 1 > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [1]PETSC ERROR: #1 MatCreate_SeqAIJ() line 3958 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [1]PETSC ERROR: #2 MatSetType() line 94 in /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > [1]PETSC ERROR: #3 MatCreateSeqAIJWithArrays() line 4300 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > local size: 2 > > > local size: 2 > > > Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [0]PETSC ERROR: #1 MatCreate_SeqAIJ() line 3958 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [0]PETSC ERROR: #2 MatSetType() line 94 in /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > [0]PETSC ERROR: #3 MatCreateSeqAIJWithArrays() line 4300 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [1]PETSC ERROR: [0]PETSC ERROR: Nonconforming object sizes > > > [0]PETSC ERROR: Sum of local lengths 8 does not equal global length 4, my local length 4 > > > likely a call to VecSetSizes() or MatSetSizes() is wrong. > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > Nonconforming object sizes > > > [1]PETSC ERROR: Sum of local lengths 8 does not equal global length 4, my local length 4 > > > likely a call to VecSetSizes() or MatSetSizes() is wrong. > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [0]PETSC ERROR: #4 PetscSplitOwnership() line 93 in /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [1]PETSC ERROR: #4 PetscSplitOwnership() line 93 in /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > [0]PETSC ERROR: #5 PetscLayoutSetUp() line 143 in /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > [0]PETSC ERROR: #6 MatMPIAIJSetPreallocation_MPIAIJ() line 2768 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1]PETSC ERROR: #5 PetscLayoutSetUp() line 143 in /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > [1]PETSC ERROR: [0]PETSC ERROR: #7 MatMPIAIJSetPreallocation() line 3505 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > #6 MatMPIAIJSetPreallocation_MPIAIJ() line 2768 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1]PETSC ERROR: [0]PETSC ERROR: #8 MatSetUp_MPIAIJ() line 2153 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > #7 MatMPIAIJSetPreallocation() line 3505 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1]PETSC ERROR: #8 MatSetUp_MPIAIJ() line 2153 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [0]PETSC ERROR: #9 MatSetUp() line 739 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [1]PETSC ERROR: #9 MatSetUp() line 739 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: Object is in wrong state > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatSetNearNullSpace() > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > Object is in wrong state > > > [1]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatSetNearNullSpace() > > > [1]PETSC ERROR: [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [0]PETSC ERROR: #10 MatSetNearNullSpace() line 8195 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [1]PETSC ERROR: #10 MatSetNearNullSpace() line 8195 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: Object is in wrong state > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin() > > > [0]PETSC ERROR: [1]PETSC ERROR: Object is in wrong state > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin() > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [1]PETSC ERROR: #11 MatAssemblyBegin() line 5093 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [1]PETSC ERROR: #11 MatAssemblyBegin() line 5093 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > > [1]PETSC ERROR: ------------------------------------------------------------------------ > > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > > [1]PETSC ERROR: [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > > [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > > or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > > [0]PETSC ERROR: likely location of problem given in stack below > > > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > > [1]PETSC ERROR: likely location of problem given in stack below > > > [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > > [0]PETSC ERROR: [1]PETSC ERROR: Note: 
The EXACT line numbers in the stack are not available, > > > [1]PETSC ERROR: INSTEAD the line number of the start of the function > > > is given. > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [1]PETSC ERROR: is given. > > > [1]PETSC ERROR: [1] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [0] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [1]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [1]PETSC ERROR: [1] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > [1]PETSC ERROR: [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [0]PETSC ERROR: [0] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [0]PETSC ERROR: [0] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [0] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [1]PETSC ERROR: [1] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > [0]PETSC ERROR: [0] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > [0]PETSC ERROR: [0] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [1]PETSC ERROR: [1] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > [1]PETSC ERROR: [1] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: Signal received > > > [1]PETSC ERROR: [1] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > [1]PETSC ERROR: [1] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [0]PETSC ERROR: Signal received > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > Petsc Release Version 3.7.2, Jun, 05, 2016 > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > [cli_0]: aborting job: > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > [cli_1]: aborting job: > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > > > > =================================================================================== > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > > = PID 10266 RUNNING AT cinci > > > = EXIT CODE: 59 > > > = CLEANING UP REMAINING PROCESSES > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > > =================================================================================== > > > > > > > > > On Mon, Sep 26, 2016 at 3:51 PM, Manuel Valera wrote: > > > Ok, i created a tiny testcase just for this, > > > > > > The output from n# calls are as follows: > > > > > > n1: > > > Mat Object: 1 MPI processes > > > type: mpiaij > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > row 2: (0, 4.) (1, 3.) (2, 1.) (3, 2.) > > > row 3: (0, 3.) (1, 4.) (2, 2.) (3, 1.) > > > > > > n2: > > > Mat Object: 2 MPI processes > > > type: mpiaij > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 3: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > > > n4: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 1: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > row 3: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > > > > > > > > It really gets messed, no idea what's happening. > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 3:12 PM, Barry Smith wrote: > > > > > > > On Sep 26, 2016, at 5:07 PM, Manuel Valera wrote: > > > > > > > > Ok i was using a big matrix before, from a smaller testcase i got the output and effectively, it looks like is not well read at all, results are attached for DRAW viewer, output is too big to use STDOUT even in the small testcase. n# is the number of processors requested. > > > > > > You need to construct a very small test case so you can determine why the values do not end up where you expect them. There is no way around it. > > > > > > > > is there a way to create the matrix in one node and the distribute it as needed on the rest ? maybe that would work. > > > > > > No the is not scalable. 
You become limited by the memory of the one node. > > > > > > > > > > > Thanks > > > > > > > > On Mon, Sep 26, 2016 at 2:40 PM, Barry Smith wrote: > > > > > > > > How large is the matrix? It will take a very long time if the matrix is large. Debug with a very small matrix. > > > > > > > > Barry > > > > > > > > > On Sep 26, 2016, at 4:34 PM, Manuel Valera wrote: > > > > > > > > > > Indeed there is something wrong with that call, it hangs out indefinitely showing only: > > > > > > > > > > Mat Object: 1 MPI processes > > > > > type: mpiaij > > > > > > > > > > It draws my attention that this program works for 1 processor but not more, but it doesnt show anything for that viewer in either case. > > > > > > > > > > Thanks for the insight on the redundant calls, this is not very clear on documentation, which calls are included in others. > > > > > > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 2:02 PM, Barry Smith wrote: > > > > > > > > > > The call to MatCreateMPIAIJWithArrays() is likely interpreting the values you pass in different than you expect. > > > > > > > > > > Put a call to MatView(Ap,PETSC_VIEWER_STDOUT_WORLD,ierr) after the MatCreateMPIAIJWithArray() to see what PETSc thinks the matrix is. > > > > > > > > > > > > > > > > On Sep 26, 2016, at 3:42 PM, Manuel Valera wrote: > > > > > > > > > > > > Hello, > > > > > > > > > > > > I'm working on solve a linear system in parallel, following ex12 of the ksp tutorial i don't see major complication on doing so, so for a working linear system solver with PCJACOBI and KSPGCR i did only the following changes: > > > > > > > > > > > > call MatCreate(PETSC_COMM_WORLD,Ap,ierr) > > > > > > ! call MatSetType(Ap,MATSEQAIJ,ierr) > > > > > > call MatSetType(Ap,MATMPIAIJ,ierr) !paralellization > > > > > > > > > > > > call MatSetSizes(Ap,PETSC_DECIDE,PETSC_DECIDE,nbdp,nbdp,ierr); > > > > > > > > > > > > ! call MatSeqAIJSetPreallocationCSR(Ap,iapi,japi,app,ierr) > > > > > > call MatSetFromOptions(Ap,ierr) > > > > > > > > > > Note that none of the lines above are needed (or do anything) because the MatCreateMPIAIJWithArrays() creates the matrix from scratch itself. > > > > > > > > > > Barry > > > > > > > > > > > ! call MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD,floor(real(nbdp)/sizel),PETSC_DECIDE,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > > > > > > > > > > > > > I grayed out the changes from sequential implementation. > > > > > > > > > > > > So, it does not complain at runtime until it reaches KSPSolve(), with the following error: > > > > > > > > > > > > > > > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > > [1]PETSC ERROR: Object is in wrong state > > > > > > [1]PETSC ERROR: Matrix is missing diagonal entry 0 > > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.3, unknown > > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc ? ? 
on a arch-linux2-c-debug named valera-HP-xw4600-Workstation by valera Mon Sep 26 13:35:15 2016 > > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich=1 --download-ml?=1 > > > > > > [1]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1733 in /home/valera/v5PETSc/petsc/petsc/src/mat/impls/aij/seq/aijfact.c > > > > > > [1]PETSC ERROR: #2 MatILUFactorSymbolic() line 6579 in /home/valera/v5PETSc/petsc/petsc/src/mat/interface/matrix.c > > > > > > [1]PETSC ERROR: #3 PCSetUp_ILU() line 212 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > > > > > > [1]PETSC ERROR: #4 PCSetUp() line 968 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > [1]PETSC ERROR: #6 PCSetUpOnBlocks_BJacobi_Singleblock() line 650 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c > > > > > > [1]PETSC ERROR: #7 PCSetUpOnBlocks() line 1001 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > [1]PETSC ERROR: #8 KSPSetUpOnBlocks() line 220 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > [1]PETSC ERROR: #9 KSPSolve() line 600 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > At line 333 of file solvelinearmgPETSc.f90 > > > > > > Fortran runtime error: Array bound mismatch for dimension 1 of array 'sol' (213120/106560) > > > > > > > > > > > > > > > > > > This code works for -n 1 cores, but it gives this error when using more than one core. > > > > > > > > > > > > What am i missing? > > > > > > > > > > > > Regards, > > > > > > > > > > > > Manuel. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From mvalera at mail.sdsu.edu Sat Oct 1 11:11:51 2016 From: mvalera at mail.sdsu.edu (Manuel Valera) Date: Sat, 1 Oct 2016 09:11:51 -0700 Subject: [petsc-users] Solve KSP in parallel. 
In-Reply-To:
References: <0002BCB5-855B-4A7A-A31D-3566CC6F80D7@mcs.anl.gov> <174EAC3B-DA31-4FEA-8321-FE7000E74D41@mcs.anl.gov> <5F3C5343-DF36-4121-ADF0-9D3224CC89D9@mcs.anl.gov>
Message-ID:

I'm comparing with the ones vector, as in many examples from the PETSc docs, so this may be because I hadn't set up the output to a single processor, but I get the following output for 1, 2, and 4 processors:

n=1
TrivSoln loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
Norm: 7.21632103486563610E-011
Its: 101
Total time: 5.0112988948822021

n=2
TrivSoln loaded, size: 213120 / 213120
TrivSoln loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
Norm: 1.09862436488003634E-007
Its: 101
Norm: 1.09862436488003634E-007
Its: 101
Total time: 2.9765341281890869
Total time: 2.9770300388336182

n=4
TrivSoln loaded, size: 213120 / 213120
TrivSoln loaded, size: 213120 / 213120
TrivSoln loaded, size: 213120 / 213120
TrivSoln loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
RHS loaded, size: 213120 / 213120
Norm: 1.72790692829407788E-005
Its: 101
Norm: 1.72790692829407788E-005
Its: 101
Norm: 1.72790692829407788E-005
Its: 101
Norm: 1.72790692829407788E-005
Its: 101
Total time: 1.8007240295410156
Total time: 1.8008360862731934
Total time: 1.8008909225463867
Total time: 1.8009200096130371

That is the error norm from the ones vector; I'm attaching the script again.

On Sat, Oct 1, 2016 at 8:59 AM, Barry Smith wrote:
>
>    What do you mean by error norm? Do you have an exact solution you are comparing to? If so, you should scale the norm arising from this by 1/sqrt(nx*ny), where nx and ny are the number of grid points in the x and y directions. This scaling makes the norm correspond to the L2 norm of the error, which is what you want to measure.
>
>    With this new scaling you can do convergence studies: for example, refine the grid once and see how much the error norm is reduced, then refine the grid again and you should see a similar reduction in the error norm.
>
>    Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: solvelinearmgPETSc.f90
Type: text/x-fortran
Size: 13751 bytes
Desc: not available
URL:

From bsmith at mcs.anl.gov  Sat Oct  1 11:56:13 2016
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 1 Oct 2016 11:56:13 -0500
Subject: [petsc-users] Solve KSP in parallel.
In-Reply-To:
References: <0002BCB5-855B-4A7A-A31D-3566CC6F80D7@mcs.anl.gov> <174EAC3B-DA31-4FEA-8321-FE7000E74D41@mcs.anl.gov> <5F3C5343-DF36-4121-ADF0-9D3224CC89D9@mcs.anl.gov>
Message-ID: <0DE9BC4B-2199-4211-99D5-F4F45D42BBCF@mcs.anl.gov>

   This is not expected. Run on 1 and 4 processes with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view and send the output.
   Barry
the start of the function > > > > [0]PETSC ERROR: [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > > > [1]PETSC ERROR: INSTEAD the line number of the start of the function > > > > is given. > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [0]PETSC ERROR: [1]PETSC ERROR: is given. > > > > [1]PETSC ERROR: [1] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [0] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [0]PETSC ERROR: [0] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [1]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [1] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [1]PETSC ERROR: [1] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > [1]PETSC ERROR: [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [0]PETSC ERROR: [0] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [1] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [0]PETSC ERROR: [0] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [0]PETSC ERROR: [0] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [1]PETSC ERROR: [1] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > [0]PETSC ERROR: [0] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > [0]PETSC ERROR: [0] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > [1]PETSC ERROR: [1] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > [1]PETSC ERROR: [1] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > [0]PETSC ERROR: Signal received > > > > [1]PETSC ERROR: [1] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > [1]PETSC ERROR: [1] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > [0]PETSC ERROR: Signal received > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > > Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > > [cli_0]: aborting job: > > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > > [cli_1]: aborting job: > > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > > > > > > =================================================================================== > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > > > = PID 10266 RUNNING AT cinci > > > > = EXIT CODE: 59 > > > > = CLEANING UP REMAINING PROCESSES > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > > > =================================================================================== > > > > > > > > > > > > On Mon, Sep 26, 2016 at 3:51 PM, Manuel Valera wrote: > > > > Ok, i created a tiny testcase just for this, > > > > > > > > The output from n# calls are as follows: > > > > > > > > n1: > > > > Mat Object: 1 MPI processes > > > > type: mpiaij > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > row 2: (0, 4.) (1, 3.) (2, 1.) (3, 2.) > > > > row 3: (0, 3.) (1, 4.) (2, 2.) (3, 1.) > > > > > > > > n2: > > > > Mat Object: 2 MPI processes > > > > type: mpiaij > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 3: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > > > > > n4: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 1: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > row 3: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > > > > > > > > > > > > It really gets messed, no idea what's happening. > > > > > > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 3:12 PM, Barry Smith wrote: > > > > > > > > > On Sep 26, 2016, at 5:07 PM, Manuel Valera wrote: > > > > > > > > > > Ok i was using a big matrix before, from a smaller testcase i got the output and effectively, it looks like is not well read at all, results are attached for DRAW viewer, output is too big to use STDOUT even in the small testcase. n# is the number of processors requested. > > > > > > > > You need to construct a very small test case so you can determine why the values do not end up where you expect them. There is no way around it. 
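The duplicated rows in the n2 and n4 output above are what one would expect if every process passed MatCreateMPIAIJWithArrays() the CSR arrays of the entire matrix: the routine expects each rank to supply only the rows it owns, with the row-pointer array restarting at 0 on every rank and the column indices kept global. A minimal sketch in C for this 4x4 example on exactly two ranks (variable names are illustrative, not taken from the posted script):

#include <petscmat.h>

int main(int argc, char **argv)   /* run with: mpiexec -n 2 ./thisprogram */
{
  Mat            A;
  PetscMPIInt    rank;
  PetscErrorCode ierr;
  /* Rows 0-1 live on rank 0, rows 2-3 on rank 1.  Each rank's row-pointer
     array starts at 0 and covers only its own rows; column indices are global. */
  PetscInt    ia[3] = {0, 4, 8};
  PetscInt    ja[8] = {0, 1, 2, 3, 0, 1, 2, 3};
  PetscScalar a0[8] = {1, 2, 4, 3,  2, 1, 3, 4};   /* values of rows 0 and 1 */
  PetscScalar a1[8] = {4, 3, 1, 2,  3, 4, 2, 1};   /* values of rows 2 and 3 */

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD,
                                   2, PETSC_DECIDE,          /* 2 local rows per rank */
                                   4, 4,                     /* global sizes          */
                                   ia, ja, rank ? a1 : a0, &A);CHKERRQ(ierr);
  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

With the per-rank arrays set up this way, the MatView output should reproduce the four distinct rows shown in the n1 listing above.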
> > > > > > > > > > is there a way to create the matrix in one node and the distribute it as needed on the rest ? maybe that would work. > > > > > > > > No the is not scalable. You become limited by the memory of the one node. > > > > > > > > > > > > > > Thanks > > > > > > > > > > On Mon, Sep 26, 2016 at 2:40 PM, Barry Smith wrote: > > > > > > > > > > How large is the matrix? It will take a very long time if the matrix is large. Debug with a very small matrix. > > > > > > > > > > Barry > > > > > > > > > > > On Sep 26, 2016, at 4:34 PM, Manuel Valera wrote: > > > > > > > > > > > > Indeed there is something wrong with that call, it hangs out indefinitely showing only: > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > type: mpiaij > > > > > > > > > > > > It draws my attention that this program works for 1 processor but not more, but it doesnt show anything for that viewer in either case. > > > > > > > > > > > > Thanks for the insight on the redundant calls, this is not very clear on documentation, which calls are included in others. > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 2:02 PM, Barry Smith wrote: > > > > > > > > > > > > The call to MatCreateMPIAIJWithArrays() is likely interpreting the values you pass in different than you expect. > > > > > > > > > > > > Put a call to MatView(Ap,PETSC_VIEWER_STDOUT_WORLD,ierr) after the MatCreateMPIAIJWithArray() to see what PETSc thinks the matrix is. > > > > > > > > > > > > > > > > > > > On Sep 26, 2016, at 3:42 PM, Manuel Valera wrote: > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > I'm working on solve a linear system in parallel, following ex12 of the ksp tutorial i don't see major complication on doing so, so for a working linear system solver with PCJACOBI and KSPGCR i did only the following changes: > > > > > > > > > > > > > > call MatCreate(PETSC_COMM_WORLD,Ap,ierr) > > > > > > > ! call MatSetType(Ap,MATSEQAIJ,ierr) > > > > > > > call MatSetType(Ap,MATMPIAIJ,ierr) !paralellization > > > > > > > > > > > > > > call MatSetSizes(Ap,PETSC_DECIDE,PETSC_DECIDE,nbdp,nbdp,ierr); > > > > > > > > > > > > > > ! call MatSeqAIJSetPreallocationCSR(Ap,iapi,japi,app,ierr) > > > > > > > call MatSetFromOptions(Ap,ierr) > > > > > > > > > > > > Note that none of the lines above are needed (or do anything) because the MatCreateMPIAIJWithArrays() creates the matrix from scratch itself. > > > > > > > > > > > > Barry > > > > > > > > > > > > > ! call MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > > call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD,floor(real(nbdp)/sizel),PETSC_DECIDE,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > > > > > > > > > > > > > > > > I grayed out the changes from sequential implementation. > > > > > > > > > > > > > > So, it does not complain at runtime until it reaches KSPSolve(), with the following error: > > > > > > > > > > > > > > > > > > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > > > [1]PETSC ERROR: Object is in wrong state > > > > > > > [1]PETSC ERROR: Matrix is missing diagonal entry 0 > > > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.3, unknown > > > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc ? ? 
on a arch-linux2-c-debug named valera-HP-xw4600-Workstation by valera Mon Sep 26 13:35:15 2016 > > > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich=1 --download-ml?=1 > > > > > > > [1]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1733 in /home/valera/v5PETSc/petsc/petsc/src/mat/impls/aij/seq/aijfact.c > > > > > > > [1]PETSC ERROR: #2 MatILUFactorSymbolic() line 6579 in /home/valera/v5PETSc/petsc/petsc/src/mat/interface/matrix.c > > > > > > > [1]PETSC ERROR: #3 PCSetUp_ILU() line 212 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > > > > > > > [1]PETSC ERROR: #4 PCSetUp() line 968 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > [1]PETSC ERROR: #6 PCSetUpOnBlocks_BJacobi_Singleblock() line 650 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c > > > > > > > [1]PETSC ERROR: #7 PCSetUpOnBlocks() line 1001 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > > [1]PETSC ERROR: #8 KSPSetUpOnBlocks() line 220 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > [1]PETSC ERROR: #9 KSPSolve() line 600 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > At line 333 of file solvelinearmgPETSc.f90 > > > > > > > Fortran runtime error: Array bound mismatch for dimension 1 of array 'sol' (213120/106560) > > > > > > > > > > > > > > > > > > > > > This code works for -n 1 cores, but it gives this error when using more than one core. > > > > > > > > > > > > > > What am i missing? > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Manuel. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From bsmith at mcs.anl.gov Sat Oct 1 12:52:57 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 1 Oct 2016 12:52:57 -0500 Subject: [petsc-users] Solve KSP in parallel. In-Reply-To: References: <0002BCB5-855B-4A7A-A31D-3566CC6F80D7@mcs.anl.gov> <174EAC3B-DA31-4FEA-8321-FE7000E74D41@mcs.anl.gov> <5F3C5343-DF36-4121-ADF0-9D3224CC89D9@mcs.anl.gov> <0DE9BC4B-2199-4211-99D5-F4F45D42BBCF@mcs.anl.gov> Message-ID: <6039C123-DF30-489A-8F01-057207F99D5A@mcs.anl.gov> Something is still very wrong: Norm: 7.2163210348656361E-011 Norm: 1.7279069282940779E-005 For 1 process run with the option -ksp_view_mat binary -ksp_view_rhs binary -ksp_view_solution binary and send the file binaryoutput to petsc-maint at mcs.anl.gov then for 4 processes run the same way and send the new file binaryoutput to petsc-maint at mcs.anl.gov Barry > On Oct 1, 2016, at 12:08 PM, Manuel Valera wrote: > > Interesting, it looks like it is an output issue, ksp_true_residual goes down to 10^-11 in every case. Output attached. > > On Sat, Oct 1, 2016 at 9:56 AM, Barry Smith wrote: > > This is not expected. > > Run on 1 and 4 processes with -ksp_monitor_true_residual -ksp_converged_reason -ksp_view and send the output. 
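On the reporting side, two small things make output like the above easier to read: the norm can be scaled by the square root of the global size (equivalent to the 1/sqrt(nx*ny) suggested earlier in the thread for a 2D grid), and printing through PetscPrintf() on PETSC_COMM_WORLD avoids getting one copy of the line per process. A minimal helper in C, assuming as in the posted script that the exact solution is the vector of ones; the function name is illustrative:

#include <petscvec.h>

/* Report || x_exact - x ||_2 / sqrt(N), printed once from rank 0. */
PetscErrorCode ReportScaledErrorNorm(Vec x)
{
  Vec            err;
  PetscReal      nrm;
  PetscInt       N;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecDuplicate(x, &err);CHKERRQ(ierr);
  ierr = VecSet(err, 1.0);CHKERRQ(ierr);        /* assumed exact solution: all ones */
  ierr = VecAXPY(err, -1.0, x);CHKERRQ(ierr);   /* err = exact - computed            */
  ierr = VecNorm(err, NORM_2, &nrm);CHKERRQ(ierr);
  ierr = VecGetSize(x, &N);CHKERRQ(ierr);
  /* PetscPrintf on PETSC_COMM_WORLD prints from the first process only, so
     the line is not repeated once per rank. */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Scaled error norm: %g\n",
                     (double)(nrm/PetscSqrtReal((PetscReal)N)));CHKERRQ(ierr);
  ierr = VecDestroy(&err);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}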
> > Barry > > > On Oct 1, 2016, at 11:11 AM, Manuel Valera wrote: > > > > I'm comparing with the ones vector as in many examples from petsc docs, so this may be because i hadn't set up the output to a single processor, but i get the following output for 1,2,4 processors: > > > > n=1 > > TrivSoln loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > Norm: 7.21632103486563610E-011 > > Its: 101 > > Total time: 5.0112988948822021 > > > > n=2 > > TrivSoln loaded, size: 213120 / 213120 > > TrivSoln loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > Norm: 1.09862436488003634E-007 > > Its: 101 > > Norm: 1.09862436488003634E-007 > > Its: 101 > > Total time: 2.9765341281890869 > > Total time: 2.9770300388336182 > > > > n=4 > > TrivSoln loaded, size: 213120 / 213120 > > TrivSoln loaded, size: 213120 / 213120 > > TrivSoln loaded, size: 213120 / 213120 > > TrivSoln loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > RHS loaded, size: 213120 / 213120 > > Norm: 1.72790692829407788E-005 > > Its: 101 > > Norm: 1.72790692829407788E-005 > > Its: 101 > > Norm: 1.72790692829407788E-005 > > Its: 101 > > Norm: 1.72790692829407788E-005 > > Its: 101 > > Total time: 1.8007240295410156 > > Total time: 1.8008360862731934 > > Total time: 1.8008909225463867 > > Total time: 1.8009200096130371 > > > > > > That is the error norm from the ones vector, im attaching the script again. > > > > > > On Sat, Oct 1, 2016 at 8:59 AM, Barry Smith wrote: > > > > > On Sep 30, 2016, at 9:13 PM, Manuel Valera wrote: > > > > > > Hi Barry and all, > > > > > > I was successful on creating the parallel version to solve my big system, it is scaling accordingly, but i noticed the error norm increasing too, i don't know if this is because the output is duplicated or if its really increasing. Is this expected ? > > > > What do you mean by error norm? Do you have an exact solution you are comparing to? If so, you should scale the norm arising from this by 1/sqrt(nx*ny) where nx and ny are the number of grid points in the x and y direction. This scaling makes the norm correspond to the L2 norm of the error which is what you want to measure. > > > > With this new scaling you can do convergence studies, for example refine the grid once how much does the error norm reduce, refine the grid again and you should see a similar reduction in the error norm. > > > > > > Barry > > > > > > > > Thanks > > > > > > On Tue, Sep 27, 2016 at 4:07 PM, Barry Smith wrote: > > > > > > Yes, always use the binary file > > > > > > > On Sep 27, 2016, at 3:13 PM, Manuel Valera wrote: > > > > > > > > Barry, thanks for your insight, > > > > > > > > This standalone script must be translated into a much bigger model, which uses AIJ matrices to define the laplacian in the form of the 3 usual arrays, the ascii files in the script take the place of the arrays which are passed to the solving routine in the model. > > > > > > > > So, can i use the approach you mention to create the MPIAIJ from the petsc binary file ? would this be a better solution than reading the three arrays directly? In the model, even the smallest matrix is 10^5x10^5 elements > > > > > > > > Thanks. > > > > > > > > > > > > On Tue, Sep 27, 2016 at 12:53 PM, Barry Smith wrote: > > > > > > > > Are you loading a matrix from an ASCII file? If so don't do that. 
You should write a simple sequential PETSc program that reads in the ASCII file and saves the matrix as a PETSc binary file with MatView(). Then write your parallel code that reads in the binary file with MatLoad() and solves the system. You can read in the right hand side from ASCII and save it in the binary file also. Trying to read an ASCII file in parallel and set it into a PETSc parallel matrix is just a totally thankless task that is unnecessary. > > > > > > > > Barry > > > > > > > > > On Sep 26, 2016, at 6:40 PM, Manuel Valera wrote: > > > > > > > > > > Ok, last output was from simulated multicores, in an actual cluster the errors are of the kind: > > > > > > > > > > [valera at cinci CSRMatrix]$ petsc -n 2 ./solvelinearmgPETSc > > > > > TrivSoln loaded, size: 4 / 4 > > > > > TrivSoln loaded, size: 4 / 4 > > > > > RHS loaded, size: 4 / 4 > > > > > RHS loaded, size: 4 / 4 > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Argument out of range > > > > > [0]PETSC ERROR: Comm must be of size 1 > > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [1]PETSC ERROR: Argument out of range > > > > > [1]PETSC ERROR: Comm must be of size 1 > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [1]PETSC ERROR: #1 MatCreate_SeqAIJ() line 3958 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [1]PETSC ERROR: #2 MatSetType() line 94 in /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > > [1]PETSC ERROR: #3 MatCreateSeqAIJWithArrays() line 4300 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > local size: 2 > > > > > local size: 2 > > > > > Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [0]PETSC ERROR: #1 MatCreate_SeqAIJ() line 3958 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [0]PETSC ERROR: #2 MatSetType() line 94 in /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > > [0]PETSC ERROR: #3 MatCreateSeqAIJWithArrays() line 4300 in /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [1]PETSC ERROR: [0]PETSC ERROR: Nonconforming object sizes > > > > > [0]PETSC ERROR: Sum of local lengths 8 does not equal global length 4, my local length 4 > > > > > likely a call to VecSetSizes() or MatSetSizes() is wrong. 
> > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split > > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > Nonconforming object sizes > > > > > [1]PETSC ERROR: Sum of local lengths 8 does not equal global length 4, my local length 4 > > > > > likely a call to VecSetSizes() or MatSetSizes() is wrong. > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html#split > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [0]PETSC ERROR: #4 PetscSplitOwnership() line 93 in /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [1]PETSC ERROR: #4 PetscSplitOwnership() line 93 in /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > > [0]PETSC ERROR: #5 PetscLayoutSetUp() line 143 in /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > > [0]PETSC ERROR: #6 MatMPIAIJSetPreallocation_MPIAIJ() line 2768 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1]PETSC ERROR: #5 PetscLayoutSetUp() line 143 in /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > > [1]PETSC ERROR: [0]PETSC ERROR: #7 MatMPIAIJSetPreallocation() line 3505 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > #6 MatMPIAIJSetPreallocation_MPIAIJ() line 2768 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1]PETSC ERROR: [0]PETSC ERROR: #8 MatSetUp_MPIAIJ() line 2153 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > #7 MatMPIAIJSetPreallocation() line 3505 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1]PETSC ERROR: #8 MatSetUp_MPIAIJ() line 2153 in /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: #9 MatSetUp() line 739 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [1]PETSC ERROR: #9 MatSetUp() line 739 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Object is in wrong state > > > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatSetNearNullSpace() > > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [0]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > Object is in wrong state > > > > > [1]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatSetNearNullSpace() > > > > > [1]PETSC ERROR: [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [0]PETSC ERROR: #10 MatSetNearNullSpace() line 8195 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [1]PETSC ERROR: #10 MatSetNearNullSpace() line 8195 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Object is in wrong state > > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin() > > > > > [0]PETSC ERROR: [1]PETSC ERROR: Object is in wrong state > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 1 "mat" before MatAssemblyBegin() > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [1]PETSC ERROR: #11 MatAssemblyBegin() line 5093 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [1]PETSC ERROR: #11 MatAssemblyBegin() line 5093 in /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > > > > [1]PETSC ERROR: ------------------------------------------------------------------------ > > > > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > > > > [1]PETSC ERROR: [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > > > > [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > > > > or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > > > > [0]PETSC ERROR: likely location of problem given in stack below > > > > > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > > > > [1]PETSC ERROR: likely location of problem given in stack below > > > > > [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > > > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > > > > [0]PETSC ERROR: [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > > > > [1]PETSC ERROR: INSTEAD the line number of the start of the function > > > > > is given. > > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [1]PETSC ERROR: is given. 
> > > > > [1]PETSC ERROR: [1] MatAssemblyEnd line 5185 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [0] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [1]PETSC ERROR: [1] MatAssemblyBegin line 5090 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [1]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > > [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1] MatSetNearNullSpace line 8191 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [1]PETSC ERROR: [1] PetscSplitOwnership line 80 /home/valera/petsc-3.7.2/src/sys/utils/psplit.c > > > > > [1]PETSC ERROR: [0]PETSC ERROR: [0] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: [0] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1] PetscLayoutSetUp line 129 /home/valera/petsc-3.7.2/src/vec/is/utils/pmap.c > > > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation_MPIAIJ line 2767 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: [0] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [0] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [1]PETSC ERROR: [1] MatMPIAIJSetPreallocation line 3502 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [1]PETSC ERROR: [1] MatSetUp_MPIAIJ line 2152 /home/valera/petsc-3.7.2/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: [0] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > > [0]PETSC ERROR: [0] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [1]PETSC ERROR: [1] MatSetUp line 727 /home/valera/petsc-3.7.2/src/mat/interface/matrix.c > > > > > [1]PETSC ERROR: [1] MatCreate_SeqAIJ line 3956 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Signal received > > > > > [1]PETSC ERROR: [1] MatSetType line 44 /home/valera/petsc-3.7.2/src/mat/interface/matreg.c > > > > > [1]PETSC ERROR: [1] MatCreateSeqAIJWithArrays line 4295 /home/valera/petsc-3.7.2/src/mat/impls/aij/seq/aij.c > > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [0]PETSC ERROR: Signal received > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > > > Petsc Release Version 3.7.2, Jun, 05, 2016 > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc P on a arch-linux2-c-debug named cinci by valera Mon Sep 26 16:39:02 2016 > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich > > > > > [1]PETSC ERROR: #12 User provided function() line 0 in unknown file > > > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > > > [cli_0]: aborting job: > > > > > application called MPI_Abort(comm=0x84000004, 59) - process 0 > > > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > > > [cli_1]: aborting job: > > > > > application called MPI_Abort(comm=0x84000002, 59) - process 1 > > > > > > > > > > =================================================================================== > > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > > > > = PID 10266 RUNNING AT cinci > > > > > = EXIT CODE: 59 > > > > > = CLEANING UP REMAINING PROCESSES > > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > > > > =================================================================================== > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 3:51 PM, Manuel Valera wrote: > > > > > Ok, i created a tiny testcase just for this, > > > > > > > > > > The output from n# calls are as follows: > > > > > > > > > > n1: > > > > > Mat Object: 1 MPI processes > > > > > type: mpiaij > > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > > row 2: (0, 4.) (1, 3.) (2, 1.) (3, 2.) > > > > > row 3: (0, 3.) (1, 4.) (2, 2.) (3, 1.) > > > > > > > > > > n2: > > > > > Mat Object: 2 MPI processes > > > > > type: mpiaij > > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 1: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 3: (0, 2.) (1, 1.) (2, 3.) (3, 4.) > > > > > > > > > > n4: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > row 0: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 1: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 2: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > row 3: (0, 1.) (1, 2.) (2, 4.) (3, 3.) > > > > > > > > > > > > > > > > > > > > It really gets messed, no idea what's happening. > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 3:12 PM, Barry Smith wrote: > > > > > > > > > > > On Sep 26, 2016, at 5:07 PM, Manuel Valera wrote: > > > > > > > > > > > > Ok i was using a big matrix before, from a smaller testcase i got the output and effectively, it looks like is not well read at all, results are attached for DRAW viewer, output is too big to use STDOUT even in the small testcase. n# is the number of processors requested. > > > > > > > > > > You need to construct a very small test case so you can determine why the values do not end up where you expect them. There is no way around it. > > > > > > > > > > > > is there a way to create the matrix in one node and the distribute it as needed on the rest ? maybe that would work. > > > > > > > > > > No the is not scalable. You become limited by the memory of the one node. > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > On Mon, Sep 26, 2016 at 2:40 PM, Barry Smith wrote: > > > > > > > > > > > > How large is the matrix? It will take a very long time if the matrix is large. Debug with a very small matrix. 
> > > > > > > > > > > > Barry > > > > > > > > > > > > > On Sep 26, 2016, at 4:34 PM, Manuel Valera wrote: > > > > > > > > > > > > > > Indeed there is something wrong with that call, it hangs out indefinitely showing only: > > > > > > > > > > > > > > Mat Object: 1 MPI processes > > > > > > > type: mpiaij > > > > > > > > > > > > > > It draws my attention that this program works for 1 processor but not more, but it doesnt show anything for that viewer in either case. > > > > > > > > > > > > > > Thanks for the insight on the redundant calls, this is not very clear on documentation, which calls are included in others. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 26, 2016 at 2:02 PM, Barry Smith wrote: > > > > > > > > > > > > > > The call to MatCreateMPIAIJWithArrays() is likely interpreting the values you pass in different than you expect. > > > > > > > > > > > > > > Put a call to MatView(Ap,PETSC_VIEWER_STDOUT_WORLD,ierr) after the MatCreateMPIAIJWithArray() to see what PETSc thinks the matrix is. > > > > > > > > > > > > > > > > > > > > > > On Sep 26, 2016, at 3:42 PM, Manuel Valera wrote: > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > I'm working on solve a linear system in parallel, following ex12 of the ksp tutorial i don't see major complication on doing so, so for a working linear system solver with PCJACOBI and KSPGCR i did only the following changes: > > > > > > > > > > > > > > > > call MatCreate(PETSC_COMM_WORLD,Ap,ierr) > > > > > > > > ! call MatSetType(Ap,MATSEQAIJ,ierr) > > > > > > > > call MatSetType(Ap,MATMPIAIJ,ierr) !paralellization > > > > > > > > > > > > > > > > call MatSetSizes(Ap,PETSC_DECIDE,PETSC_DECIDE,nbdp,nbdp,ierr); > > > > > > > > > > > > > > > > ! call MatSeqAIJSetPreallocationCSR(Ap,iapi,japi,app,ierr) > > > > > > > > call MatSetFromOptions(Ap,ierr) > > > > > > > > > > > > > > Note that none of the lines above are needed (or do anything) because the MatCreateMPIAIJWithArrays() creates the matrix from scratch itself. > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > ! call MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > > > call MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD,floor(real(nbdp)/sizel),PETSC_DECIDE,nbdp,nbdp,iapi,japi,app,Ap,ierr) > > > > > > > > > > > > > > > > > > > > > > > > I grayed out the changes from sequential implementation. > > > > > > > > > > > > > > > > So, it does not complain at runtime until it reaches KSPSolve(), with the following error: > > > > > > > > > > > > > > > > > > > > > > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > > > > > > [1]PETSC ERROR: Object is in wrong state > > > > > > > > [1]PETSC ERROR: Matrix is missing diagonal entry 0 > > > > > > > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > > > > > > > [1]PETSC ERROR: Petsc Release Version 3.7.3, unknown > > > > > > > > [1]PETSC ERROR: ./solvelinearmgPETSc ? ? 
on a arch-linux2-c-debug named valera-HP-xw4600-Workstation by valera Mon Sep 26 13:35:15 2016 > > > > > > > > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack=1 --download-mpich=1 --download-ml?=1 > > > > > > > > [1]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1733 in /home/valera/v5PETSc/petsc/petsc/src/mat/impls/aij/seq/aijfact.c > > > > > > > > [1]PETSC ERROR: #2 MatILUFactorSymbolic() line 6579 in /home/valera/v5PETSc/petsc/petsc/src/mat/interface/matrix.c > > > > > > > > [1]PETSC ERROR: #3 PCSetUp_ILU() line 212 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > > > > > > > > [1]PETSC ERROR: #4 PCSetUp() line 968 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > > > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > > [1]PETSC ERROR: #6 PCSetUpOnBlocks_BJacobi_Singleblock() line 650 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c > > > > > > > > [1]PETSC ERROR: #7 PCSetUpOnBlocks() line 1001 in /home/valera/v5PETSc/petsc/petsc/src/ksp/pc/interface/precon.c > > > > > > > > [1]PETSC ERROR: #8 KSPSetUpOnBlocks() line 220 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > > [1]PETSC ERROR: #9 KSPSolve() line 600 in /home/valera/v5PETSc/petsc/petsc/src/ksp/ksp/interface/itfunc.c > > > > > > > > At line 333 of file solvelinearmgPETSc.f90 > > > > > > > > Fortran runtime error: Array bound mismatch for dimension 1 of array 'sol' (213120/106560) > > > > > > > > > > > > > > > > > > > > > > > > This code works for -n 1 cores, but it gives this error when using more than one core. > > > > > > > > > > > > > > > > What am i missing? > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > Manuel. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From knepley at gmail.com Sat Oct 1 19:31:56 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 1 Oct 2016 19:31:56 -0500 Subject: [petsc-users] question about the BuildGradientReconstruction In-Reply-To: <89874515-3bb1-7b7b-d40d-0540992f3f70@gmail.com> References: <89874515-3bb1-7b7b-d40d-0540992f3f70@gmail.com> Message-ID: On Sat, Oct 1, 2016 at 9:23 AM, Rongliang Chen wrote: > Dear all, > > I have a question about the gradient reconstruction for the dmplexfvm. > > Why the ghost cells are ignored during the gradient reconstruction in the > function BuildGradientReconstruction_Internal? > The may be a shortcoming. I have to think about that. > For the tetrahedron mesh, if a cell (on the corner) whose three faces are > on the boundary of the computational domain, then only one cell can be used > to reconstruct the gradient (it is deficient for the least square). I found > that, for this situation, the accuracy of the gradient reconstructed by the > least square is a problem. Do you have any suggestions for deal with this > situation? > That is definitely a problem. I will take a look. Thanks, Matt > Best regards, > Rongliang > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gcfrai at gmail.com Mon Oct 3 15:25:06 2016 From: gcfrai at gmail.com (Amit Itagi) Date: Mon, 3 Oct 2016 16:25:06 -0400 Subject: [petsc-users] FFT using Petsc4py In-Reply-To: References: Message-ID: Lisandro, Thanks. I am still a little confused. In Petsc the steps are: MatCreateFFT MatGetVecsFFTW VecScatterPetsctoFFTW VecScatterFFTWtoPetsc I am trying to understand how to map these steps to Petsc4Py. Amit On Fri, Sep 30, 2016 at 4:16 AM, Lisandro Dalcin wrote: > > > On 27 September 2016 at 21:05, Amit Itagi wrote: > >> Hello, >> >> I am looking at the Petsc FFT interfaces. I was wondering if a parallel >> FFT can be performed within a Petsc4Py code. If not, what would be the best >> way to use the Petsc interfaces for FFT from Petsc4Py ? >> >> > It should work out of the box by using mat.setType(Mat.Type.FFTW) before > setup of your matrix. > > > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ > > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 0109 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > > Office Phone: +966 12 808-0459 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Mon Oct 3 18:36:38 2016 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Oct 2016 18:36:38 -0500 Subject: [petsc-users] DG within DMPlex Message-ID: Hi all, Is there, or will there be, support for implementing Discontinuous Galerkin formulations within the DMPlex framework? I think it would be nice to have something such as the SIPG formulation for the poisson problem in SNES ex12.c Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Oct 3 21:28:35 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Oct 2016 21:28:35 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang wrote: > Hi all, > > Is there, or will there be, support for implementing Discontinuous > Galerkin formulations within the DMPlex framework? I think it would be nice > to have something such as the SIPG formulation for the poisson problem in > SNES ex12.c > We will have a trial DG in PETSc shortly. However, I don't think DG methods make much sense for elliptic problems. Why would I use it there? Thanks, Matt > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Mon Oct 3 21:45:06 2016 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Oct 2016 21:45:06 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: I am just saying the poission problem as an example since that is one of the simpler PDEs out there and already exists. On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley wrote: > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang wrote: > >> Hi all, >> >> Is there, or will there be, support for implementing Discontinuous >> Galerkin formulations within the DMPlex framework? 
I think it would be nice >> to have something such as the SIPG formulation for the poisson problem in >> SNES ex12.c >> > > We will have a trial DG in PETSc shortly. However, I don't think DG > methods make much sense for elliptic > problems. Why would I use it there? > > Thanks, > > Matt > > >> Thanks, >> Justin >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cpraveen at gmail.com Mon Oct 3 21:51:31 2016 From: cpraveen at gmail.com (Praveen C) Date: Tue, 4 Oct 2016 08:21:31 +0530 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: DG for elliptic operators still makes lot of sense if you have problems with discontinuous coefficients local grid adaptation (hp) convection-diffusion where convection is dominant fourth order problems (standard C^0 elements can be used) praveen On Tue, Oct 4, 2016 at 7:58 AM, Matthew Knepley wrote: > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang wrote: > >> Hi all, >> >> Is there, or will there be, support for implementing Discontinuous >> Galerkin formulations within the DMPlex framework? I think it would be nice >> to have something such as the SIPG formulation for the poisson problem in >> SNES ex12.c >> > > We will have a trial DG in PETSc shortly. However, I don't think DG > methods make much sense for elliptic > problems. Why would I use it there? > > Thanks, > > Matt > > >> Thanks, >> Justin >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Oct 3 21:52:38 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Oct 2016 21:52:38 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: <2879A375-0BAB-4850-9769-DA38DA74BBFF@mcs.anl.gov> > On Oct 3, 2016, at 9:45 PM, Justin Chang wrote: > > I am just saying the poission problem as an example since that is one of the simpler PDEs out there and already exists. Sometimes an example for the wrong approach is worse than no example. Can you suggest a simple example where Discontinuous Galerkin makes good sense instead of when it may not make sense? Barry > > On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley wrote: > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang wrote: > Hi all, > > Is there, or will there be, support for implementing Discontinuous Galerkin formulations within the DMPlex framework? I think it would be nice to have something such as the SIPG formulation for the poisson problem in SNES ex12.c > > We will have a trial DG in PETSc shortly. However, I don't think DG methods make much sense for elliptic > problems. Why would I use it there? > > Thanks, > > Matt > > Thanks, > Justin > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > From jychang48 at gmail.com Mon Oct 3 21:57:25 2016 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Oct 2016 21:57:25 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: <2879A375-0BAB-4850-9769-DA38DA74BBFF@mcs.anl.gov> References: <2879A375-0BAB-4850-9769-DA38DA74BBFF@mcs.anl.gov> Message-ID: Advection-diffusion equations. Perhaps SNES ex12 could be modified to include an advection term? On Mon, Oct 3, 2016 at 9:52 PM, Barry Smith wrote: > > > On Oct 3, 2016, at 9:45 PM, Justin Chang wrote: > > > > I am just saying the poission problem as an example since that is one of > the simpler PDEs out there and already exists. > > Sometimes an example for the wrong approach is worse than no example. > Can you suggest a simple example where Discontinuous Galerkin makes good > sense instead of when it may not make sense? > > Barry > > > > > On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley > wrote: > > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang > wrote: > > Hi all, > > > > Is there, or will there be, support for implementing Discontinuous > Galerkin formulations within the DMPlex framework? I think it would be nice > to have something such as the SIPG formulation for the poisson problem in > SNES ex12.c > > > > We will have a trial DG in PETSc shortly. However, I don't think DG > methods make much sense for elliptic > > problems. Why would I use it there? > > > > Thanks, > > > > Matt > > > > Thanks, > > Justin > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sander.Arens at ugent.be Tue Oct 4 02:39:38 2016 From: Sander.Arens at ugent.be (Sander Arens) Date: Tue, 4 Oct 2016 09:39:38 +0200 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: <2879A375-0BAB-4850-9769-DA38DA74BBFF@mcs.anl.gov> Message-ID: I think it would also be interesting to have something similar to TS ex25, but now with DMPlex and DG. On 4 October 2016 at 04:57, Justin Chang wrote: > Advection-diffusion equations. Perhaps SNES ex12 could be modified to > include an advection term? > > On Mon, Oct 3, 2016 at 9:52 PM, Barry Smith wrote: > >> >> > On Oct 3, 2016, at 9:45 PM, Justin Chang wrote: >> > >> > I am just saying the poission problem as an example since that is one >> of the simpler PDEs out there and already exists. >> >> Sometimes an example for the wrong approach is worse than no example. >> Can you suggest a simple example where Discontinuous Galerkin makes good >> sense instead of when it may not make sense? >> >> Barry >> >> > >> > On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley >> wrote: >> > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang >> wrote: >> > Hi all, >> > >> > Is there, or will there be, support for implementing Discontinuous >> Galerkin formulations within the DMPlex framework? I think it would be nice >> to have something such as the SIPG formulation for the poisson problem in >> SNES ex12.c >> > >> > We will have a trial DG in PETSc shortly. However, I don't think DG >> methods make much sense for elliptic >> > problems. Why would I use it there? >> > >> > Thanks, >> > >> > Matt >> > >> > Thanks, >> > Justin >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
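For readers following the DG discussion: the SIPG formulation Justin refers to is the standard symmetric interior penalty method. For the Poisson problem it reads as follows (textbook statement, not tied to any existing PETSc example; \sigma is the penalty parameter, h_F the face diameter, [.] and {.} the jump and average across a face F, with boundary faces handled analogously so that Dirichlet data is imposed weakly):

  a_h(u,v) = \sum_K \int_K \nabla u \cdot \nabla v \, dx
           - \sum_F \int_F ( \{\nabla u\} \cdot n_F \, [v] + \{\nabla v\} \cdot n_F \, [u] ) \, ds
           + \sum_F \int_F \frac{\sigma}{h_F} \, [u]\,[v] \, ds,

and one solves a_h(u,v) = \int_\Omega f v \, dx for all v in the same broken (element-wise polynomial) space; the penalty \sigma must be taken large enough for coercivity.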
>> > -- Norbert Wiener >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 4 06:27:08 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 06:27:08 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: On Mon, Oct 3, 2016 at 9:51 PM, Praveen C wrote: > DG for elliptic operators still makes lot of sense if you have > > problems with discontinuous coefficients > This is thrown around a lot, but without justification. Why is it better for discontinuous coefficients? The solution is smoother than the coefficient (elliptic regularity). Are DG bases more efficient than high order cG for this problem? I have never seen anything convincing. > local grid adaptation (hp) > This is just as easy in cG land. > convection-diffusion where convection is dominant > I have seen the CW Shu papers on this, and I can understand the possible advantages here. > fourth order problems (standard C^0 elements can be used) > Interior penalty is a possibility for this problem. So are C1 elements, with which this is rarely compared. It should also be compared with the NURBS formulations, like IGA. Matt > praveen > > > On Tue, Oct 4, 2016 at 7:58 AM, Matthew Knepley wrote: > >> On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang wrote: >> >>> Hi all, >>> >>> Is there, or will there be, support for implementing Discontinuous >>> Galerkin formulations within the DMPlex framework? I think it would be nice >>> to have something such as the SIPG formulation for the poisson problem in >>> SNES ex12.c >>> >> >> We will have a trial DG in PETSc shortly. However, I don't think DG >> methods make much sense for elliptic >> problems. Why would I use it there? >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Justin >>> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 4 06:28:20 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 06:28:20 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: <2879A375-0BAB-4850-9769-DA38DA74BBFF@mcs.anl.gov> Message-ID: On Tue, Oct 4, 2016 at 2:39 AM, Sander Arens wrote: > I think it would also be interesting to have something similar to TS ex25, > but now with DMPlex and DG. > I think this would be my first target. I realize that the Laplacian is part of it, so that Justin's suggestion of ex12 follows from that. Matt > On 4 October 2016 at 04:57, Justin Chang wrote: > >> Advection-diffusion equations. Perhaps SNES ex12 could be modified to >> include an advection term? >> >> On Mon, Oct 3, 2016 at 9:52 PM, Barry Smith wrote: >> >>> >>> > On Oct 3, 2016, at 9:45 PM, Justin Chang wrote: >>> > >>> > I am just saying the poission problem as an example since that is one >>> of the simpler PDEs out there and already exists. >>> >>> Sometimes an example for the wrong approach is worse than no example. >>> Can you suggest a simple example where Discontinuous Galerkin makes good >>> sense instead of when it may not make sense? 
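As a concrete point of reference for the convection-dominated case mentioned above, a minimal sketch of the standard upwinded DG treatment of an advection term div(beta u), with beta a given velocity field; the face normal n, the traces u^-, u^+ and the jump [v] = v^- - v^+ are notation assumed for this sketch:

  \sum_K \Big( \int_K v \, \partial_t u \, dx - \int_K u \, \beta \cdot \nabla v \, dx \Big)
  + \sum_F \int_F (\beta \cdot n) \, u^{up} \, [v] \, ds = 0 ,
  \qquad u^{up} = \begin{cases} u^- & \text{if } \beta \cdot n \ge 0 \\ u^+ & \text{otherwise} \end{cases}

with n the unit normal pointing from the '-' to the '+' side of each interior face, and inflow boundary faces taking u^up from the boundary data. For steady advection-diffusion this face term would simply be added to an SIPG-type discretization of the diffusive part.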
>>> >>> Barry >>> >>> > >>> > On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley >>> wrote: >>> > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang >>> wrote: >>> > Hi all, >>> > >>> > Is there, or will there be, support for implementing Discontinuous >>> Galerkin formulations within the DMPlex framework? I think it would be nice >>> to have something such as the SIPG formulation for the poisson problem in >>> SNES ex12.c >>> > >>> > We will have a trial DG in PETSc shortly. However, I don't think DG >>> methods make much sense for elliptic >>> > problems. Why would I use it there? >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > Thanks, >>> > Justin >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 4 10:23:56 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Oct 2016 09:23:56 -0600 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: Message-ID: <87zimki1df.fsf@jedbrown.org> Matthew Knepley writes: > On Mon, Oct 3, 2016 at 9:51 PM, Praveen C wrote: > >> DG for elliptic operators still makes lot of sense if you have >> >> problems with discontinuous coefficients >> > > This is thrown around a lot, but without justification. Why is it better > for discontinuous coefficients? The > solution is smoother than the coefficient (elliptic regularity). Are DG > bases more efficient than high order > cG for this problem? I have never seen anything convincing. CG is non-monotone and the artifacts are often pretty serious for high-contrast coefficients, especially when you're interested in gradients (flow in porous media). But because the coefficients are under/barely-resolved, you won't see any benefit from high order DG, in which case you're just using a complicated/expensive method versus H(div) finite elements (perhaps cast as finite volume or mimetic FD). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From knepley at gmail.com Tue Oct 4 10:26:12 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 10:26:12 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: <87zimki1df.fsf@jedbrown.org> References: <87zimki1df.fsf@jedbrown.org> Message-ID: On Tue, Oct 4, 2016 at 10:23 AM, Jed Brown wrote: > Matthew Knepley writes: > > > On Mon, Oct 3, 2016 at 9:51 PM, Praveen C wrote: > > > >> DG for elliptic operators still makes lot of sense if you have > >> > >> problems with discontinuous coefficients > >> > > > > This is thrown around a lot, but without justification. Why is it better > > for discontinuous coefficients? The > > solution is smoother than the coefficient (elliptic regularity). Are DG > > bases more efficient than high order > > cG for this problem? I have never seen anything convincing. > > CG is non-monotone and the artifacts are often pretty serious for > high-contrast coefficients, especially when you're interested in > gradients (flow in porous media). 
But because the coefficients are > under/barely-resolved, you won't see any benefit from high order DG, in > which case you're just using a complicated/expensive method versus > H(div) finite elements (perhaps cast as finite volume or mimetic FD). > I was including H(div) elements in my cG world. Is this terminology wrong? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 4 10:43:18 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Oct 2016 09:43:18 -0600 Subject: [petsc-users] DG within DMPlex In-Reply-To: References: <87zimki1df.fsf@jedbrown.org> Message-ID: <87wphoi0h5.fsf@jedbrown.org> Matthew Knepley writes: > On Tue, Oct 4, 2016 at 10:23 AM, Jed Brown wrote: > >> Matthew Knepley writes: >> >> > On Mon, Oct 3, 2016 at 9:51 PM, Praveen C wrote: >> > >> >> DG for elliptic operators still makes lot of sense if you have >> >> >> >> problems with discontinuous coefficients >> >> >> > >> > This is thrown around a lot, but without justification. Why is it better >> > for discontinuous coefficients? The >> > solution is smoother than the coefficient (elliptic regularity). Are DG >> > bases more efficient than high order >> > cG for this problem? I have never seen anything convincing. >> >> CG is non-monotone and the artifacts are often pretty serious for >> high-contrast coefficients, especially when you're interested in >> gradients (flow in porous media). But because the coefficients are >> under/barely-resolved, you won't see any benefit from high order DG, in >> which case you're just using a complicated/expensive method versus >> H(div) finite elements (perhaps cast as finite volume or mimetic FD). >> > > I was including H(div) elements in my cG world. Is this terminology wrong? It's not a continuous basis.... Perhaps ambiguous. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From knepley at gmail.com Tue Oct 4 11:09:57 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 11:09:57 -0500 Subject: [petsc-users] DG within DMPlex In-Reply-To: <87wphoi0h5.fsf@jedbrown.org> References: <87zimki1df.fsf@jedbrown.org> <87wphoi0h5.fsf@jedbrown.org> Message-ID: On Tue, Oct 4, 2016 at 10:43 AM, Jed Brown wrote: > Matthew Knepley writes: > > > On Tue, Oct 4, 2016 at 10:23 AM, Jed Brown wrote: > > > >> Matthew Knepley writes: > >> > >> > On Mon, Oct 3, 2016 at 9:51 PM, Praveen C wrote: > >> > > >> >> DG for elliptic operators still makes lot of sense if you have > >> >> > >> >> problems with discontinuous coefficients > >> >> > >> > > >> > This is thrown around a lot, but without justification. Why is it > better > >> > for discontinuous coefficients? The > >> > solution is smoother than the coefficient (elliptic regularity). Are > DG > >> > bases more efficient than high order > >> > cG for this problem? I have never seen anything convincing. > >> > >> CG is non-monotone and the artifacts are often pretty serious for > >> high-contrast coefficients, especially when you're interested in > >> gradients (flow in porous media). 
But because the coefficients are > >> under/barely-resolved, you won't see any benefit from high order DG, in > >> which case you're just using a complicated/expensive method versus > >> H(div) finite elements (perhaps cast as finite volume or mimetic FD). > >> > > > > I was including H(div) elements in my cG world. Is this terminology > wrong? > > It's not a continuous basis.... > > Perhaps ambiguous. > I think cG should refer to Conforming Galerkin, since that is really what is implied. DG and H(div) are both non-conforming. So I really want to cG/nG dichotomy. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hengjiew at uci.edu Tue Oct 4 13:13:45 2016 From: hengjiew at uci.edu (frank) Date: Tue, 4 Oct 2016 11:13:45 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <577D75D3.8010703@uci.edu> <2F25042C-E6D6-4AC6-9C22-1B63F8065836@mcs.anl.gov> <57804DE9.707@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> Message-ID: <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Hi, This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all. Test1: 512^3 grid points Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) 512 8 4 / 3 6.2466 4096 64 5 / 3 0.9361 32768 64 4 / 3 4.8914 Test2: 1024^3 grid points Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) 4096 64 5 / 4 3.4139 8192 128 5 / 4 2.4196 16384 32 5 / 3 5.4150 32768 64 5 / 3 5.6067 65536 128 5 / 3 6.5219 I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? Also which preconditionr at the coarse mesh of the 2nd communicator should I use to improve the performance? I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. Thank you. Regards, Frank On 09/15/2016 03:35 AM, Dave May wrote: > HI all, > > I the only unexpected memory usage I can see is associated with the > call to MatPtAP(). > Here is something you can try immediately. > Run your code with the additional options > -matrap 0 -matptap_scalable > > I didn't realize this before, but the default behaviour of MatPtAP in > parallel is actually to to explicitly form the transpose of P (e.g. > assemble R = P^T) and then compute R.A.P. > You don't want to do this. The option -matrap 0 resolves this issue. > > The implementation of P^T.A.P has two variants. > The scalable implementation (with respect to memory usage) is selected > via the second option -matptap_scalable. 
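To make the level arithmetic concrete (a sketch assuming the default factor-of-two coarsening of Galerkin PCMG on the DMDA), the 32768-core run of Test2 with reduction factor 64 and 5/3 levels corresponds to the hierarchy

  outer MG (all 32768 ranks):      1024^3 -> 512^3 -> 256^3 -> 128^3 -> 64^3
  telescope, reduction factor 64:  32768 / 64 = 512 ranks on the sub-communicator
  inner MG (512 ranks):            64^3 -> 32^3 -> 16^3

so the final coarse problem has 16^3 = 4096 unknowns, about 8 per rank of the sub-communicator, and that problem is what the coarse preconditioner of the inner MG (redundant in the attached options file) has to solve.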
> > Try it out - I see a significant memory reduction using these options > for particular mesh sizes / partitions. > > I've attached a cleaned up version of the code you sent me. > There were a number of memory leaks and other issues. > The main points being > * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} > * You should call PetscFinalize(), otherwise the option -log_summary > (-log_view) will not display anything once the program has completed. > > > Thanks, > Dave > > > On 15 September 2016 at 08:03, Hengjie Wang > wrote: > > Hi Dave, > > Sorry, I should have put more comment to explain the code. > The number of process in each dimension is the same: Px = Py=Pz=P. > So is the domain size. > So if the you want to run the code for a 512^3 grid points on > 16^3 cores, you need to set "-N 512 -P 16" in the command line. > I add more comments and also fix an error in the attached code. ( > The error only effects the accuracy of solution but not the memory > usage. ) > > Thank you. > Frank > > > On 9/14/2016 9:05 PM, Dave May wrote: >> >> >> On Thursday, 15 September 2016, Dave May > > wrote: >> >> >> >> On Thursday, 15 September 2016, frank wrote: >> >> Hi, >> >> I write a simple code to re-produce the error. I hope >> this can help to diagnose the problem. >> The code just solves a 3d poisson equation. >> >> >> Why is the stencil width a runtime parameter?? And why is the >> default value 2? For 7-pnt FD Laplace, you only need >> a stencil width of 1. >> >> Was this choice made to mimic something in the >> real application code? >> >> >> Please ignore - I misunderstood your usage of the param set by -P >> >> >> I run the code on a 1024^3 mesh. The process partition is >> 32 * 32 * 32. That's when I re-produce the OOM error. >> Each core has about 2G memory. >> I also run the code on a 512^3 mesh with 16 * 16 * 16 >> processes. The ksp solver works fine. >> I attached the code, ksp_view_pre's output and my petsc >> option file. >> >> Thank you. >> Frank >> >> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>> Hi Barry, >>> >>> I checked. On the supercomputer, I had the option >>> "-ksp_view_pre" but it is not in file I sent you. I am >>> sorry for the confusion. >>> >>> Regards, >>> Frank >>> >>> On Friday, September 9, 2016, Barry Smith >>> wrote: >>> >>> >>> > On Sep 9, 2016, at 3:11 PM, frank >>> wrote: >>> > >>> > Hi Barry, >>> > >>> > I think the first KSP view output is from >>> -ksp_view_pre. Before I submitted the test, I was >>> not sure whether there would be OOM error or not. So >>> I added both -ksp_view_pre and -ksp_view. >>> >>> But the options file you sent specifically does >>> NOT list the -ksp_view_pre so how could it be from that? >>> >>> Sorry to be pedantic but I've spent too much time >>> in the past trying to debug from incorrect >>> information and want to make sure that the >>> information I have is correct before thinking. >>> Please recheck exactly what happened. Rerun with the >>> exact input file you emailed if that is needed. >>> >>> Barry >>> >>> > >>> > Frank >>> > >>> > >>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>> >> Why does ksp_view2.txt have two KSP views in it >>> while ksp_view1.txt has only one KSPView in it? Did >>> you run two different solves in the 2 case but not >>> the one? >>> >> >>> >> Barry >>> >> >>> >> >>> >> >>> >>> On Sep 9, 2016, at 10:56 AM, frank >>> wrote: >>> >>> >>> >>> Hi, >>> >>> >>> >>> I want to continue digging into the memory >>> problem here. 
>>> >>> I did find a work around in the past, which is >>> to use less cores per node so that each core has 8G >>> memory. However this is deficient and expensive. I >>> hope to locate the place that uses the most memory. >>> >>> >>> >>> Here is a brief summary of the tests I did in past: >>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>> >>> Maximum (over computational time) process >>> memory: total 7.0727e+08 >>> >>> Current process memory: total >>> 7.0727e+08 >>> >>> Maximum (over computational time) space >>> PetscMalloc()ed: total 6.3908e+11 >>> >>> Current space PetscMalloc()ed: >>> total 1.8275e+09 >>> >>> >>> >>>> Test2: Mesh 1536*128*384 | Process Mesh >>> 96*8*24 >>> >>> Maximum (over computational time) process >>> memory: total 5.9431e+09 >>> >>> Current process memory: total >>> 5.9431e+09 >>> >>> Maximum (over computational time) space >>> PetscMalloc()ed: total 5.3202e+12 >>> >>> Current space PetscMalloc()ed: >>> total 5.4844e+09 >>> >>> >>> >>>> Test3: Mesh 3072*256*768 | Process Mesh >>> 96*8*24 >>> >>> OOM( Out Of Memory ) killer of the >>> supercomputer terminated the job during "KSPSolve". >>> >>> >>> >>> I attached the output of ksp_view( the third >>> test's output is from ksp_view_pre ), memory_view >>> and also the petsc options. >>> >>> >>> >>> In all the tests, each core can access about 2G >>> memory. In test3, there are 4223139840 non-zeros in >>> the matrix. This will consume about 1.74M, using >>> double precision. Considering some extra memory used >>> to store integer index, 2G memory should still be >>> way enough. >>> >>> >>> >>> Is there a way to find out which part of >>> KSPSolve uses the most memory? >>> >>> Thank you so much. >>> >>> >>> >>> BTW, there are 4 options remains unused and I >>> don't understand why they are omitted: >>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: >>> preonly >>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: >>> bjacobi >>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: >>> richardson >>> >>> >>> >>> >>> >>> Regards, >>> >>> Frank >>> >>> >>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>> >>>> >>> >>>> On 14 July 2016 at 01:07, frank >>> wrote: >>> >>>> Hi Dave, >>> >>>> >>> >>>> Sorry for the late reply. >>> >>>> Thank you so much for your detailed reply. >>> >>>> >>> >>>> I have a question about the estimation of the >>> memory usage. There are 4223139840 allocated >>> non-zeros and 18432 MPI processes. Double precision >>> is used. So the memory per process is: >>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = >>> 1.74M ? >>> >>>> Did I do sth wrong here? Because this seems too >>> small. >>> >>>> >>> >>>> No - I totally f***ed it up. You are correct. >>> That'll teach me for fumbling around with my iphone >>> calculator and not using my brain. (Note that to >>> convert to MB just divide by 1e6, not 1024^2 - >>> although I apparently cannot convert between units >>> correctly....) >>> >>>> >>> >>>> From the PETSc objects associated with the >>> solver, It looks like it _should_ run with 2GB per >>> MPI rank. Sorry for my mistake. Possibilities are: >>> somewhere in your usage of PETSc you've introduced a >>> memory leak; PETSc is doing a huge over allocation >>> (e.g. as per our discussion of MatPtAP); or in your >>> application code there are other objects you have >>> forgotten to log the memory for. >>> >>>> >>> >>>> >>> >>>> >>> >>>> I am running this job on Bluewater >>> >>>> I am using the 7 points FD stencil in 3D. 
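For completeness, the 7-point stencil referred to here approximates the 3D Laplacian on a uniform grid with spacing h (an assumption of this sketch) by

  -\nabla^2 u \big|_{i,j,k} \approx \frac{ 6 u_{i,j,k} - u_{i-1,j,k} - u_{i+1,j,k} - u_{i,j-1,k} - u_{i,j+1,k} - u_{i,j,k-1} - u_{i,j,k+1} }{ h^2 } ,

so each matrix row has at most 7 non-zeros. That is consistent with the numbers in this thread: the 3072*256*768 grid has 603,979,776 unknowns, and 7 entries per row gives roughly 4.23e9 non-zeros, essentially the 4223139840 allocated non-zeros quoted above (the small difference presumably comes from how the boundary rows are treated).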
>>> >>>> >>> >>>> I thought so on both counts. >>> >>>> >>> >>>> I apologize that I made a stupid mistake in >>> computing the memory per core. My settings render >>> each core can access only 2G memory on average >>> instead of 8G which I mentioned in previous email. I >>> re-run the job with 8G memory per core on average >>> and there is no "Out Of Memory" error. I would do >>> more test to see if there is still some memory issue. >>> >>>> >>> >>>> Ok. I'd still like to know where the memory was >>> being used since my estimates were off. >>> >>>> >>> >>>> >>> >>>> Thanks, >>> >>>> Dave >>> >>>> >>> >>>> Regards, >>> >>>> Frank >>> >>>> >>> >>>> >>> >>>> >>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>> >>>>> Hi Frank, >>> >>>>> >>> >>>>> >>> >>>>> On 11 July 2016 at 19:14, frank >>> wrote: >>> >>>>> Hi Dave, >>> >>>>> >>> >>>>> I re-run the test using bjacobi as the >>> preconditioner on the coarse mesh of telescope. The >>> Grid is 3072*256*768 and process mesh is 96*8*24. >>> The petsc option file is attached. >>> >>>>> I still got the "Out Of Memory" error. The >>> error occurred before the linear solver finished one >>> step. So I don't have the full info from ksp_view. >>> The info from ksp_view_pre is attached. >>> >>>>> >>> >>>>> Okay - that is essentially useless (sorry) >>> >>>>> >>> >>>>> It seems to me that the error occurred when >>> the decomposition was going to be changed. >>> >>>>> >>> >>>>> Based on what information? >>> >>>>> Running with -info would give us more clues, >>> but will create a ton of output. >>> >>>>> Please try running the case which failed with >>> -info >>> >>>>> I had another test with a grid of >>> 1536*128*384 and the same process mesh as above. >>> There was no error. The ksp_view info is attached >>> for comparison. >>> >>>>> Thank you. >>> >>>>> >>> >>>>> >>> >>>>> [3] Here is my crude estimate of your memory >>> usage. >>> >>>>> I'll target the biggest memory hogs only to >>> get an order of magnitude estimate >>> >>>>> >>> >>>>> * The Fine grid operator contains 4223139840 >>> non-zeros --> 1.8 GB per MPI rank assuming double >>> precision. >>> >>>>> The indices for the AIJ could amount to >>> another 0.3 GB (assuming 32 bit integers) >>> >>>>> >>> >>>>> * You use 5 levels of coarsening, so the other >>> operators should represent (collectively) >>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 >>> MB per MPI rank on the communicator with 18432 ranks. >>> >>>>> The coarse grid should consume ~ 0.5 MB per >>> MPI rank on the communicator with 18432 ranks. >>> >>>>> >>> >>>>> * You use a reduction factor of 64, making the >>> new communicator with 288 MPI ranks. >>> >>>>> PCTelescope will first gather a temporary >>> matrix associated with your coarse level operator >>> assuming a comm size of 288 living on the comm with >>> size 18432. >>> >>>>> This matrix will require approximately 0.5 * >>> 64 = 32 MB per core on the 288 ranks. >>> >>>>> This matrix is then used to form a new MPIAIJ >>> matrix on the subcomm, thus require another 32 MB >>> per rank. >>> >>>>> The temporary matrix is now destroyed. >>> >>>>> >>> >>>>> * Because a DMDA is detected, a permutation >>> matrix is assembled. >>> >>>>> This requires 2 doubles per point in the DMDA. >>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>> >>>>> Thus the permutation matrix will require < 1 >>> MB per MPI rank on the sub-comm. >>> >>>>> >>> >>>>> * Lastly, the matrix is permuted. 
This uses >>> MatPtAP(), but the resulting operator will have the >>> same memory footprint as the unpermuted matrix (32 >>> MB). At any stage in PCTelescope, only 2 operators >>> of size 32 MB are held in memory when the DMDA is >>> provided. >>> >>>>> >>> >>>>> From my rough estimates, the worst case memory >>> foot print for any given core, given your options is >>> approximately >>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>> >>>>> This is way below 8 GB. >>> >>>>> >>> >>>>> Note this estimate completely ignores: >>> >>>>> (1) the memory required for the restriction >>> operator, >>> >>>>> (2) the potential growth in the number of >>> non-zeros per row due to Galerkin coarsening (I >>> wished -ksp_view_pre reported the output from >>> MatView so we could see the number of non-zeros >>> required by the coarse level operators) >>> >>>>> (3) all temporary vectors required by the CG >>> solver, and those required by the smoothers. >>> >>>>> (4) internal memory allocated by MatPtAP >>> >>>>> (5) memory associated with IS's used within >>> PCTelescope >>> >>>>> >>> >>>>> So either I am completely off in my estimates, >>> or you have not carefully estimated the memory usage >>> of your application code. Hopefully others might >>> examine/correct my rough estimates >>> >>>>> >>> >>>>> Since I don't have your code I cannot access >>> the latter. >>> >>>>> Since I don't have access to the same machine >>> you are running on, I think we need to take a step back. >>> >>>>> >>> >>>>> [1] What machine are you running on? Send me a >>> URL if its available >>> >>>>> >>> >>>>> [2] What discretization are you using? (I am >>> guessing a scalar 7 point FD stencil) >>> >>>>> If it's a 7 point FD stencil, we should be >>> able to examine the memory usage of your solver >>> configuration using a standard, light weight >>> existing PETSc example, run on your machine at the >>> same scale. >>> >>>>> This would hopefully enable us to correctly >>> evaluate the actual memory usage required by the >>> solver configuration you are using. >>> >>>>> >>> >>>>> Thanks, >>> >>>>> Dave >>> >>>>> >>> >>>>> >>> >>>>> Frank >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>> >>>>>> >>> >>>>>> On Saturday, 9 July 2016, frank >>> wrote: >>> >>>>>> Hi Barry and Dave, >>> >>>>>> >>> >>>>>> Thank both of you for the advice. >>> >>>>>> >>> >>>>>> @Barry >>> >>>>>> I made a mistake in the file names in last >>> email. I attached the correct files this time. >>> >>>>>> For all the three tests, 'Telescope' is used >>> as the coarse preconditioner. >>> >>>>>> >>> >>>>>> == Test1: Grid: 1536*128*384, Process >>> Mesh: 48*4*12 >>> >>>>>> Part of the memory usage: Vector 125 124 >>> 3971904 0. >>> >>>>>> Matrix 101 101 9462372 0 >>> >>>>>> >>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: >>> 96*8*24 >>> >>>>>> Part of the memory usage: Vector 125 124 >>> 681672 0. >>> >>>>>> Matrix 101 101 1462180 0. >>> >>>>>> >>> >>>>>> In theory, the memory usage in Test1 should >>> be 8 times of Test2. In my case, it is about 6 times. >>> >>>>>> >>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: >>> 96*8*24. Sub-domain per process: 32*32*32 >>> >>>>>> Here I get the out of memory error. >>> >>>>>> >>> >>>>>> I tried to use -mg_coarse jacobi. In this >>> way, I don't need to set -mg_coarse_ksp_type and >>> -mg_coarse_pc_type explicitly, right? >>> >>>>>> The linear solver didn't work in this case. >>> Petsc output some errors. 
>>> >>>>>> >>> >>>>>> @Dave >>> >>>>>> In test3, I use only one instance of >>> 'Telescope'. On the coarse mesh of 'Telescope', I >>> used LU as the preconditioner instead of SVD. >>> >>>>>> If my set the levels correctly, then on the >>> last coarse mesh of MG where it calls 'Telescope', >>> the sub-domain per process is 2*2*2. >>> >>>>>> On the last coarse mesh of 'Telescope', there >>> is only one grid point per process. >>> >>>>>> I still got the OOM error. The detailed petsc >>> option file is attached. >>> >>>>>> >>> >>>>>> Do you understand the expected memory usage >>> for the particular parallel LU implementation you >>> are using? I don't (seriously). Replace LU with >>> bjacobi and re-run this test. My point about solver >>> debugging is still valid. >>> >>>>>> >>> >>>>>> And please send the result of KSPView so we >>> can see what is actually used in the computations >>> >>>>>> >>> >>>>>> Thanks >>> >>>>>> Dave >>> >>>>>> >>> >>>>>> >>> >>>>>> Thank you so much. >>> >>>>>> >>> >>>>>> Frank >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank >>> wrote: >>> >>>>>> >>> >>>>>> Hi Barry, >>> >>>>>> >>> >>>>>> Thank you for you advice. >>> >>>>>> I tried three test. In the 1st test, the grid >>> is 3072*256*768 and the process mesh is 96*8*24. >>> >>>>>> The linear solver is 'cg' the preconditioner >>> is 'mg' and 'telescope' is used as the >>> preconditioner at the coarse mesh. >>> >>>>>> The system gives me the "Out of Memory" error >>> before the linear system is completely solved. >>> >>>>>> The info from '-ksp_view_pre' is attached. I >>> seems to me that the error occurs when it reaches >>> the coarse mesh. >>> >>>>>> >>> >>>>>> The 2nd test uses a grid of 1536*128*384 and >>> process mesh is 96*8*24. The 3rd test uses the same >>> grid but a different process mesh 48*4*12. >>> >>>>>> Are you sure this is right? The total >>> matrix and vector memory usage goes from 2nd test >>> >>>>>> Vector 384 383 8,193,712 0. >>> >>>>>> Matrix 103 103 11,508,688 0. >>> >>>>>> to 3rd test >>> >>>>>> Vector 384 383 1,590,520 0. >>> >>>>>> Matrix 103 103 3,508,664 0. >>> >>>>>> that is the memory usage got smaller but if >>> you have only 1/8th the processes and the same grid >>> it should have gotten about 8 times bigger. Did you >>> maybe cut the grid by a factor of 8 also? If so that >>> still doesn't explain it because the memory usage >>> changed by a factor of 5 something for the vectors >>> and 3 something for the matrices. >>> >>>>>> >>> >>>>>> >>> >>>>>> The linear solver and petsc options in 2nd >>> and 3rd tests are the same in 1st test. The linear >>> solver works fine in both test. >>> >>>>>> I attached the memory usage of the 2nd and >>> 3rd tests. The memory info is from the option >>> '-log_summary'. I tried to use '-momery_info' as you >>> suggested, but in my case petsc treated it as an >>> unused option. It output nothing about the memory. >>> Do I need to add sth to my code so I can use >>> '-memory_info'? >>> >>>>>> Sorry, my mistake the option is -memory_view >>> >>>>>> >>> >>>>>> Can you run the one case with -memory_view >>> and -mg_coarse jacobi -ksp_max_it 1 (just so it >>> doesn't iterate forever) to see how much memory is >>> used without the telescope? Also run case 2 the same >>> way. >>> >>>>>> >>> >>>>>> Barry >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> In both tests the memory usage is not large. 
>>> >>>>>> >>> >>>>>> It seems to me that it might be the >>> 'telescope' preconditioner that allocated a lot of >>> memory and caused the error in the 1st test. >>> >>>>>> Is there is a way to show how much memory it >>> allocated? >>> >>>>>> >>> >>>>>> Frank >>> >>>>>> >>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>> >>>>>> Frank, >>> >>>>>> >>> >>>>>> You can run with -ksp_view_pre to have >>> it "view" the KSP before the solve so hopefully it >>> gets that far. >>> >>>>>> >>> >>>>>> Please run the problem that does fit >>> with -memory_info when the problem completes it will >>> show the "high water mark" for PETSc allocated >>> memory and total memory used. We first want to look >>> at these numbers to see if it is using more memory >>> than you expect. You could also run with say half >>> the grid spacing to see how the memory usage scaled >>> with the increase in grid points. Make the runs also >>> with -log_view and send all the output from these >>> options. >>> >>>>>> >>> >>>>>> Barry >>> >>>>>> >>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank >>> wrote: >>> >>>>>> >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> I am using the CG ksp solver and Multigrid >>> preconditioner to solve a linear system in parallel. >>> >>>>>> I chose to use the 'Telescope' as the >>> preconditioner on the coarse mesh for its good >>> performance. >>> >>>>>> The petsc options file is attached. >>> >>>>>> >>> >>>>>> The domain is a 3d box. >>> >>>>>> It works well when the grid is 1536*128*384 >>> and the process mesh is 96*8*24. When I double the >>> size of grid and keep >>> the same process mesh and petsc options, I get an >>> "out of memory" error from the super-cluster I am using. >>> >>>>>> Each process has access to at least 8G >>> memory, which should be more than enough for my >>> application. I am sure that all the other parts of >>> my code( except the linear solver ) do not use much >>> memory. So I doubt if there is something wrong with >>> the linear solver. >>> >>>>>> The error occurs before the linear system is >>> completely solved so I don't have the info from ksp >>> view. I am not able to re-produce the error with a >>> smaller problem either. >>> >>>>>> In addition, I tried to use the block jacobi >>> as the preconditioner with the same grid and same >>> decomposition. The linear solver runs extremely slow >>> but there is no memory error. >>> >>>>>> >>> >>>>>> How can I diagnose what exactly cause the error? >>> >>>>>> Thank you so much. >>> >>>>>> >>> >>>>>> Frank >>> >>>>>> >>> >>>>>> >>> >>> >>>>>> >>> >>>>> >>> >>>> >>> >>> >>> >>> > >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -options_left -ksp_initial_guess_nonzero yes -ksp_converged_reason -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_coarse_ksp_type preonly -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_reduction_factor 64 -matrap 0 -matptap_scalable -memory_view -log_view -options_left 1 # Setting dmdarepart on subcomm -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_pc_type mg -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_ksp.f90 Type: text/x-fortran Size: 6821 bytes Desc: not available URL: From knepley at gmail.com Tue Oct 4 13:24:28 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 13:24:28 -0500 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> References: <577C337B.60909@uci.edu> <577D75D3.8010703@uci.edu> <2F25042C-E6D6-4AC6-9C22-1B63F8065836@mcs.anl.gov> <57804DE9.707@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Tue, Oct 4, 2016 at 1:13 PM, frank wrote: > Hi, > This question is follow-up of the thread "Question about memory usage in > Multigrid preconditioner". > I used to have the "Out of Memory(OOM)" problem when using the > CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; > -matptap_scalable" option did solve that problem. > > Then I test the scalability by solving a 3d poisson eqn for 1 step. I used > one sub-communicator in all the tests. The difference between the petsc > options in those tests are: 1 the pc_telescope_reduction_factor; 2 the > number of multigrid levels in the up/down solver. The function "ksp_solve" > is timed. It is kind of slow and doesn't scale at all. > 1) The number of levels cannot be different in the up/down smoothers. Why are you using a / ? 2) We need to see what solver you actually constructed, so give us the output of -ksp_view 3) For any performance questions, we need the output of -log_view 4) It looks like you are fixing the number of levels as you scale up. This makes the coarse problem much bigger, and is not a scalable way to proceed. Have you looked at the ratio of coarse grid time to level time? 5) Did you look at the options in this paper: https://arxiv.org/abs/1604.07163 Thanks, Matt > Test1: 512^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 512 8 4 / > 3 6.2466 > 4096 64 5 / > 3 0.9361 > 32768 64 4 / > 3 4.8914 > > Test2: 1024^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 4096 64 5 / 4 > 3.4139 > 8192 128 5 / > 4 2.4196 > 16384 32 5 / 3 > 5.4150 > 32768 64 5 / > 3 5.6067 > 65536 128 5 / > 3 6.5219 > > I guess I didn't set the MG levels properly. What would be the efficient > way to arrange the MG levels? > Also which preconditionr at the coarse mesh of the 2nd communicator should > I use to improve the performance? > > I attached the test code and the petsc options file for the 1024^3 cube > with 32768 cores. > > Thank you. > > Regards, > Frank > > > > > > > On 09/15/2016 03:35 AM, Dave May wrote: > > HI all, > > I the only unexpected memory usage I can see is associated with the call > to MatPtAP(). > Here is something you can try immediately. > Run your code with the additional options > -matrap 0 -matptap_scalable > > I didn't realize this before, but the default behaviour of MatPtAP in > parallel is actually to to explicitly form the transpose of P (e.g. > assemble R = P^T) and then compute R.A.P. > You don't want to do this. The option -matrap 0 resolves this issue. > > The implementation of P^T.A.P has two variants. > The scalable implementation (with respect to memory usage) is selected via > the second option -matptap_scalable. 
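A small calculation behind point 4 above, assuming factor-of-two coarsening: with the number of levels L held fixed, the coarse grid of the outer MG is

  (N / 2^{L-1})^3 :   N = 512,  L = 5  ->  32^3 =  32,768 unknowns
                      N = 1024, L = 5  ->  64^3 = 262,144 unknowns   (8x larger)

so every doubling of the grid makes the problem handed to PCTelescope eight times bigger, while the sub-communicator size is governed only by the reduction factor. If the coarse-level cost per rank is to stay bounded as the grid is refined, the level count and/or the reduction factor generally has to grow with it.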
> > Try it out - I see a significant memory reduction using these options for > particular mesh sizes / partitions. > > I've attached a cleaned up version of the code you sent me. > There were a number of memory leaks and other issues. > The main points being > * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} > * You should call PetscFinalize(), otherwise the option -log_summary > (-log_view) will not display anything once the program has completed. > > > Thanks, > Dave > > > On 15 September 2016 at 08:03, Hengjie Wang wrote: > >> Hi Dave, >> >> Sorry, I should have put more comment to explain the code. >> The number of process in each dimension is the same: Px = Py=Pz=P. So is >> the domain size. >> So if the you want to run the code for a 512^3 grid points on 16^3 >> cores, you need to set "-N 512 -P 16" in the command line. >> I add more comments and also fix an error in the attached code. ( The >> error only effects the accuracy of solution but not the memory usage. ) >> >> Thank you. >> Frank >> >> >> On 9/14/2016 9:05 PM, Dave May wrote: >> >> >> >> On Thursday, 15 September 2016, Dave May wrote: >> >>> >>> >>> On Thursday, 15 September 2016, frank wrote: >>> >>>> Hi, >>>> >>>> I write a simple code to re-produce the error. I hope this can help to >>>> diagnose the problem. >>>> The code just solves a 3d poisson equation. >>>> >>> >>> Why is the stencil width a runtime parameter?? And why is the default >>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>> >>> Was this choice made to mimic something in the real application code? >>> >> >> Please ignore - I misunderstood your usage of the param set by -P >> >> >>> >>> >>>> >>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. >>>> That's when I re-produce the OOM error. Each core has about 2G memory. >>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>> ksp solver works fine. >>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>> >>>> Thank you. >>>> Frank >>>> >>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>> >>>> Hi Barry, >>>> >>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>> it is not in file I sent you. I am sorry for the confusion. >>>> >>>> Regards, >>>> Frank >>>> >>>> On Friday, September 9, 2016, Barry Smith wrote: >>>> >>>>> >>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>> > >>>>> > Hi Barry, >>>>> > >>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>> So I added both -ksp_view_pre and -ksp_view. >>>>> >>>>> But the options file you sent specifically does NOT list the >>>>> -ksp_view_pre so how could it be from that? >>>>> >>>>> Sorry to be pedantic but I've spent too much time in the past >>>>> trying to debug from incorrect information and want to make sure that the >>>>> information I have is correct before thinking. Please recheck exactly what >>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Frank >>>>> > >>>>> > >>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>> in the 2 case but not the one? 
>>>>> >> >>>>> >> Barry >>>>> >> >>>>> >> >>>>> >> >>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> I want to continue digging into the memory problem here. >>>>> >>> I did find a work around in the past, which is to use less cores >>>>> per node so that each core has 8G memory. However this is deficient and >>>>> expensive. I hope to locate the place that uses the most memory. >>>>> >>> >>>>> >>> Here is a brief summary of the tests I did in past: >>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>> >>> Maximum (over computational time) process memory: total >>>>> 7.0727e+08 >>>>> >>> Current process memory: >>>>> total 7.0727e+08 >>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>> 6.3908e+11 >>>>> >>> Current space PetscMalloc()ed: >>>>> total 1.8275e+09 >>>>> >>> >>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>> >>> Maximum (over computational time) process memory: total >>>>> 5.9431e+09 >>>>> >>> Current process memory: >>>>> total 5.9431e+09 >>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>> 5.3202e+12 >>>>> >>> Current space PetscMalloc()ed: >>>>> total 5.4844e+09 >>>>> >>> >>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>> the job during "KSPSolve". >>>>> >>> >>>>> >>> I attached the output of ksp_view( the third test's output is from >>>>> ksp_view_pre ), memory_view and also the petsc options. >>>>> >>> >>>>> >>> In all the tests, each core can access about 2G memory. In test3, >>>>> there are 4223139840 non-zeros in the matrix. This will consume about >>>>> 1.74M, using double precision. Considering some extra memory used to store >>>>> integer index, 2G memory should still be way enough. >>>>> >>> >>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>> memory? >>>>> >>> Thank you so much. >>>>> >>> >>>>> >>> BTW, there are 4 options remains unused and I don't understand why >>>>> they are omitted: >>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>> >>> >>>>> >>> >>>>> >>> Regards, >>>>> >>> Frank >>>>> >>> >>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>> >>>> >>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>> >>>> Hi Dave, >>>>> >>>> >>>>> >>>> Sorry for the late reply. >>>>> >>>> Thank you so much for your detailed reply. >>>>> >>>> >>>>> >>>> I have a question about the estimation of the memory usage. There >>>>> are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>> precision is used. So the memory per process is: >>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>> >>>> >>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>> apparently cannot convert between units correctly....) >>>>> >>>> >>>>> >>>> From the PETSc objects associated with the solver, It looks like >>>>> it _should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities >>>>> are: somewhere in your usage of PETSc you've introduced a memory leak; >>>>> PETSc is doing a huge over allocation (e.g. as per our discussion of >>>>> MatPtAP); or in your application code there are other objects you have >>>>> forgotten to log the memory for. >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> I am running this job on Bluewater >>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>> >>>> >>>>> >>>> I thought so on both counts. >>>>> >>>> >>>>> >>>> I apologize that I made a stupid mistake in computing the memory >>>>> per core. My settings render each core can access only 2G memory on average >>>>> instead of 8G which I mentioned in previous email. I re-run the job with 8G >>>>> memory per core on average and there is no "Out Of Memory" error. I would >>>>> do more test to see if there is still some memory issue. >>>>> >>>> >>>>> >>>> Ok. I'd still like to know where the memory was being used since >>>>> my estimates were off. >>>>> >>>> >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Dave >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Frank >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>> >>>>> Hi Frank, >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>> >>>>> Hi Dave, >>>>> >>>>> >>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>> 96*8*24. The petsc option file is attached. >>>>> >>>>> I still got the "Out Of Memory" error. The error occurred before >>>>> the linear solver finished one step. So I don't have the full info from >>>>> ksp_view. The info from ksp_view_pre is attached. >>>>> >>>>> >>>>> >>>>> Okay - that is essentially useless (sorry) >>>>> >>>>> >>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>> was going to be changed. >>>>> >>>>> >>>>> >>>>> Based on what information? >>>>> >>>>> Running with -info would give us more clues, but will create a >>>>> ton of output. >>>>> >>>>> Please try running the case which failed with -info >>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>> for comparison. >>>>> >>>>> Thank you. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>> magnitude estimate >>>>> >>>>> >>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>> GB per MPI rank assuming double precision. >>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB (assuming >>>>> 32 bit integers) >>>>> >>>>> >>>>> >>>>> * You use 5 levels of coarsening, so the other operators should >>>>> represent (collectively) >>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on >>>>> the communicator with 18432 ranks. >>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>> communicator with 18432 ranks. >>>>> >>>>> >>>>> >>>>> * You use a reduction factor of 64, making the new communicator >>>>> with 288 MPI ranks. >>>>> >>>>> PCTelescope will first gather a temporary matrix associated with >>>>> your coarse level operator assuming a comm size of 288 living on the comm >>>>> with size 18432. >>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core >>>>> on the 288 ranks. 
>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>> subcomm, thus require another 32 MB per rank. >>>>> >>>>> The temporary matrix is now destroyed. >>>>> >>>>> >>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on >>>>> the sub-comm. >>>>> >>>>> >>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>> resulting operator will have the same memory footprint as the unpermuted >>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>> are held in memory when the DMDA is provided. >>>>> >>>>> >>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>> any given core, given your options is approximately >>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>> >>>>> This is way below 8 GB. >>>>> >>>>> >>>>> >>>>> Note this estimate completely ignores: >>>>> >>>>> (1) the memory required for the restriction operator, >>>>> >>>>> (2) the potential growth in the number of non-zeros per row due >>>>> to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>> MatView so we could see the number of non-zeros required by the coarse >>>>> level operators) >>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>> required by the smoothers. >>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>> >>>>> >>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>> carefully estimated the memory usage of your application code. Hopefully >>>>> others might examine/correct my rough estimates >>>>> >>>>> >>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>> >>>>> Since I don't have access to the same machine you are running >>>>> on, I think we need to take a step back. >>>>> >>>>> >>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>> available >>>>> >>>>> >>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar 7 >>>>> point FD stencil) >>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>> memory usage of your solver configuration using a standard, light weight >>>>> existing PETSc example, run on your machine at the same scale. >>>>> >>>>> This would hopefully enable us to correctly evaluate the actual >>>>> memory usage required by the solver configuration you are using. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>> >>>>>> >>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>> >>>>>> Hi Barry and Dave, >>>>> >>>>>> >>>>> >>>>>> Thank both of you for the advice. >>>>> >>>>>> >>>>> >>>>>> @Barry >>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>> the correct files this time. >>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>> preconditioner. >>>>> >>>>>> >>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>> >>>>>> Part of the memory usage: Vector 125 124 3971904 >>>>> 0. 
>>>>> >>>>>> Matrix 101 101 >>>>> 9462372 0 >>>>> >>>>>> >>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>> >>>>>> Part of the memory usage: Vector 125 124 681672 >>>>> 0. >>>>> >>>>>> Matrix 101 101 >>>>> 1462180 0. >>>>> >>>>>> >>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>> Test2. In my case, it is about 6 times. >>>>> >>>>>> >>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>> Sub-domain per process: 32*32*32 >>>>> >>>>>> Here I get the out of memory error. >>>>> >>>>>> >>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to >>>>> set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>> errors. >>>>> >>>>>> >>>>> >>>>>> @Dave >>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the coarse >>>>> mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh of >>>>> MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid >>>>> point per process. >>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>> attached. >>>>> >>>>>> >>>>> >>>>>> Do you understand the expected memory usage for the particular >>>>> parallel LU implementation you are using? I don't (seriously). Replace LU >>>>> with bjacobi and re-run this test. My point about solver debugging is still >>>>> valid. >>>>> >>>>>> >>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>> actually used in the computations >>>>> >>>>>> >>>>> >>>>>> Thanks >>>>> >>>>>> Dave >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi Barry, >>>>> >>>>>> >>>>> >>>>>> Thank you for you advice. >>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>> and the process mesh is 96*8*24. >>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>> >>>>>> The system gives me the "Out of Memory" error before the linear >>>>> system is completely solved. >>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>> the error occurs when it reaches the coarse mesh. >>>>> >>>>>> >>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>> 96*8*24. The 3rd test uses the >>>>> same grid but a different process mesh 48*4*12. >>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>> memory usage goes from 2nd test >>>>> >>>>>> Vector 384 383 8,193,712 0. >>>>> >>>>>> Matrix 103 103 11,508,688 0. >>>>> >>>>>> to 3rd test >>>>> >>>>>> Vector 384 383 1,590,520 0. >>>>> >>>>>> Matrix 103 103 3,508,664 0. >>>>> >>>>>> that is the memory usage got smaller but if you have only 1/8th >>>>> the processes and the same grid it should have gotten about 8 times bigger. >>>>> Did you maybe cut the grid by a factor of 8 also? If so that still doesn't >>>>> explain it because the memory usage changed by a factor of 5 something for >>>>> the vectors and 3 something for the matrices. >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>> the same in 1st test. 
The linear solver works fine in both test. >>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>> memory info is from the option '-log_summary'. I tried to use >>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>> my code so I can use '-memory_info'? >>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>> >>>>>> >>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>> memory is used without the telescope? Also run case 2 the same way. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> In both tests the memory usage is not large. >>>>> >>>>>> >>>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner >>>>> that allocated a lot of memory and caused the error in the 1st test. >>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>> >>>>>> Frank, >>>>> >>>>>> >>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>> before the solve so hopefully it gets that far. >>>>> >>>>>> >>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>> when the problem completes it will show the "high water mark" for PETSc >>>>> allocated memory and total memory used. We first want to look at these >>>>> numbers to see if it is using more memory than you expect. You could also >>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>> the output from these options. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi, >>>>> >>>>>> >>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>> solve a linear system in parallel. >>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>> coarse mesh for its good performance. >>>>> >>>>>> The petsc options file is attached. >>>>> >>>>>> >>>>> >>>>>> The domain is a 3d box. >>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>> mesh is 96*8*24. When I double the size of grid and >>>>> keep the same process mesh and petsc options, I >>>>> get an "out of memory" error from the super-cluster I am using. >>>>> >>>>>> Each process has access to at least 8G memory, which should be >>>>> more than enough for my application. I am sure that all the other parts of >>>>> my code( except the linear solver ) do not use much memory. So I doubt if >>>>> there is something wrong with the linear solver. >>>>> >>>>>> The error occurs before the linear system is completely solved >>>>> so I don't have the info from ksp view. I am not able to re-produce the >>>>> error with a smaller problem either. >>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>> runs extremely slow but there is no memory error. >>>>> >>>>>> >>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>> >>>>>> Thank you so much. 
-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

From bsmith at mcs.anl.gov  Tue Oct  4 13:36:28 2016
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Tue, 4 Oct 2016 13:36:28 -0500
Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner
In-Reply-To: <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu>
References: <577C337B.60909@uci.edu> <577D75D3.8010703@uci.edu> <2F25042C-E6D6-4AC6-9C22-1B63F8065836@mcs.anl.gov> <57804DE9.707@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu>
Message-ID: 

   -ksp_view in both cases?

> On Oct 4, 2016, at 1:13 PM, frank wrote:
> 
> Hi,
> 
> This question is a follow-up of the thread "Question about memory usage in Multigrid preconditioner".
> I used to get an "Out of Memory" (OOM) error when using the CG + Telescope MG solver with 32768 cores. Adding the "-matrap 0 -matptap_scalable" options did solve that problem.
> 
> Then I tested the scalability by solving a 3d Poisson equation for one step. I used one sub-communicator in all the tests. The petsc options in these tests differ only in (1) the pc_telescope_reduction_factor and (2) the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all.
> 
> Test1: 512^3 grid points
> Core#   telescope_reduction_factor   MG levels# for up/down solver   Time for KSPSolve (s)
> 512     8                            4 / 3                           6.2466
> 4096    64                           5 / 3                           0.9361
> 32768   64                           4 / 3                           4.8914
> 
> Test2: 1024^3 grid points
> Core#   telescope_reduction_factor   MG levels# for up/down solver   Time for KSPSolve (s)
> 4096    64                           5 / 4                           3.4139
> 8192    128                          5 / 4                           2.4196
> 16384   32                           5 / 3                           5.4150
> 32768   64                           5 / 3                           5.6067
> 65536   128                          5 / 3                           6.5219
> 
> I guess I didn't set the MG levels properly. What would be an efficient way to arrange the MG levels?
> Also, which preconditioner should I use on the coarse mesh of the 2nd communicator to improve the performance?
> 
> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores.
> 
> Thank you.
> 
> Regards,
> Frank
> 
> On 09/15/2016 03:35 AM, Dave May wrote:
>> HI all,
>> 
>> The only unexpected memory usage I can see is associated with the call to MatPtAP().
>> Here is something you can try immediately.
>> Run your code with the additional options
>>   -matrap 0 -matptap_scalable
>> 
>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P.
>> You don't want to do this. The option -matrap 0 resolves this issue.
>> 
>> The implementation of P^T.A.P has two variants.
>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable.
>> 
>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions.
>> 
>> I've attached a cleaned up version of the code you sent me.
>> There were a number of memory leaks and other issues.
>> The main points being >> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. >> >> >> Thanks, >> Dave >> >> >> On 15 September 2016 at 08:03, Hengjie Wang wrote: >> Hi Dave, >> >> Sorry, I should have put more comment to explain the code. >> The number of process in each dimension is the same: Px = Py=Pz=P. So is the domain size. >> So if the you want to run the code for a 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" in the command line. >> I add more comments and also fix an error in the attached code. ( The error only effects the accuracy of solution but not the memory usage. ) >> >> Thank you. >> Frank >> >> >> On 9/14/2016 9:05 PM, Dave May wrote: >>> >>> >>> On Thursday, 15 September 2016, Dave May wrote: >>> >>> >>> On Thursday, 15 September 2016, frank wrote: >>> Hi, >>> >>> I write a simple code to re-produce the error. I hope this can help to diagnose the problem. >>> The code just solves a 3d poisson equation. >>> >>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>> >>> Was this choice made to mimic something in the real application code? >>> >>> Please ignore - I misunderstood your usage of the param set by -P >>> >>> >>> >>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. >>> I attached the code, ksp_view_pre's output and my petsc option file. >>> >>> Thank you. >>> Frank >>> >>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>> Hi Barry, >>>> >>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in file I sent you. I am sorry for the confusion. >>>> >>>> Regards, >>>> Frank >>>> >>>> On Friday, September 9, 2016, Barry Smith wrote: >>>> >>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>> > >>>> > Hi Barry, >>>> > >>>> > I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view. >>>> >>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that? >>>> >>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. >>>> >>>> Barry >>>> >>>> > >>>> > Frank >>>> > >>>> > >>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>> >> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one? >>>> >> >>>> >> Barry >>>> >> >>>> >> >>>> >> >>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>> >>> >>>> >>> Hi, >>>> >>> >>>> >>> I want to continue digging into the memory problem here. >>>> >>> I did find a work around in the past, which is to use less cores per node so that each core has 8G memory. However this is deficient and expensive. I hope to locate the place that uses the most memory. 
>>>> >>> >>>> >>> Here is a brief summary of the tests I did in past: >>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>> >>> Maximum (over computational time) process memory: total 7.0727e+08 >>>> >>> Current process memory: total 7.0727e+08 >>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>>> >>> Current space PetscMalloc()ed: total 1.8275e+09 >>>> >>> >>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>> >>> Maximum (over computational time) process memory: total 5.9431e+09 >>>> >>> Current process memory: total 5.9431e+09 >>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>>> >>> Current space PetscMalloc()ed: total 5.4844e+09 >>>> >>> >>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve". >>>> >>> >>>> >>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options. >>>> >>> >>>> >>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer index, 2G memory should still be way enough. >>>> >>> >>>> >>> Is there a way to find out which part of KSPSolve uses the most memory? >>>> >>> Thank you so much. >>>> >>> >>>> >>> BTW, there are 4 options remains unused and I don't understand why they are omitted: >>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>> >>> >>>> >>> >>>> >>> Regards, >>>> >>> Frank >>>> >>> >>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>> >>>> >>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>> >>>> Hi Dave, >>>> >>>> >>>> >>>> Sorry for the late reply. >>>> >>>> Thank you so much for your detailed reply. >>>> >>>> >>>> >>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: >>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>> >>>> Did I do sth wrong here? Because this seems too small. >>>> >>>> >>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....) >>>> >>>> >>>> >>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> I am running this job on Bluewater >>>> >>>> I am using the 7 points FD stencil in 3D. >>>> >>>> >>>> >>>> I thought so on both counts. >>>> >>>> >>>> >>>> I apologize that I made a stupid mistake in computing the memory per core. My settings render each core can access only 2G memory on average instead of 8G which I mentioned in previous email. 
I re-run the job with 8G memory per core on average and there is no "Out Of Memory" error. I would do more test to see if there is still some memory issue. >>>> >>>> >>>> >>>> Ok. I'd still like to know where the memory was being used since my estimates were off. >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Dave >>>> >>>> >>>> >>>> Regards, >>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>> >>>>> Hi Frank, >>>> >>>>> >>>> >>>>> >>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>> >>>>> Hi Dave, >>>> >>>>> >>>> >>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached. >>>> >>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. >>>> >>>>> >>>> >>>>> Okay - that is essentially useless (sorry) >>>> >>>>> >>>> >>>>> It seems to me that the error occurred when the decomposition was going to be changed. >>>> >>>>> >>>> >>>>> Based on what information? >>>> >>>>> Running with -info would give us more clues, but will create a ton of output. >>>> >>>>> Please try running the case which failed with -info >>>> >>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. >>>> >>>>> Thank you. >>>> >>>>> >>>> >>>>> >>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>> >>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate >>>> >>>>> >>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. >>>> >>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers) >>>> >>>>> >>>> >>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively) >>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. >>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. >>>> >>>>> >>>> >>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. >>>> >>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432. >>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. >>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank. >>>> >>>>> The temporary matrix is now destroyed. >>>> >>>>> >>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>> >>>>> This requires 2 doubles per point in the DMDA. >>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. >>>> >>>>> >>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. 
>>>> >>>>> >>>> >>>>> From my rough estimates, the worst case memory foot print for any given core, given your options is approximately >>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>> >>>>> This is way below 8 GB. >>>> >>>>> >>>> >>>>> Note this estimate completely ignores: >>>> >>>>> (1) the memory required for the restriction operator, >>>> >>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators) >>>> >>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers. >>>> >>>>> (4) internal memory allocated by MatPtAP >>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>> >>>>> >>>> >>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates >>>> >>>>> >>>> >>>>> Since I don't have your code I cannot access the latter. >>>> >>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back. >>>> >>>>> >>>> >>>>> [1] What machine are you running on? Send me a URL if its available >>>> >>>>> >>>> >>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil) >>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale. >>>> >>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using. >>>> >>>>> >>>> >>>>> Thanks, >>>> >>>>> Dave >>>> >>>>> >>>> >>>>> >>>> >>>>> Frank >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>> >>>>>> >>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>> >>>>>> Hi Barry and Dave, >>>> >>>>>> >>>> >>>>>> Thank both of you for the advice. >>>> >>>>>> >>>> >>>>>> @Barry >>>> >>>>>> I made a mistake in the file names in last email. I attached the correct files this time. >>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner. >>>> >>>>>> >>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>> >>>>>> Part of the memory usage: Vector 125 124 3971904 0. >>>> >>>>>> Matrix 101 101 9462372 0 >>>> >>>>>> >>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>> >>>>>> Part of the memory usage: Vector 125 124 681672 0. >>>> >>>>>> Matrix 101 101 1462180 0. >>>> >>>>>> >>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of Test2. In my case, it is about 6 times. >>>> >>>>>> >>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>> >>>>>> Here I get the out of memory error. >>>> >>>>>> >>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>> >>>>>> The linear solver didn't work in this case. Petsc output some errors. >>>> >>>>>> >>>> >>>>>> @Dave >>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>> >>>>>> If my set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. 
>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process. >>>> >>>>>> I still got the OOM error. The detailed petsc option file is attached. >>>> >>>>>> >>>> >>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid. >>>> >>>>>> >>>> >>>>>> And please send the result of KSPView so we can see what is actually used in the computations >>>> >>>>>> >>>> >>>>>> Thanks >>>> >>>>>> Dave >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>> >>>>>> >>>> >>>>>> Hi Barry, >>>> >>>>>> >>>> >>>>>> Thank you for you advice. >>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. >>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. >>>> >>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved. >>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that the error occurs when it reaches the coarse mesh. >>>> >>>>>> >>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. >>>> >>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test >>>> >>>>>> Vector 384 383 8,193,712 0. >>>> >>>>>> Matrix 103 103 11,508,688 0. >>>> >>>>>> to 3rd test >>>> >>>>>> Vector 384 383 1,590,520 0. >>>> >>>>>> Matrix 103 103 3,508,664 0. >>>> >>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 1st test. The linear solver works fine in both test. >>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-momery_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add sth to my code so I can use '-memory_info'? >>>> >>>>>> Sorry, my mistake the option is -memory_view >>>> >>>>>> >>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> In both tests the memory usage is not large. >>>> >>>>>> >>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. >>>> >>>>>> Is there is a way to show how much memory it allocated? >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>> >>>>>> Frank, >>>> >>>>>> >>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. 
>>>> >>>>>> >>>> >>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>> >>>>>> >>>> >>>>>> Hi, >>>> >>>>>> >>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. >>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance. >>>> >>>>>> The petsc options file is attached. >>>> >>>>>> >>>> >>>>>> The domain is a 3d box. >>>> >>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. >>>> >>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code( except the linear solver ) do not use much memory. So I doubt if there is something wrong with the linear solver. >>>> >>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to re-produce the error with a smaller problem either. >>>> >>>>>> In addition, I tried to use the block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slow but there is no memory error. >>>> >>>>>> >>>> >>>>>> How can I diagnose what exactly cause the error? >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>> >>>> >>>> >>>> >>> >>>> > >>>> >>> >>> >> >> > > From hengjiew at uci.edu Tue Oct 4 14:09:27 2016 From: hengjiew at uci.edu (frank) Date: Tue, 4 Oct 2016 12:09:27 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: Hi, On 10/04/2016 11:24 AM, Matthew Knepley wrote: > On Tue, Oct 4, 2016 at 1:13 PM, frank > wrote: > > Hi, > > This question is follow-up of the thread "Question about memory > usage in Multigrid preconditioner". > I used to have the "Out of Memory(OOM)" problem when using the > CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; > -matptap_scalable" option did solve that problem. > > Then I test the scalability by solving a 3d poisson eqn for 1 > step. I used one sub-communicator in all the tests. The difference > between the petsc options in those tests are: 1 the > pc_telescope_reduction_factor; 2 the number of multigrid levels in > the up/down solver. The function "ksp_solve" is timed. It is kind > of slow and doesn't scale at all. > > > 1) The number of levels cannot be different in the up/down smoothers. > Why are you using a / ? I didn't mean the "up/down smoothers". I mean the "-pc_mg_levels" and "-mg_coarse_telescope_pc_mg_levels". 
>
> 2) We need to see what solver you actually constructed, so give us the
> output of -ksp_view
>
> 3) For any performance questions, we need the output of -log_view

I attached the log_view output for all eight runs. The files are named by the core count and the grid size, e.g. log_512_4096.txt is the log_view output from the case using 512^3 grid points and 4096 cores.
I attach only two ksp_view outputs, just in case too many files become messy. The ksp_view for the other tests is quite similar; the only difference is the number of MG levels.

> 4) It looks like you are fixing the number of levels as you scale up.
> This makes the coarse problem much bigger, and is not a scalable way
> to proceed.
> Have you looked at the ratio of coarse grid time to level time?

How can I find the ratio?

> 5) Did you look at the options in this paper:
> https://arxiv.org/abs/1604.07163

I am going to look at it now.

Thank you.
Frank

> Thanks,
>
>    Matt
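To put a rough number on point 4: with factor-2 coarsening per level, which is what the attached ksp_view output shows (each Galerkin operator has 1/8 the rows of the level above), the size of the problem handed to Telescope is fixed by the grid size and the number of outer MG levels. A sketch of the arithmetic; the 2048^3 row is only a hypothetical extrapolation, not one of the runs above:

    coarse-grid unknowns = ( N / 2^(levels-1) )^3        for an N^3 grid

    N =  512, -pc_mg_levels 5 :  (512/16)^3  =  32^3 =   32768   (matches the first ksp_view below)
    N = 1024, -pc_mg_levels 5 :  (1024/16)^3 =  64^3 =  262144   (matches the second ksp_view below)
    N = 2048, -pc_mg_levels 5 :  (2048/16)^3 = 128^3 = 2097152   (hypothetical)

So doubling the grid while keeping -pc_mg_levels fixed makes the Telescope-level problem 8 times larger each time, whereas adding one outer level per doubling would keep it at a constant size.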
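For orientation, here is a minimal sketch of an options file for the CG + MG + Telescope configuration discussed in this thread. The option names are taken from the messages and the ksp_view output above; the values roughly mirror the first attached ksp_view (512^3 grid, 4096 cores, reduction factor 64) and are illustrative only, not a copy of the file that was actually attached:

    -ksp_type cg
    -ksp_rtol 1e-7
    -pc_type mg
    -pc_mg_levels 5
    -mg_levels_ksp_type richardson
    -mg_levels_ksp_max_it 1
    -mg_levels_pc_type sor
    # coarse level of the outer MG: repartition onto a smaller communicator
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type telescope
    -mg_coarse_pc_telescope_reduction_factor 64
    # solver living on the sub-communicator: a second MG hierarchy
    -mg_coarse_telescope_ksp_type preonly
    -mg_coarse_telescope_pc_type mg
    -mg_coarse_telescope_pc_mg_levels 3
    -mg_coarse_telescope_mg_levels_ksp_type richardson
    -mg_coarse_telescope_mg_levels_ksp_max_it 1
    -mg_coarse_telescope_mg_coarse_ksp_type preonly
    -mg_coarse_telescope_mg_coarse_pc_type redundant
    # memory-scalable Galerkin products (Dave's suggestion earlier in the thread)
    -matrap 0
    -matptap_scalable
    # diagnostics requested in this thread
    -ksp_view
    -log_view
    -memory_view

Note that the attached ksp_view uses "redundant" (with LU inside) on the innermost coarse grid, while an earlier options file in the thread set -mg_coarse_telescope_mg_coarse_pc_type to bjacobi; either choice plugs into the same option.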
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 7 KSP Object: 4096 MPI processes type: cg maximum iterations=10000 tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 4096 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4096 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4096 MPI processes type: telescope Telescope: parent comm size reduction factor = 64 Telescope: comm_size = 4096 , subcomm_size = 64 Telescope: DMDA detected DMDA Object: (repart_) 64 MPI processes M 32 N 32 P 32 m 4 n 4 p 4 dof 1 overlap 1 KSP Object: (mg_coarse_telescope_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_) 64 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: redundant Redundant preconditioner: First (color=0) of 64 PCs follows linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=110592, allocated nonzeros=110592 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 8.69575 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=120210, allocated nonzeros=120210 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=56623104, allocated nonzeros=56623104 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=16777216, cols=16777216 total: nonzeros=452984832, allocated nonzeros=452984832 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=939524096, allocated nonzeros=939524096 total number of mallocs used during MatSetValues calls =0 has attached null space Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=939524096, allocated nonzeros=939524096 total number of mallocs used during MatSetValues calls =0 has attached null space -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 8 KSP Object: 8192 MPI processes type: cg maximum iterations=10000 tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 8192 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8192 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8192 MPI processes type: telescope Telescope: parent comm size reduction factor = 128 Telescope: comm_size = 8192 , subcomm_size = 64 Telescope: DMDA detected DMDA Object: (repart_) 64 MPI processes M 64 N 64 P 64 m 4 n 4 p 4 dof 1 overlap 1 KSP Object: (mg_coarse_telescope_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_) 64 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: redundant Redundant preconditioner: First (color=0) of 64 PCs follows linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=110592, allocated nonzeros=110592 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 8.69575 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=120210, allocated nonzeros=120210 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 16 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=56623104, allocated nonzeros=56623104 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=16777216, cols=16777216 total: nonzeros=452984832, allocated nonzeros=452984832 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=3623878656, allocated nonzeros=3623878656 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 1 step time: 6.2466299533843994 norm1 error: 1.2135791829058829E-005 norm inf error: 1.0512737852365958E-002 Summary of Memory Usage in PETSc Maximum (over computational time) process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05 Current process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./test_ksp.exe on a gnu-opt named . 
with 512 processors, by wang11 Tue Oct 4 05:04:05 2016 Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600 Max Max/Min Avg Total Time (sec): 7.128e+00 1.00215 7.121e+00 Objects: 3.330e+02 1.72539 2.105e+02 Flops: 2.508e+09 9.15893 5.530e+08 2.832e+11 Flops/sec: 3.521e+08 9.16346 7.765e+07 3.976e+10 MPI Messages: 3.918e+03 2.07713 2.157e+03 1.104e+06 MPI Message Lengths: 1.003e+07 1.17554 4.064e+03 4.488e+09 MPI Reductions: 4.310e+02 1.60223 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 7.1208e+00 100.0% 2.8316e+11 100.0% 1.104e+06 100.0% 4.064e+03 100.0% 2.882e+02 66.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.5056e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 14 1.0 6.0542e-02 1.6 7.34e+06 1.0 0.0e+00 0.0e+00 1.4e+01 1 1 0 0 3 1 1 0 0 5 62074 VecNorm 8 1.0 3.5572e-02 3.1 4.19e+06 1.0 0.0e+00 0.0e+00 8.0e+00 0 1 0 0 2 0 1 0 0 3 60370 VecScale 28 2.0 2.1243e-04 1.8 7.35e+04 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 144250 VecCopy 9 1.0 3.8947e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 193 1.8 1.6343e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 28 1.0 1.0030e-01 1.1 1.47e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 74940 VecAYPX 48 1.4 6.3155e-02 1.6 7.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 57380 VecAssemblyBegin 1 1.0 2.5080e-0217.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 2.2888e-0512.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 194 1.6 3.9131e-02 1.6 0.00e+00 0.0 7.2e+05 4.1e+03 0.0e+00 0 0 65 65 0 0 0 65 65 0 0 VecScatterEnd 194 1.6 3.4133e+0068.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 0 MatMult 56 1.3 5.0448e-01 1.2 8.70e+07 1.0 2.9e+05 8.2e+03 0.0e+00 6 15 26 53 0 6 15 26 53 0 86737 MatMultAdd 35 1.7 8.0332e-02 1.2 1.43e+07 1.0 8.2e+04 
1.5e+03 0.0e+00 1 3 7 3 0 1 3 7 3 0 90220 MatMultTranspose 47 1.5 1.1686e-01 1.4 1.64e+07 1.0 1.1e+05 1.4e+03 0.0e+00 1 3 10 3 0 1 3 10 3 0 70913 MatSolve 7 0.0 5.4884e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 51106 MatSOR 70 1.7 7.4662e-01 1.1 8.85e+07 1.0 2.1e+05 1.2e+03 1.8e+00 10 15 19 5 0 10 15 19 5 1 58271 MatLUFactorSym 1 0.0 1.3002e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 0.0 3.0343e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 5 49 0 0 0 5 49 0 0 0 46035 MatConvert 1 0.0 1.4801e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatResidual 35 1.7 2.5246e-01 1.3 4.14e+07 1.0 2.3e+05 4.1e+03 0.0e+00 3 7 21 21 0 3 7 21 21 0 80802 MatAssemblyBegin 29 1.5 6.2687e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 1 0 0 0 5 1 0 0 0 7 0 MatAssemblyEnd 29 1.5 2.8406e-01 1.0 0.00e+00 0.0 1.5e+05 5.4e+02 7.7e+01 4 0 14 2 18 4 0 14 2 27 0 MatGetRowIJ 1 0.0 1.1208e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 2.0 4.1284e-02 9.3 0.00e+00 0.0 2.2e+03 3.4e+04 3.5e+00 0 0 0 2 1 0 0 0 2 1 0 MatGetOrdering 1 0.0 7.9041e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAP 6 1.5 1.0306e+00 1.0 4.18e+07 1.0 3.1e+05 4.4e+03 7.2e+01 14 7 28 30 17 14 7 28 30 25 20208 MatPtAPSymbolic 6 1.5 4.9107e-01 1.0 0.00e+00 0.0 1.8e+05 5.3e+03 3.0e+01 7 0 16 21 7 7 0 16 21 10 0 MatPtAPNumeric 6 1.5 5.3958e-01 1.0 4.18e+07 1.0 1.3e+05 3.0e+03 4.2e+01 7 7 11 9 10 7 7 11 9 15 38597 MatRedundantMat 1 0.0 2.7650e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e-01 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 0.0 1.6951e-02 0.0 0.00e+00 0.0 3.3e+03 1.4e+02 1.9e+00 0 0 0 0 0 0 0 0 0 1 0 MatGetLocalMat 6 1.5 4.7763e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetBrAoCol 6 1.5 4.1229e-02 1.2 0.00e+00 0.0 1.4e+05 5.5e+03 0.0e+00 1 0 13 17 0 1 0 13 17 0 0 MatGetSymTrans 12 1.5 1.4412e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMCoarsen 5 1.7 8.8470e-03 1.4 0.00e+00 0.0 2.0e+04 8.4e+02 3.6e+01 0 0 2 0 8 0 0 2 0 12 0 DMCreateInterpolation 5 1.7 2.1848e-01 1.0 2.05e+06 1.0 3.5e+04 7.5e+02 5.2e+01 3 0 3 1 12 3 0 3 1 18 4739 KSPSetUp 10 2.0 1.9465e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 0 KSPSolve 1 1.0 6.2467e+00 1.0 2.51e+09 9.2 1.1e+06 4.0e+03 2.6e+02 88100 99 98 60 88100 99 98 90 45330 PCSetUp 2 2.0 4.5211e+00 3.6 2.23e+0952.3 3.8e+05 3.8e+03 2.1e+02 23 57 35 33 48 23 57 35 33 72 35732 PCApply 7 1.0 4.6845e+00 1.0 2.42e+0913.0 7.2e+05 3.1e+03 3.0e+01 66 84 65 50 7 66 84 65 50 11 50783 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 133 133 29053936 0. Vector Scatter 24 24 2464384 0. Matrix 58 58 118369764 0. Matrix Null Space 1 1 592 0. Distributed Mesh 7 7 34944 0. Star Forest Bipartite Graph 14 14 11872 0. Discrete System 7 7 5992 0. Index Set 54 54 1628276 0. IS L to G Mapping 7 7 1367088 0. Krylov Solver 11 11 13640 0. DMKSP interface 5 5 3240 0. Preconditioner 11 11 11008 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 1.87874e-05 Average time for zero size MPI_Send(): 1.10432e-05 #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 8 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 8 -py 8 -pz 8 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt ----------------------------------------- Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc Using PETSc arch: gnu-opt ----------------------------------------- Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc 
-Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl ----------------------------------------- #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 8 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 8 -py 8 -pz 8 #End of PETSc Option Table entries There is one unused database option. It is: Option left: name:-ppe_max_iter value: 20 Application 48712763 resources: utime ~3749s, stime ~789s, Rss ~196960, inblocks ~781565, outblocks ~505751 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_512_4096.txt URL: -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 1 step time: 4.8914160728454590 norm1 error: 8.6827845637092041E-008 norm inf error: 4.1127664509280201E-003 Summary of Memory Usage in PETSc Maximum (over computational time) process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04 Current process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./test_ksp.exe on a gnu-opt named . with 32768 processors, by wang11 Tue Oct 4 03:50:16 2016 Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600 Max Max/Min Avg Total Time (sec): 5.221e+00 1.00192 5.215e+00 Objects: 3.330e+02 1.72539 1.952e+02 Flops: 2.232e+09 531.65406 3.900e+07 1.278e+12 Flops/sec: 4.277e+08 531.89802 7.473e+06 2.449e+11 MPI Messages: 8.594e+03 4.55579 2.011e+03 6.589e+07 MPI Message Lengths: 1.078e+06 1.95814 2.782e+02 1.833e+10 MPI Reductions: 4.310e+02 1.60223 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.2149e+00 100.0% 1.2779e+12 100.0% 6.589e+07 100.0% 2.782e+02 100.0% 2.705e+02 62.8% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 6.2082e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecTDot 14 1.0 1.5901e-02 2.1 1.15e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 3 0 0 0 0 5 236313 VecNorm 8 1.0 8.2795e-0299.5 6.55e+04 1.0 0.0e+00 0.0e+00 8.0e+00 1 0 0 0 2 1 0 0 0 3 25937 VecScale 28 2.0 4.6015e-0417.9 8.96e+03 2.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 324014 VecCopy 9 1.0 2.4486e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 193 1.8 5.3072e-04 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 28 1.0 6.1011e-04 2.5 2.29e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12319342 VecAYPX 48 1.4 4.3058e-04 2.8 1.15e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8416119 VecAssemblyBegin 1 1.0 6.2096e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAssemblyEnd 1 1.0 6.3896e-0567.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 194 1.6 2.2339e-02 8.0 0.00e+00 0.0 4.3e+07 2.8e+02 0.0e+00 0 0 65 66 0 0 0 65 66 0 0 VecScatterEnd 194 1.6 3.7815e+0039.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71 0 0 0 0 71 0 0 0 0 0 MatMult 56 1.3 7.7610e-02 7.5 1.55e+06 1.2 1.7e+07 5.6e+02 0.0e+00 0 3 26 53 0 0 3 26 53 0 563808 MatMultAdd 35 1.7 1.1928e-02 9.2 2.48e+05 1.1 4.9e+06 1.1e+02 0.0e+00 0 1 7 3 0 0 1 7 3 0 607627 MatMultTranspose 47 1.5 2.6726e-0213.3 2.84e+05 1.1 6.5e+06 9.9e+01 0.0e+00 0 1 10 3 0 0 1 10 3 0 310054 MatSolve 7 0.0 5.5102e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 407368 MatSOR 70 1.7 2.0535e-02 3.7 1.70e+06 1.4 1.2e+07 9.8e+01 2.2e-01 0 3 18 7 0 0 3 18 7 0 1976428 MatLUFactorSym 1 0.0 1.4304e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 0.0 3.0453e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 1 87 0 0 0 1 87 0 0 0 366959 MatConvert 1 0.0 1.3890e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatResidual 35 1.7 7.3063e-0211.3 8.37e+05 1.4 1.3e+07 3.0e+02 0.0e+00 0 2 20 22 0 0 2 20 22 0 279200 MatAssemblyBegin 29 1.5 1.1239e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 2 0 0 0 5 2 0 0 0 7 0 MatAssemblyEnd 29 1.5 3.6328e-01 1.1 0.00e+00 0.0 8.9e+06 4.1e+01 7.3e+01 6 0 14 2 17 6 0 14 2 27 0 MatGetRowIJ 1 0.0 1.1570e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 2.0 1.0665e-01 4.9 0.00e+00 0.0 1.6e+05 5.4e+02 3.1e+00 1 0 0 0 1 1 0 0 0 1 0 MatGetOrdering 1 0.0 8.1892e-03 0.0 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAP 6 1.5 4.1852e-01 1.0 7.98e+05 1.2 1.9e+07 3.0e+02 6.9e+01 8 2 28 30 16 8 2 28 30 25 50373 MatPtAPSymbolic 6 1.5 2.2612e-01 1.0 0.00e+00 0.0 1.1e+07 3.7e+02 2.8e+01 4 0 16 22 7 4 0 16 22 10 0 MatPtAPNumeric 6 1.5 1.9413e-01 1.0 7.98e+05 1.2 7.7e+06 2.0e+02 4.0e+01 4 2 12 8 9 4 2 12 8 15 108597 MatRedundantMat 1 0.0 2.9847e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.2e-02 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 0.0 7.8937e-02 0.0 0.00e+00 0.0 2.7e+04 4.0e+01 2.3e-01 0 0 0 0 0 0 0 0 0 0 0 MatGetLocalMat 6 1.5 7.7701e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.5 1.9681e-02 3.1 0.00e+00 0.0 8.3e+06 3.9e+02 0.0e+00 0 0 13 18 0 0 0 13 18 0 0 MatGetSymTrans 12 1.5 2.0599e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMCoarsen 5 1.7 9.4588e-02 1.0 0.00e+00 0.0 1.2e+06 5.8e+01 3.3e+01 2 0 2 0 8 2 0 2 0 12 0 DMCreateInterpolation 5 1.7 2.1863e-01 1.0 3.54e+04 1.1 2.1e+06 5.8e+01 4.8e+01 4 0 3 1 11 4 0 3 1 18 4736 KSPSetUp 10 2.0 2.9837e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 1 0 0 0 3 1 0 0 0 4 0 KSPSolve 1 1.0 4.8916e+00 1.0 2.23e+09531.7 6.5e+07 2.8e+02 2.4e+02 94100 99 98 56 94100 99 98 89 261253 PCSetUp 2 2.0 4.6506e+00 4.8 2.18e+093247.5 2.3e+07 2.5e+02 1.9e+02 20 89 35 32 44 20 89 35 32 71 245045 PCApply 7 1.0 3.7972e+00 1.0 2.23e+09794.1 4.2e+07 2.2e+02 1.6e+01 73 96 63 51 4 73 96 63 51 6 324561 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 133 133 850544 0. Vector Scatter 24 24 68032 0. Matrix 58 58 42186948 0. Matrix Null Space 1 1 592 0. Distributed Mesh 7 7 34944 0. Star Forest Bipartite Graph 14 14 11872 0. Discrete System 7 7 5992 0. Index Set 54 54 152244 0. IS L to G Mapping 7 7 37936 0. Krylov Solver 11 11 13640 0. DMKSP interface 5 5 3240 0. Preconditioner 11 11 11008 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 6.00338e-05 Average time for zero size MPI_Send(): 1.25148e-05 #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 64 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 32 -py 32 -pz 32 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt ----------------------------------------- Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc Using PETSc arch: gnu-opt ----------------------------------------- Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc 
-Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl ----------------------------------------- #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 64 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 32 -py 32 -pz 32 #End of PETSc Option Table entries There is one unused database option. It is: Option left: name:-ppe_max_iter value: 20 Application 48712514 resources: utime ~274648s, stime ~36467s, Rss ~112492, inblocks ~29956998, outblocks ~32114238 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_4096.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_8192.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_16384.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_32768.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_65536.txt URL: From hengjiew at uci.edu Tue Oct 4 14:14:39 2016 From: hengjiew at uci.edu (frank) Date: Tue, 4 Oct 2016 12:14:39 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <57804DE9.707@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: <7d31fc6a-d423-0bc2-a69b-276d4f6f71e5@uci.edu> Hi, I attached two ksp_view for the two grid sizes. The major difference between the ksp solver in those runs is the number of MG levels. Except that, the ksp_view are quite similar. I also attached the log_view for all the eight runs. Hope it would not be too messy. Thank you. Frank On 10/04/2016 11:36 AM, Barry Smith wrote: > -ksp_view in both cases? > >> On Oct 4, 2016, at 1:13 PM, frank wrote: >> >> Hi, >> >> This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". >> I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. >> >> Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. 
It is kind of slow and doesn't scale at all.
>>
>> Test1: 512^3 grid points
>> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
>> 512      8                             4 / 3                            6.2466
>> 4096     64                            5 / 3                            0.9361
>> 32768    64                            4 / 3                            4.8914
>>
>> Test2: 1024^3 grid points
>> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
>> 4096     64                            5 / 4                            3.4139
>> 8192     128                           5 / 4                            2.4196
>> 16384    32                            5 / 3                            5.4150
>> 32768    64                            5 / 3                            5.6067
>> 65536    128                           5 / 3                            6.5219
>>
>> I guess I didn't set the MG levels properly. What would be an efficient way to arrange the MG levels?
>> Also, which preconditioner should I use at the coarse mesh of the 2nd communicator to improve the performance?
>>
>> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores.
>>
>> Thank you.
>>
>> Regards,
>> Frank
>>
>> On 09/15/2016 03:35 AM, Dave May wrote:
>>> Hi all,
>>>
>>> The only unexpected memory usage I can see is associated with the call to MatPtAP().
>>> Here is something you can try immediately.
>>> Run your code with the additional options
>>> -matrap 0 -matptap_scalable
>>>
>>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P.
>>> You don't want to do this. The option -matrap 0 resolves this issue.
>>>
>>> The implementation of P^T.A.P has two variants.
>>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable.
>>>
>>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions.
>>>
>>> I've attached a cleaned-up version of the code you sent me.
>>> There were a number of memory leaks and other issues.
>>> The main points being
>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>>> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed.
>>>
>>> Thanks,
>>> Dave
>>>
>>> On 15 September 2016 at 08:03, Hengjie Wang wrote:
>>> Hi Dave,
>>>
>>> Sorry, I should have put more comments in to explain the code.
>>> The number of processes in each dimension is the same: Px = Py = Pz = P. So is the domain size.
>>> So if you want to run the code for 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" on the command line.
>>> I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.)
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 9/14/2016 9:05 PM, Dave May wrote:
>>>>
>>>> On Thursday, 15 September 2016, Dave May wrote:
>>>>
>>>> On Thursday, 15 September 2016, frank wrote:
>>>> Hi,
>>>>
>>>> I wrote a simple code to reproduce the error. I hope this can help to diagnose the problem.
>>>> The code just solves a 3d Poisson equation.
>>>>
>>>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>>>
>>>> Was this choice made to mimic something in the real application code?
>>>>
>>>> Please ignore - I misunderstood your usage of the param set by -P
>>>>
>>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I reproduce the OOM error. Each core has about 2G memory.
>>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. 
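For reference, the solver layout behind the timing tables above can be read off the option table in the attached 512-core log; reassembled one option per line it is roughly the following (the -mg_coarse_pc_telescope_reduction_factor and the two MG level counts are what vary between the rows above, and the last two options are the MatPtAP settings Dave recommends):

    -ksp_type cg
    -ksp_norm_type unpreconditioned
    -ksp_rtol 1e-7
    -pc_type mg
    -pc_mg_levels 4
    -pc_mg_galerkin
    -mg_levels_ksp_type richardson
    -mg_levels_ksp_max_it 1
    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type telescope
    -mg_coarse_pc_telescope_reduction_factor 8
    -mg_coarse_telescope_ksp_type preonly
    -mg_coarse_telescope_pc_type mg
    -mg_coarse_telescope_pc_mg_galerkin
    -mg_coarse_telescope_pc_mg_levels 3
    -mg_coarse_telescope_mg_levels_ksp_type richardson
    -mg_coarse_telescope_mg_levels_ksp_max_it 1
    -mg_coarse_telescope_mg_coarse_ksp_type preonly
    -mg_coarse_telescope_mg_coarse_pc_type redundant
    -matrap 0
    -matptap_scalable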
>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>> >>>> Thank you. >>>> Frank >>>> >>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>> Hi Barry, >>>>> >>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in file I sent you. I am sorry for the confusion. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>> >>>>>> On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view. >>>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that? >>>>> >>>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. >>>>> >>>>> Barry >>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I want to continue digging into the memory problem here. >>>>>>>> I did find a work around in the past, which is to use less cores per node so that each core has 8G memory. However this is deficient and expensive. I hope to locate the place that uses the most memory. >>>>>>>> >>>>>>>> Here is a brief summary of the tests I did in past: >>>>>>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>>> Maximum (over computational time) process memory: total 7.0727e+08 >>>>>>>> Current process memory: total 7.0727e+08 >>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>>>>>>> Current space PetscMalloc()ed: total 1.8275e+09 >>>>>>>> >>>>>>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>>> Maximum (over computational time) process memory: total 5.9431e+09 >>>>>>>> Current process memory: total 5.9431e+09 >>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>>>>>>> Current space PetscMalloc()ed: total 5.4844e+09 >>>>>>>> >>>>>>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve". >>>>>>>> >>>>>>>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options. >>>>>>>> >>>>>>>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer index, 2G memory should still be way enough. >>>>>>>> >>>>>>>> Is there a way to find out which part of KSPSolve uses the most memory? >>>>>>>> Thank you so much. 
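On the question of where KSPSolve's memory goes: the logs attached earlier in this thread already carry the two relevant diagnostics. -memory_view prints the "Maximum (over computational time) process memory" summary, and -log_view produces the "Memory usage is given in bytes" table that breaks PETSc allocations down by object type. A minimal invocation (the launcher is illustrative; on the Cray system above it would be aprun rather than mpiexec) would be

    mpiexec -n 96 ./test_ksp.exe -memory_view -log_view

run on the failing configuration.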
>>>>>>>> >>>>>>>> BTW, there are 4 options remains unused and I don't understand why they are omitted: >>>>>>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Frank >>>>>>>> >>>>>>>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>>>> On 14 July 2016 at 01:07, frank wrote: >>>>>>>>> Hi Dave, >>>>>>>>> >>>>>>>>> Sorry for the late reply. >>>>>>>>> Thank you so much for your detailed reply. >>>>>>>>> >>>>>>>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: >>>>>>>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>>>> Did I do sth wrong here? Because this seems too small. >>>>>>>>> >>>>>>>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....) >>>>>>>>> >>>>>>>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I am running this job on Bluewater >>>>>>>>> I am using the 7 points FD stencil in 3D. >>>>>>>>> >>>>>>>>> I thought so on both counts. >>>>>>>>> >>>>>>>>> I apologize that I made a stupid mistake in computing the memory per core. My settings render each core can access only 2G memory on average instead of 8G which I mentioned in previous email. I re-run the job with 8G memory per core on average and there is no "Out Of Memory" error. I would do more test to see if there is still some memory issue. >>>>>>>>> >>>>>>>>> Ok. I'd still like to know where the memory was being used since my estimates were off. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Dave >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>>>>> Hi Frank, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>>>>>> Hi Dave, >>>>>>>>>> >>>>>>>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached. >>>>>>>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. >>>>>>>>>> >>>>>>>>>> Okay - that is essentially useless (sorry) >>>>>>>>>> >>>>>>>>>> It seems to me that the error occurred when the decomposition was going to be changed. >>>>>>>>>> >>>>>>>>>> Based on what information? >>>>>>>>>> Running with -info would give us more clues, but will create a ton of output. >>>>>>>>>> Please try running the case which failed with -info >>>>>>>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. >>>>>>>>>> Thank you. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [3] Here is my crude estimate of your memory usage. >>>>>>>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate >>>>>>>>>> >>>>>>>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. >>>>>>>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers) >>>>>>>>>> >>>>>>>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively) >>>>>>>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>> >>>>>>>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. >>>>>>>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432. >>>>>>>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. >>>>>>>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank. >>>>>>>>>> The temporary matrix is now destroyed. >>>>>>>>>> >>>>>>>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>>>>>> This requires 2 doubles per point in the DMDA. >>>>>>>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. >>>>>>>>>> >>>>>>>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. >>>>>>>>>> >>>>>>>>>> From my rough estimates, the worst case memory foot print for any given core, given your options is approximately >>>>>>>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>>>>> This is way below 8 GB. >>>>>>>>>> >>>>>>>>>> Note this estimate completely ignores: >>>>>>>>>> (1) the memory required for the restriction operator, >>>>>>>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators) >>>>>>>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers. >>>>>>>>>> (4) internal memory allocated by MatPtAP >>>>>>>>>> (5) memory associated with IS's used within PCTelescope >>>>>>>>>> >>>>>>>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates >>>>>>>>>> >>>>>>>>>> Since I don't have your code I cannot access the latter. >>>>>>>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back. >>>>>>>>>> >>>>>>>>>> [1] What machine are you running on? Send me a URL if its available >>>>>>>>>> >>>>>>>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil) >>>>>>>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale. 
>>>>>>>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Dave >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>>>>>>> Hi Barry and Dave, >>>>>>>>>>> >>>>>>>>>>> Thank both of you for the advice. >>>>>>>>>>> >>>>>>>>>>> @Barry >>>>>>>>>>> I made a mistake in the file names in last email. I attached the correct files this time. >>>>>>>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner. >>>>>>>>>>> >>>>>>>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>>>>>> Part of the memory usage: Vector 125 124 3971904 0. >>>>>>>>>>> Matrix 101 101 9462372 0 >>>>>>>>>>> >>>>>>>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>>>>>> Part of the memory usage: Vector 125 124 681672 0. >>>>>>>>>>> Matrix 101 101 1462180 0. >>>>>>>>>>> >>>>>>>>>>> In theory, the memory usage in Test1 should be 8 times of Test2. In my case, it is about 6 times. >>>>>>>>>>> >>>>>>>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>>>>>>>>> Here I get the out of memory error. >>>>>>>>>>> >>>>>>>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>>>>>> The linear solver didn't work in this case. Petsc output some errors. >>>>>>>>>>> >>>>>>>>>>> @Dave >>>>>>>>>>> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>>>>>>> If my set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process. >>>>>>>>>>> I still got the OOM error. The detailed petsc option file is attached. >>>>>>>>>>> >>>>>>>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid. >>>>>>>>>>> >>>>>>>>>>> And please send the result of KSPView so we can see what is actually used in the computations >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Dave >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thank you so much. >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Barry, >>>>>>>>>>> >>>>>>>>>>> Thank you for you advice. >>>>>>>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. >>>>>>>>>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. >>>>>>>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved. >>>>>>>>>>> The info from '-ksp_view_pre' is attached. I seems to me that the error occurs when it reaches the coarse mesh. >>>>>>>>>>> >>>>>>>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. >>>>>>>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test >>>>>>>>>>> Vector 384 383 8,193,712 0. 
>>>>>>>>>>> Matrix 103 103 11,508,688 0. >>>>>>>>>>> to 3rd test >>>>>>>>>>> Vector 384 383 1,590,520 0. >>>>>>>>>>> Matrix 103 103 3,508,664 0. >>>>>>>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 1st test. The linear solver works fine in both test. >>>>>>>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-momery_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add sth to my code so I can use '-memory_info'? >>>>>>>>>>> Sorry, my mistake the option is -memory_view >>>>>>>>>>> >>>>>>>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In both tests the memory usage is not large. >>>>>>>>>>> >>>>>>>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. >>>>>>>>>>> Is there is a way to show how much memory it allocated? >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>>>>>>> Frank, >>>>>>>>>>> >>>>>>>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. >>>>>>>>>>> >>>>>>>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options. >>>>>>>>>>> >>>>>>>>>>> Barry >>>>>>>>>>> >>>>>>>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. >>>>>>>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance. >>>>>>>>>>> The petsc options file is attached. >>>>>>>>>>> >>>>>>>>>>> The domain is a 3d box. >>>>>>>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. >>>>>>>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code( except the linear solver ) do not use much memory. So I doubt if there is something wrong with the linear solver. >>>>>>>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to re-produce the error with a smaller problem either. 
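A minimal sketch of the programmatic counterpart of -ksp_view_pre and -ksp_view, in C; this is not Frank's code, and ksp, b, x are assumed to have been created and configured elsewhere:

    PetscErrorCode ierr;
    /* Build the solver/preconditioner hierarchy, then print it before the
       solve -- roughly the information -ksp_view_pre reports. */
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);
    ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    /* Print it again after the solve -- what -ksp_view reports. */
    ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

Viewing the KSP before KSPSolve is the point of -ksp_view_pre here: it captures the solver configuration even when the solve itself dies with an out-of-memory error.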
>>>>>>>>>>> In addition, I tried to use the block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slow but there is no memory error. >>>>>>>>>>> >>>>>>>>>>> How can I diagnose what exactly cause the error? >>>>>>>>>>> Thank you so much. >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>> >>> >> -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 KSP Object: 4096 MPI processes type: cg maximum iterations=10000 tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 4096 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4096 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4096 MPI processes type: telescope Telescope: parent comm size reduction factor = 64 Telescope: comm_size = 4096 , subcomm_size = 64 Telescope: DMDA detected DMDA Object: (repart_) 64 MPI processes M 32 N 32 P 32 m 4 n 4 p 4 dof 1 overlap 1 KSP Object: (mg_coarse_telescope_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_) 64 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: redundant Redundant preconditioner: First (color=0) of 64 PCs follows linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=110592, allocated nonzeros=110592 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: richardson Richardson: damping factor=1. 
maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 8.69575 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=120210, allocated nonzeros=120210 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=56623104, allocated nonzeros=56623104 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=16777216, cols=16777216 total: nonzeros=452984832, allocated nonzeros=452984832 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=939524096, allocated nonzeros=939524096 total number of mallocs used during MatSetValues calls =0 has attached null space Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=939524096, allocated nonzeros=939524096 total number of mallocs used during MatSetValues calls =0 has attached null space -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 8 KSP Object: 8192 MPI processes type: cg maximum iterations=10000 tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 8192 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8192 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8192 MPI processes type: telescope Telescope: parent comm size reduction factor = 128 Telescope: comm_size = 8192 , subcomm_size = 64 Telescope: DMDA detected DMDA Object: (repart_) 64 MPI processes M 64 N 64 P 64 m 4 n 4 p 4 dof 1 overlap 1 KSP Object: (mg_coarse_telescope_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_) 64 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: redundant Redundant preconditioner: First (color=0) of 64 PCs follows linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=110592, allocated nonzeros=110592 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_3_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 8.69575 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=120210, allocated nonzeros=120210 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 16 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=56623104, allocated nonzeros=56623104 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=16777216, cols=16777216 total: nonzeros=452984832, allocated nonzeros=452984832 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=3623878656, allocated nonzeros=3623878656 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 8192 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 8192 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 8192 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 1 step time: 6.2466299533843994 norm1 error: 1.2135791829058829E-005 norm inf error: 1.0512737852365958E-002 Summary of Memory Usage in PETSc Maximum (over computational time) process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05 Current process memory: total 8.0407e+07 max 1.9696e+05 min 1.5078e+05 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./test_ksp.exe on a gnu-opt named . 
with 512 processors, by wang11 Tue Oct 4 05:04:05 2016 Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600 Max Max/Min Avg Total Time (sec): 7.128e+00 1.00215 7.121e+00 Objects: 3.330e+02 1.72539 2.105e+02 Flops: 2.508e+09 9.15893 5.530e+08 2.832e+11 Flops/sec: 3.521e+08 9.16346 7.765e+07 3.976e+10 MPI Messages: 3.918e+03 2.07713 2.157e+03 1.104e+06 MPI Message Lengths: 1.003e+07 1.17554 4.064e+03 4.488e+09 MPI Reductions: 4.310e+02 1.60223 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 7.1208e+00 100.0% 2.8316e+11 100.0% 1.104e+06 100.0% 4.064e+03 100.0% 2.882e+02 66.9% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 2.5056e-0217.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecTDot 14 1.0 6.0542e-02 1.6 7.34e+06 1.0 0.0e+00 0.0e+00 1.4e+01 1 1 0 0 3 1 1 0 0 5 62074 VecNorm 8 1.0 3.5572e-02 3.1 4.19e+06 1.0 0.0e+00 0.0e+00 8.0e+00 0 1 0 0 2 0 1 0 0 3 60370 VecScale 28 2.0 2.1243e-04 1.8 7.35e+04 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 144250 VecCopy 9 1.0 3.8947e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 193 1.8 1.6343e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 28 1.0 1.0030e-01 1.1 1.47e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 74940 VecAYPX 48 1.4 6.3155e-02 1.6 7.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 57380 VecAssemblyBegin 1 1.0 2.5080e-0217.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 1 1.0 2.2888e-0512.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 194 1.6 3.9131e-02 1.6 0.00e+00 0.0 7.2e+05 4.1e+03 0.0e+00 0 0 65 65 0 0 0 65 65 0 0 VecScatterEnd 194 1.6 3.4133e+0068.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 0 MatMult 56 1.3 5.0448e-01 1.2 8.70e+07 1.0 2.9e+05 8.2e+03 0.0e+00 6 15 26 53 0 6 15 26 53 0 86737 MatMultAdd 35 1.7 8.0332e-02 1.2 1.43e+07 1.0 8.2e+04 
1.5e+03 0.0e+00 1 3 7 3 0 1 3 7 3 0 90220 MatMultTranspose 47 1.5 1.1686e-01 1.4 1.64e+07 1.0 1.1e+05 1.4e+03 0.0e+00 1 3 10 3 0 1 3 10 3 0 70913 MatSolve 7 0.0 5.4884e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 51106 MatSOR 70 1.7 7.4662e-01 1.1 8.85e+07 1.0 2.1e+05 1.2e+03 1.8e+00 10 15 19 5 0 10 15 19 5 1 58271 MatLUFactorSym 1 0.0 1.3002e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 0.0 3.0343e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 5 49 0 0 0 5 49 0 0 0 46035 MatConvert 1 0.0 1.4801e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatResidual 35 1.7 2.5246e-01 1.3 4.14e+07 1.0 2.3e+05 4.1e+03 0.0e+00 3 7 21 21 0 3 7 21 21 0 80802 MatAssemblyBegin 29 1.5 6.2687e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 1 0 0 0 5 1 0 0 0 7 0 MatAssemblyEnd 29 1.5 2.8406e-01 1.0 0.00e+00 0.0 1.5e+05 5.4e+02 7.7e+01 4 0 14 2 18 4 0 14 2 27 0 MatGetRowIJ 1 0.0 1.1208e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 2.0 4.1284e-02 9.3 0.00e+00 0.0 2.2e+03 3.4e+04 3.5e+00 0 0 0 2 1 0 0 0 2 1 0 MatGetOrdering 1 0.0 7.9041e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAP 6 1.5 1.0306e+00 1.0 4.18e+07 1.0 3.1e+05 4.4e+03 7.2e+01 14 7 28 30 17 14 7 28 30 25 20208 MatPtAPSymbolic 6 1.5 4.9107e-01 1.0 0.00e+00 0.0 1.8e+05 5.3e+03 3.0e+01 7 0 16 21 7 7 0 16 21 10 0 MatPtAPNumeric 6 1.5 5.3958e-01 1.0 4.18e+07 1.0 1.3e+05 3.0e+03 4.2e+01 7 7 11 9 10 7 7 11 9 15 38597 MatRedundantMat 1 0.0 2.7650e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e-01 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 0.0 1.6951e-02 0.0 0.00e+00 0.0 3.3e+03 1.4e+02 1.9e+00 0 0 0 0 0 0 0 0 0 1 0 MatGetLocalMat 6 1.5 4.7763e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetBrAoCol 6 1.5 4.1229e-02 1.2 0.00e+00 0.0 1.4e+05 5.5e+03 0.0e+00 1 0 13 17 0 1 0 13 17 0 0 MatGetSymTrans 12 1.5 1.4412e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMCoarsen 5 1.7 8.8470e-03 1.4 0.00e+00 0.0 2.0e+04 8.4e+02 3.6e+01 0 0 2 0 8 0 0 2 0 12 0 DMCreateInterpolation 5 1.7 2.1848e-01 1.0 2.05e+06 1.0 3.5e+04 7.5e+02 5.2e+01 3 0 3 1 12 3 0 3 1 18 4739 KSPSetUp 10 2.0 1.9465e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 0 KSPSolve 1 1.0 6.2467e+00 1.0 2.51e+09 9.2 1.1e+06 4.0e+03 2.6e+02 88100 99 98 60 88100 99 98 90 45330 PCSetUp 2 2.0 4.5211e+00 3.6 2.23e+0952.3 3.8e+05 3.8e+03 2.1e+02 23 57 35 33 48 23 57 35 33 72 35732 PCApply 7 1.0 4.6845e+00 1.0 2.42e+0913.0 7.2e+05 3.1e+03 3.0e+01 66 84 65 50 7 66 84 65 50 11 50783 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 133 133 29053936 0. Vector Scatter 24 24 2464384 0. Matrix 58 58 118369764 0. Matrix Null Space 1 1 592 0. Distributed Mesh 7 7 34944 0. Star Forest Bipartite Graph 14 14 11872 0. Discrete System 7 7 5992 0. Index Set 54 54 1628276 0. IS L to G Mapping 7 7 1367088 0. Krylov Solver 11 11 13640 0. DMKSP interface 5 5 3240 0. Preconditioner 11 11 11008 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 1.87874e-05 Average time for zero size MPI_Send(): 1.10432e-05 #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 8 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 8 -py 8 -pz 8 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt ----------------------------------------- Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc Using PETSc arch: gnu-opt ----------------------------------------- Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc 
-Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl ----------------------------------------- #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 8 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 8 -py 8 -pz 8 #End of PETSc Option Table entries There is one unused database option. It is: Option left: name:-ppe_max_iter value: 20 Application 48712763 resources: utime ~3749s, stime ~789s, Rss ~196960, inblocks ~781565, outblocks ~505751 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_512_4096.txt URL: -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 1 step time: 4.8914160728454590 norm1 error: 8.6827845637092041E-008 norm inf error: 4.1127664509280201E-003 Summary of Memory Usage in PETSc Maximum (over computational time) process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04 Current process memory: total 1.9679e+09 max 1.1249e+05 min 4.1456e+04 ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./test_ksp.exe on a gnu-opt named . with 32768 processors, by wang11 Tue Oct 4 03:50:16 2016 Using Petsc Development GIT revision: v3.6.3-2059-geab7831 GIT Date: 2016-01-20 10:58:35 -0600 Max Max/Min Avg Total Time (sec): 5.221e+00 1.00192 5.215e+00 Objects: 3.330e+02 1.72539 1.952e+02 Flops: 2.232e+09 531.65406 3.900e+07 1.278e+12 Flops/sec: 4.277e+08 531.89802 7.473e+06 2.449e+11 MPI Messages: 8.594e+03 4.55579 2.011e+03 6.589e+07 MPI Message Lengths: 1.078e+06 1.95814 2.782e+02 1.833e+10 MPI Reductions: 4.310e+02 1.60223 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 5.2149e+00 100.0% 1.2779e+12 100.0% 6.589e+07 100.0% 2.782e+02 100.0% 2.705e+02 62.8% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSidedF 1 1.0 6.2082e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecTDot 14 1.0 1.5901e-02 2.1 1.15e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 3 0 0 0 0 5 236313 VecNorm 8 1.0 8.2795e-0299.5 6.55e+04 1.0 0.0e+00 0.0e+00 8.0e+00 1 0 0 0 2 1 0 0 0 3 25937 VecScale 28 2.0 4.6015e-0417.9 8.96e+03 2.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 324014 VecCopy 9 1.0 2.4486e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 193 1.8 5.3072e-04 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 28 1.0 6.1011e-04 2.5 2.29e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12319342 VecAYPX 48 1.4 4.3058e-04 2.8 1.15e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8416119 VecAssemblyBegin 1 1.0 6.2096e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAssemblyEnd 1 1.0 6.3896e-0567.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 194 1.6 2.2339e-02 8.0 0.00e+00 0.0 4.3e+07 2.8e+02 0.0e+00 0 0 65 66 0 0 0 65 66 0 0 VecScatterEnd 194 1.6 3.7815e+0039.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 71 0 0 0 0 71 0 0 0 0 0 MatMult 56 1.3 7.7610e-02 7.5 1.55e+06 1.2 1.7e+07 5.6e+02 0.0e+00 0 3 26 53 0 0 3 26 53 0 563808 MatMultAdd 35 1.7 1.1928e-02 9.2 2.48e+05 1.1 4.9e+06 1.1e+02 0.0e+00 0 1 7 3 0 0 1 7 3 0 607627 MatMultTranspose 47 1.5 2.6726e-0213.3 2.84e+05 1.1 6.5e+06 9.9e+01 0.0e+00 0 1 10 3 0 0 1 10 3 0 310054 MatSolve 7 0.0 5.5102e-02 0.0 4.38e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 407368 MatSOR 70 1.7 2.0535e-02 3.7 1.70e+06 1.4 1.2e+07 9.8e+01 2.2e-01 0 3 18 7 0 0 3 18 7 0 1976428 MatLUFactorSym 1 0.0 1.4304e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 0.0 3.0453e+00 0.0 2.18e+09 0.0 0.0e+00 0.0e+00 0.0e+00 1 87 0 0 0 1 87 0 0 0 366959 MatConvert 1 0.0 1.3890e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatResidual 35 1.7 7.3063e-0211.3 8.37e+05 1.4 1.3e+07 3.0e+02 0.0e+00 0 2 20 22 0 0 2 20 22 0 279200 MatAssemblyBegin 29 1.5 1.1239e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 2 0 0 0 5 2 0 0 0 7 0 MatAssemblyEnd 29 1.5 3.6328e-01 1.1 0.00e+00 0.0 8.9e+06 4.1e+01 7.3e+01 6 0 14 2 17 6 0 14 2 27 0 MatGetRowIJ 1 0.0 1.1570e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 2 2.0 1.0665e-01 4.9 0.00e+00 0.0 1.6e+05 5.4e+02 3.1e+00 1 0 0 0 1 1 0 0 0 1 0 MatGetOrdering 1 0.0 8.1892e-03 0.0 0.00e+00 0.0 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatPtAP 6 1.5 4.1852e-01 1.0 7.98e+05 1.2 1.9e+07 3.0e+02 6.9e+01 8 2 28 30 16 8 2 28 30 25 50373 MatPtAPSymbolic 6 1.5 2.2612e-01 1.0 0.00e+00 0.0 1.1e+07 3.7e+02 2.8e+01 4 0 16 22 7 4 0 16 22 10 0 MatPtAPNumeric 6 1.5 1.9413e-01 1.0 7.98e+05 1.2 7.7e+06 2.0e+02 4.0e+01 4 2 12 8 9 4 2 12 8 15 108597 MatRedundantMat 1 0.0 2.9847e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.2e-02 0 0 0 0 0 0 0 0 0 0 0 MatMPIConcateSeq 1 0.0 7.8937e-02 0.0 0.00e+00 0.0 2.7e+04 4.0e+01 2.3e-01 0 0 0 0 0 0 0 0 0 0 0 MatGetLocalMat 6 1.5 7.7701e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetBrAoCol 6 1.5 1.9681e-02 3.1 0.00e+00 0.0 8.3e+06 3.9e+02 0.0e+00 0 0 13 18 0 0 0 13 18 0 0 MatGetSymTrans 12 1.5 2.0599e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DMCoarsen 5 1.7 9.4588e-02 1.0 0.00e+00 0.0 1.2e+06 5.8e+01 3.3e+01 2 0 2 0 8 2 0 2 0 12 0 DMCreateInterpolation 5 1.7 2.1863e-01 1.0 3.54e+04 1.1 2.1e+06 5.8e+01 4.8e+01 4 0 3 1 11 4 0 3 1 18 4736 KSPSetUp 10 2.0 2.9837e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 1 0 0 0 3 1 0 0 0 4 0 KSPSolve 1 1.0 4.8916e+00 1.0 2.23e+09531.7 6.5e+07 2.8e+02 2.4e+02 94100 99 98 56 94100 99 98 89 261253 PCSetUp 2 2.0 4.6506e+00 4.8 2.18e+093247.5 2.3e+07 2.5e+02 1.9e+02 20 89 35 32 44 20 89 35 32 71 245045 PCApply 7 1.0 3.7972e+00 1.0 2.23e+09794.1 4.2e+07 2.2e+02 1.6e+01 73 96 63 51 4 73 96 63 51 6 324561 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 133 133 850544 0. Vector Scatter 24 24 68032 0. Matrix 58 58 42186948 0. Matrix Null Space 1 1 592 0. Distributed Mesh 7 7 34944 0. Star Forest Bipartite Graph 14 14 11872 0. Discrete System 7 7 5992 0. Index Set 54 54 152244 0. IS L to G Mapping 7 7 37936 0. Krylov Solver 11 11 13640 0. DMKSP interface 5 5 3240 0. Preconditioner 11 11 11008 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 6.00338e-05 Average time for zero size MPI_Send(): 1.25148e-05 #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 64 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 32 -py 32 -pz 32 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-has-attribute-aligned=1 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --FOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --CXXOPTFLAGS="-march=bdver1 -O3 -ffast-math -fPIC " --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre="1 " --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt ----------------------------------------- Libraries compiled on Tue Feb 16 12:57:46 2016 on h2ologin3 Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /mnt/a/u/sciteam/wang11/Sftw/petsc Using PETSc arch: gnu-opt ----------------------------------------- Using C compiler: cc -march=bdver1 -O3 -ffast-math -fPIC ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: ftn -march=bdver1 -O3 -ffast-math -fPIC ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/include -I/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lpetsc 
-Wl,-rpath,/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -L/mnt/a/u/sciteam/wang11/Sftw/petsc/gnu-opt/lib -lsuperlu_dist_4.3 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl ----------------------------------------- #PETSc Option Table entries: -ksp_converged_reason -ksp_initial_guess_nonzero yes -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -ksp_type cg -log_view -matptap_scalable -matrap 0 -memory_view -mg_coarse_ksp_type preonly -mg_coarse_pc_telescope_reduction_factor 64 -mg_coarse_pc_type telescope -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_pc_type mg -mg_levels_ksp_max_it 1 -mg_levels_ksp_type richardson -N 512 -options_left 1 -pc_mg_galerkin -pc_mg_levels 4 -pc_type mg -ppe_max_iter 20 -px 32 -py 32 -pz 32 #End of PETSc Option Table entries There is one unused database option. It is: Option left: name:-ppe_max_iter value: 20 Application 48712514 resources: utime ~274648s, stime ~36467s, Rss ~112492, inblocks ~29956998, outblocks ~32114238 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_4096.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_8192.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_16384.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_32768.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_1024_65536.txt URL: From dave.mayhem23 at gmail.com Tue Oct 4 14:56:03 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 4 Oct 2016 20:56:03 +0100 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> References: <577C337B.60909@uci.edu> <577D75D3.8010703@uci.edu> <2F25042C-E6D6-4AC6-9C22-1B63F8065836@mcs.anl.gov> <57804DE9.707@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Tuesday, 4 October 2016, frank wrote: > Hi, > This question is follow-up of the thread "Question about memory usage in > Multigrid preconditioner". > I used to have the "Out of Memory(OOM)" problem when using the > CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; > -matptap_scalable" option did solve that problem. > > Then I test the scalability by solving a 3d poisson eqn for 1 step. I used > one sub-communicator in all the tests. The difference between the petsc > options in those tests are: 1 the pc_telescope_reduction_factor; 2 the > number of multigrid levels in the up/down solver. The function "ksp_solve" > is timed. It is kind of slow and doesn't scale at all. 
>
> Test1: 512^3 grid points
> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
> 512      8                             4 / 3                            6.2466
> 4096     64                            5 / 3                            0.9361
> 32768    64                            4 / 3                            4.8914
>
> Test2: 1024^3 grid points
> Core#    telescope_reduction_factor    MG levels# for up/down solver    Time for KSPSolve (s)
> 4096     64                            5 / 4                            3.4139
> 8192     128                           5 / 4                            2.4196
> 16384    32                            5 / 3                            5.4150
> 32768    64                            5 / 3                            5.6067
> 65536    128                           5 / 3                            6.5219
>

You have to be very careful how you interpret these numbers. Your solver contains nested calls to KSPSolve, and unfortunately as a result the numbers you report include setup time. This will remain true even if you call KSPSetUp on the outermost KSP.

Your email concerns scalability of the solver application, so let's focus on that issue.

The only way to clearly separate setup from solve time is to perform two identical solves. The second solve will not require any setup. You should monitor the second solve via a new PetscStage (see the sketch following Frank's reply below). This was what I did in the telescope paper. It was the only way to understand the setup cost (and scaling) cf the solve time (and scaling).

Thanks
Dave

> I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels?
> Also which preconditioner at the coarse mesh of the 2nd communicator should I use to improve the performance?
>
> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores.
>
> Thank you.
>
> Regards,
> Frank
>
> On 09/15/2016 03:35 AM, Dave May wrote:
> Hi all,
>
> The only unexpected memory usage I can see is associated with the call to MatPtAP().
> Here is something you can try immediately.
> Run your code with the additional options
>   -matrap 0 -matptap_scalable
>
> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P.
> You don't want to do this. The option -matrap 0 resolves this issue.
>
> The implementation of P^T.A.P has two variants.
> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable.
>
> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions.
>
> I've attached a cleaned up version of the code you sent me.
> There were a number of memory leaks and other issues.
> The main points being
> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed.
>
> Thanks,
> Dave
>
> On 15 September 2016 at 08:03, Hengjie Wang wrote:
>> Hi Dave,
>>
>> Sorry, I should have put more comments in to explain the code.
>> The number of processes in each dimension is the same: Px = Py = Pz = P. So is the domain size.
>> So if you want to run the code for 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" in the command line.
>> I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.)
>>
>> Thank you.
>> Frank
>>
>> On 9/14/2016 9:05 PM, Dave May wrote:
>>
>> On Thursday, 15 September 2016, Dave May wrote:
>>>
>>> On Thursday, 15 September 2016, frank wrote:
>>>
>>>> Hi,
>>>>
>>>> I write a simple code to reproduce the error. I hope this can help to
>>>> diagnose the problem.
>>>> The code just solves a 3d poisson equation. >>>> >>> >>> Why is the stencil width a runtime parameter?? And why is the default >>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>> >>> Was this choice made to mimic something in the real application code? >>> >> >> Please ignore - I misunderstood your usage of the param set by -P >> >> >>> >>> >>>> >>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. >>>> That's when I re-produce the OOM error. Each core has about 2G memory. >>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>> ksp solver works fine. >>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>> >>>> Thank you. >>>> Frank >>>> >>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>> >>>> Hi Barry, >>>> >>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>> it is not in file I sent you. I am sorry for the confusion. >>>> >>>> Regards, >>>> Frank >>>> >>>> On Friday, September 9, 2016, Barry Smith wrote: >>>> >>>>> >>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>> > >>>>> > Hi Barry, >>>>> > >>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>> So I added both -ksp_view_pre and -ksp_view. >>>>> >>>>> But the options file you sent specifically does NOT list the >>>>> -ksp_view_pre so how could it be from that? >>>>> >>>>> Sorry to be pedantic but I've spent too much time in the past >>>>> trying to debug from incorrect information and want to make sure that the >>>>> information I have is correct before thinking. Please recheck exactly what >>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Frank >>>>> > >>>>> > >>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>> in the 2 case but not the one? >>>>> >> >>>>> >> Barry >>>>> >> >>>>> >> >>>>> >> >>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> I want to continue digging into the memory problem here. >>>>> >>> I did find a work around in the past, which is to use less cores >>>>> per node so that each core has 8G memory. However this is deficient and >>>>> expensive. I hope to locate the place that uses the most memory. >>>>> >>> >>>>> >>> Here is a brief summary of the tests I did in past: >>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>> >>> Maximum (over computational time) process memory: total >>>>> 7.0727e+08 >>>>> >>> Current process memory: >>>>> total 7.0727e+08 >>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>> 6.3908e+11 >>>>> >>> Current space PetscMalloc()ed: >>>>> total 1.8275e+09 >>>>> >>> >>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>> >>> Maximum (over computational time) process memory: total >>>>> 5.9431e+09 >>>>> >>> Current process memory: >>>>> total 5.9431e+09 >>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>> 5.3202e+12 >>>>> >>> Current space PetscMalloc()ed: >>>>> total 5.4844e+09 >>>>> >>> >>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>> the job during "KSPSolve". 
>>>>> >>> >>>>> >>> I attached the output of ksp_view( the third test's output is from >>>>> ksp_view_pre ), memory_view and also the petsc options. >>>>> >>> >>>>> >>> In all the tests, each core can access about 2G memory. In test3, >>>>> there are 4223139840 non-zeros in the matrix. This will consume about >>>>> 1.74M, using double precision. Considering some extra memory used to store >>>>> integer index, 2G memory should still be way enough. >>>>> >>> >>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>> memory? >>>>> >>> Thank you so much. >>>>> >>> >>>>> >>> BTW, there are 4 options remains unused and I don't understand why >>>>> they are omitted: >>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>> >>> >>>>> >>> >>>>> >>> Regards, >>>>> >>> Frank >>>>> >>> >>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>> >>>> >>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>> >>>> Hi Dave, >>>>> >>>> >>>>> >>>> Sorry for the late reply. >>>>> >>>> Thank you so much for your detailed reply. >>>>> >>>> >>>>> >>>> I have a question about the estimation of the memory usage. There >>>>> are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>> precision is used. So the memory per process is: >>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>> >>>> >>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>> apparently cannot convert between units correctly....) >>>>> >>>> >>>>> >>>> From the PETSc objects associated with the solver, It looks like >>>>> it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities >>>>> are: somewhere in your usage of PETSc you've introduced a memory leak; >>>>> PETSc is doing a huge over allocation (e.g. as per our discussion of >>>>> MatPtAP); or in your application code there are other objects you have >>>>> forgotten to log the memory for. >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> I am running this job on Bluewater >>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>> >>>> >>>>> >>>> I thought so on both counts. >>>>> >>>> >>>>> >>>> I apologize that I made a stupid mistake in computing the memory >>>>> per core. My settings render each core can access only 2G memory on average >>>>> instead of 8G which I mentioned in previous email. I re-run the job with 8G >>>>> memory per core on average and there is no "Out Of Memory" error. I would >>>>> do more test to see if there is still some memory issue. >>>>> >>>> >>>>> >>>> Ok. I'd still like to know where the memory was being used since >>>>> my estimates were off. >>>>> >>>> >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Dave >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Frank >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>> >>>>> Hi Frank, >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>> >>>>> Hi Dave, >>>>> >>>>> >>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>> 96*8*24. The petsc option file is attached. 
>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred before >>>>> the linear solver finished one step. So I don't have the full info from >>>>> ksp_view. The info from ksp_view_pre is attached. >>>>> >>>>> >>>>> >>>>> Okay - that is essentially useless (sorry) >>>>> >>>>> >>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>> was going to be changed. >>>>> >>>>> >>>>> >>>>> Based on what information? >>>>> >>>>> Running with -info would give us more clues, but will create a >>>>> ton of output. >>>>> >>>>> Please try running the case which failed with -info >>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>> for comparison. >>>>> >>>>> Thank you. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>> magnitude estimate >>>>> >>>>> >>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>> GB per MPI rank assuming double precision. >>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB (assuming >>>>> 32 bit integers) >>>>> >>>>> >>>>> >>>>> * You use 5 levels of coarsening, so the other operators should >>>>> represent (collectively) >>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on >>>>> the communicator with 18432 ranks. >>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>> communicator with 18432 ranks. >>>>> >>>>> >>>>> >>>>> * You use a reduction factor of 64, making the new communicator >>>>> with 288 MPI ranks. >>>>> >>>>> PCTelescope will first gather a temporary matrix associated with >>>>> your coarse level operator assuming a comm size of 288 living on the comm >>>>> with size 18432. >>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core >>>>> on the 288 ranks. >>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>> subcomm, thus require another 32 MB per rank. >>>>> >>>>> The temporary matrix is now destroyed. >>>>> >>>>> >>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on >>>>> the sub-comm. >>>>> >>>>> >>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>> resulting operator will have the same memory footprint as the unpermuted >>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>> are held in memory when the DMDA is provided. >>>>> >>>>> >>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>> any given core, given your options is approximately >>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>> >>>>> This is way below 8 GB. >>>>> >>>>> >>>>> >>>>> Note this estimate completely ignores: >>>>> >>>>> (1) the memory required for the restriction operator, >>>>> >>>>> (2) the potential growth in the number of non-zeros per row due >>>>> to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>> MatView so we could see the number of non-zeros required by the coarse >>>>> level operators) >>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>> required by the smoothers. 
>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>> >>>>> >>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>> carefully estimated the memory usage of your application code. Hopefully >>>>> others might examine/correct my rough estimates >>>>> >>>>> >>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>> >>>>> Since I don't have access to the same machine you are running >>>>> on, I think we need to take a step back. >>>>> >>>>> >>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>> available >>>>> >>>>> >>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar 7 >>>>> point FD stencil) >>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>> memory usage of your solver configuration using a standard, light weight >>>>> existing PETSc example, run on your machine at the same scale. >>>>> >>>>> This would hopefully enable us to correctly evaluate the actual >>>>> memory usage required by the solver configuration you are using. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>> >>>>>> >>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>> >>>>>> Hi Barry and Dave, >>>>> >>>>>> >>>>> >>>>>> Thank both of you for the advice. >>>>> >>>>>> >>>>> >>>>>> @Barry >>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>> the correct files this time. >>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>> preconditioner. >>>>> >>>>>> >>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>> >>>>>> Part of the memory usage: Vector 125 124 3971904 >>>>> 0. >>>>> >>>>>> Matrix 101 101 >>>>> 9462372 0 >>>>> >>>>>> >>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>> >>>>>> Part of the memory usage: Vector 125 124 681672 >>>>> 0. >>>>> >>>>>> Matrix 101 101 >>>>> 1462180 0. >>>>> >>>>>> >>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>> Test2. In my case, it is about 6 times. >>>>> >>>>>> >>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>> Sub-domain per process: 32*32*32 >>>>> >>>>>> Here I get the out of memory error. >>>>> >>>>>> >>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to >>>>> set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>> errors. >>>>> >>>>>> >>>>> >>>>>> @Dave >>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the coarse >>>>> mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh of >>>>> MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid >>>>> point per process. >>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>> attached. >>>>> >>>>>> >>>>> >>>>>> Do you understand the expected memory usage for the particular >>>>> parallel LU implementation you are using? I don't (seriously). Replace LU >>>>> with bjacobi and re-run this test. My point about solver debugging is still >>>>> valid. 
>>>>> >>>>>> >>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>> actually used in the computations >>>>> >>>>>> >>>>> >>>>>> Thanks >>>>> >>>>>> Dave >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi Barry, >>>>> >>>>>> >>>>> >>>>>> Thank you for you advice. >>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>> and the process mesh is 96*8*24. >>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>> >>>>>> The system gives me the "Out of Memory" error before the linear >>>>> system is completely solved. >>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>> the error occurs when it reaches the coarse mesh. >>>>> >>>>>> >>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>> 96*8*24. The 3rd test uses the >>>>> same grid but a different process mesh 48*4*12. >>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>> memory usage goes from 2nd test >>>>> >>>>>> Vector 384 383 8,193,712 0. >>>>> >>>>>> Matrix 103 103 11,508,688 0. >>>>> >>>>>> to 3rd test >>>>> >>>>>> Vector 384 383 1,590,520 0. >>>>> >>>>>> Matrix 103 103 3,508,664 0. >>>>> >>>>>> that is the memory usage got smaller but if you have only 1/8th >>>>> the processes and the same grid it should have gotten about 8 times bigger. >>>>> Did you maybe cut the grid by a factor of 8 also? If so that still doesn't >>>>> explain it because the memory usage changed by a factor of 5 something for >>>>> the vectors and 3 something for the matrices. >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>> the same in 1st test. The linear solver works fine in both test. >>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>> memory info is from the option '-log_summary'. I tried to use >>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>> my code so I can use '-memory_info'? >>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>> >>>>>> >>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>> memory is used without the telescope? Also run case 2 the same way. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> In both tests the memory usage is not large. >>>>> >>>>>> >>>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner >>>>> that allocated a lot of memory and caused the error in the 1st test. >>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>> >>>>>> Frank, >>>>> >>>>>> >>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>> before the solve so hopefully it gets that far. >>>>> >>>>>> >>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>> when the problem completes it will show the "high water mark" for PETSc >>>>> allocated memory and total memory used. 
We first want to look at these >>>>> numbers to see if it is using more memory than you expect. You could also >>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>> the output from these options. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi, >>>>> >>>>>> >>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>> solve a linear system in parallel. >>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>> coarse mesh for its good performance. >>>>> >>>>>> The petsc options file is attached. >>>>> >>>>>> >>>>> >>>>>> The domain is a 3d box. >>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>> mesh is 96*8*24. When I double the size of grid and >>>>> keep the same process mesh and petsc options, I >>>>> get an "out of memory" error from the super-cluster I am using. >>>>> >>>>>> Each process has access to at least 8G memory, which should be >>>>> more than enough for my application. I am sure that all the other parts of >>>>> my code( except the linear solver ) do not use much memory. So I doubt if >>>>> there is something wrong with the linear solver. >>>>> >>>>>> The error occurs before the linear system is completely solved >>>>> so I don't have the info from ksp view. I am not able to re-produce the >>>>> error with a smaller problem either. >>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>> runs extremely slow but there is no memory error. >>>>> >>>>>> >>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>> _options.txt> >>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>>> >>> >>>> emory2.txt>>>>> tions3.txt> >>>>> > >>>>> >>>>> >>>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hengjiew at uci.edu Tue Oct 4 15:09:55 2016 From: hengjiew at uci.edu (frank) Date: Tue, 4 Oct 2016 13:09:55 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: Hi Dave, Thank you for the reply. What do you mean by the "nested calls to KSPSolve"? I tried to call KSPSolve twice, but the the second solve converged in 0 iteration. KSPSolve seems to remember the solution. How can I force both solves start from the same initial guess? Thank you. Frank On 10/04/2016 12:56 PM, Dave May wrote: > > > On Tuesday, 4 October 2016, frank > wrote: > > Hi, > > This question is follow-up of the thread "Question about memory > usage in Multigrid preconditioner". > I used to have the "Out of Memory(OOM)" problem when using the > CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; > -matptap_scalable" option did solve that problem. > > Then I test the scalability by solving a 3d poisson eqn for 1 > step. I used one sub-communicator in all the tests. 
The difference > between the petsc options in those tests are: 1 the > pc_telescope_reduction_factor; 2 the number of multigrid levels in > the up/down solver. The function "ksp_solve" is timed. It is kind > of slow and doesn't scale at all. > > Test1: 512^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 512 8 4 / 3 6.2466 > 4096 64 5 / 3 0.9361 > 32768 64 4 / 3 4.8914 > > Test2: 1024^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 4096 64 5 / 4 3.4139 > 8192 128 5 / 4 2.4196 > 16384 32 5 / 3 5.4150 > 32768 64 5 / 3 5.6067 > 65536 128 5 / 3 6.5219 > > > You have to be very careful how you interpret these numbers. Your > solver contains nested calls to KSPSolve, and unfortunately as a > result the numbers you report include setup time. This will remain > true even if you call KSPSetUp on the outermost KSP. > > Your email concerns scalability of the silver application, so let's > focus on that issue. > > The only way to clearly separate setup from solve time is to perform > two identical solves. The second solve will not require any setup. You > should monitor the second solve via a new PetscStage. > > This was what I did in the telescope paper. It was the only way to > understand the setup cost (and scaling) cf the solve time (and scaling). > > Thanks > Dave > > I guess I didn't set the MG levels properly. What would be the > efficient way to arrange the MG levels? > Also which preconditionr at the coarse mesh of the 2nd > communicator should I use to improve the performance? > > I attached the test code and the petsc options file for the 1024^3 > cube with 32768 cores. > > Thank you. > > Regards, > Frank > > > > > > > On 09/15/2016 03:35 AM, Dave May wrote: >> HI all, >> >> I the only unexpected memory usage I can see is associated with >> the call to MatPtAP(). >> Here is something you can try immediately. >> Run your code with the additional options >> -matrap 0 -matptap_scalable >> >> I didn't realize this before, but the default behaviour of >> MatPtAP in parallel is actually to to explicitly form the >> transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >> You don't want to do this. The option -matrap 0 resolves this issue. >> >> The implementation of P^T.A.P has two variants. >> The scalable implementation (with respect to memory usage) is >> selected via the second option -matptap_scalable. >> >> Try it out - I see a significant memory reduction using these >> options for particular mesh sizes / partitions. >> >> I've attached a cleaned up version of the code you sent me. >> There were a number of memory leaks and other issues. >> The main points being >> * You should call DMDAVecGetArrayF90() before >> VecAssembly{Begin,End} >> * You should call PetscFinalize(), otherwise the option >> -log_summary (-log_view) will not display anything once the >> program has completed. >> >> >> Thanks, >> Dave >> >> >> On 15 September 2016 at 08:03, Hengjie Wang > > wrote: >> >> Hi Dave, >> >> Sorry, I should have put more comment to explain the code. >> The number of process in each dimension is the same: Px = >> Py=Pz=P. So is the domain size. >> So if the you want to run the code for a 512^3 grid points >> on 16^3 cores, you need to set "-N 512 -P 16" in the command >> line. >> I add more comments and also fix an error in the attached >> code. ( The error only effects the accuracy of solution but >> not the memory usage. ) >> >> Thank you. 
>> Frank >> >> >> On 9/14/2016 9:05 PM, Dave May wrote: >>> >>> >>> On Thursday, 15 September 2016, Dave May >>> >> > >>> wrote: >>> >>> >>> >>> On Thursday, 15 September 2016, frank >>> wrote: >>> >>> Hi, >>> >>> I write a simple code to re-produce the error. I >>> hope this can help to diagnose the problem. >>> The code just solves a 3d poisson equation. >>> >>> >>> Why is the stencil width a runtime parameter?? And why >>> is the default value 2? For 7-pnt FD Laplace, you only >>> need a stencil width of 1. >>> >>> Was this choice made to mimic something in the >>> real application code? >>> >>> >>> Please ignore - I misunderstood your usage of the param set >>> by -P >>> >>> >>> I run the code on a 1024^3 mesh. The process >>> partition is 32 * 32 * 32. That's when I re-produce >>> the OOM error. Each core has about 2G memory. >>> I also run the code on a 512^3 mesh with 16 * 16 * >>> 16 processes. The ksp solver works fine. >>> I attached the code, ksp_view_pre's output and my >>> petsc option file. >>> >>> Thank you. >>> Frank >>> >>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>> Hi Barry, >>>> >>>> I checked. On the supercomputer, I had the option >>>> "-ksp_view_pre" but it is not in file I sent you. I >>>> am sorry for the confusion. >>>> >>>> Regards, >>>> Frank >>>> >>>> On Friday, September 9, 2016, Barry Smith >>>> wrote: >>>> >>>> >>>> > On Sep 9, 2016, at 3:11 PM, frank >>>> wrote: >>>> > >>>> > Hi Barry, >>>> > >>>> > I think the first KSP view output is from >>>> -ksp_view_pre. Before I submitted the test, I >>>> was not sure whether there would be OOM error >>>> or not. So I added both -ksp_view_pre and >>>> -ksp_view. >>>> >>>> But the options file you sent specifically >>>> does NOT list the -ksp_view_pre so how could it >>>> be from that? >>>> >>>> Sorry to be pedantic but I've spent too much >>>> time in the past trying to debug from incorrect >>>> information and want to make sure that the >>>> information I have is correct before thinking. >>>> Please recheck exactly what happened. Rerun >>>> with the exact input file you emailed if that >>>> is needed. >>>> >>>> Barry >>>> >>>> > >>>> > Frank >>>> > >>>> > >>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>> >> Why does ksp_view2.txt have two KSP views >>>> in it while ksp_view1.txt has only one KSPView >>>> in it? Did you run two different solves in the >>>> 2 case but not the one? >>>> >> >>>> >> Barry >>>> >> >>>> >> >>>> >> >>>> >>> On Sep 9, 2016, at 10:56 AM, frank >>>> wrote: >>>> >>> >>>> >>> Hi, >>>> >>> >>>> >>> I want to continue digging into the memory >>>> problem here. >>>> >>> I did find a work around in the past, which >>>> is to use less cores per node so that each core >>>> has 8G memory. However this is deficient and >>>> expensive. I hope to locate the place that uses >>>> the most memory. 
>>>> >>> >>>> >>> Here is a brief summary of the tests I did >>>> in past: >>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh >>>> 48*4*12 >>>> >>> Maximum (over computational time) process >>>> memory: total 7.0727e+08 >>>> >>> Current process memory: total 7.0727e+08 >>>> >>> Maximum (over computational time) space >>>> PetscMalloc()ed: total 6.3908e+11 >>>> >>> Current space PetscMalloc()ed: >>>> total 1.8275e+09 >>>> >>> >>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh >>>> 96*8*24 >>>> >>> Maximum (over computational time) process >>>> memory: total 5.9431e+09 >>>> >>> Current process memory: total 5.9431e+09 >>>> >>> Maximum (over computational time) space >>>> PetscMalloc()ed: total 5.3202e+12 >>>> >>> Current space PetscMalloc()ed: >>>> total 5.4844e+09 >>>> >>> >>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh >>>> 96*8*24 >>>> >>> OOM( Out Of Memory ) killer of the >>>> supercomputer terminated the job during "KSPSolve". >>>> >>> >>>> >>> I attached the output of ksp_view( the >>>> third test's output is from ksp_view_pre ), >>>> memory_view and also the petsc options. >>>> >>> >>>> >>> In all the tests, each core can access >>>> about 2G memory. In test3, there are 4223139840 >>>> non-zeros in the matrix. This will consume >>>> about 1.74M, using double precision. >>>> Considering some extra memory used to store >>>> integer index, 2G memory should still be way >>>> enough. >>>> >>> >>>> >>> Is there a way to find out which part of >>>> KSPSolve uses the most memory? >>>> >>> Thank you so much. >>>> >>> >>>> >>> BTW, there are 4 options remains unused and >>>> I don't understand why they are omitted: >>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type >>>> value: preonly >>>> >>> -mg_coarse_telescope_mg_coarse_pc_type >>>> value: bjacobi >>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it >>>> value: 1 >>>> >>> -mg_coarse_telescope_mg_levels_ksp_type >>>> value: richardson >>>> >>> >>>> >>> >>>> >>> Regards, >>>> >>> Frank >>>> >>> >>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>> >>>> >>>> >>>> On 14 July 2016 at 01:07, frank >>>> wrote: >>>> >>>> Hi Dave, >>>> >>>> >>>> >>>> Sorry for the late reply. >>>> >>>> Thank you so much for your detailed reply. >>>> >>>> >>>> >>>> I have a question about the estimation of >>>> the memory usage. There are 4223139840 >>>> allocated non-zeros and 18432 MPI processes. >>>> Double precision is used. So the memory per >>>> process is: >>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 >>>> = 1.74M ? >>>> >>>> Did I do sth wrong here? Because this >>>> seems too small. >>>> >>>> >>>> >>>> No - I totally f***ed it up. You are >>>> correct. That'll teach me for fumbling around >>>> with my iphone calculator and not using my >>>> brain. (Note that to convert to MB just divide >>>> by 1e6, not 1024^2 - although I apparently >>>> cannot convert between units correctly....) >>>> >>>> >>>> >>>> From the PETSc objects associated with the >>>> solver, It looks like it _should_ run with 2GB >>>> per MPI rank. Sorry for my mistake. >>>> Possibilities are: somewhere in your usage of >>>> PETSc you've introduced a memory leak; PETSc is >>>> doing a huge over allocation (e.g. as per our >>>> discussion of MatPtAP); or in your application >>>> code there are other objects you have forgotten >>>> to log the memory for. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> I am running this job on Bluewater >>>> >>>> I am using the 7 points FD stencil in 3D. >>>> >>>> >>>> >>>> I thought so on both counts. 
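Spelling out the arithmetic from a few lines above (including the MB-vs-MiB point Dave makes), the fine-grid operator alone works out to only a few megabytes per rank:

    4223139840 nonzeros * 8 bytes / 18432 ranks ~= 1,832,960 bytes
                                                ~= 1.83 MB  (dividing by 1e6)
                                                ~= 1.75 MiB (dividing by 1024^2)
    4223139840 * 4 bytes / 18432 ranks          ~= 0.92 MB per rank for 32-bit AIJ column indices

which is consistent with the conclusion above that the solver objects themselves should comfortably fit in 2 GB per rank.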
>>>> >>>> >>>> >>>> I apologize that I made a stupid mistake >>>> in computing the memory per core. My settings >>>> render each core can access only 2G memory on >>>> average instead of 8G which I mentioned in >>>> previous email. I re-run the job with 8G memory >>>> per core on average and there is no "Out Of >>>> Memory" error. I would do more test to see if >>>> there is still some memory issue. >>>> >>>> >>>> >>>> Ok. I'd still like to know where the >>>> memory was being used since my estimates were off. >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Dave >>>> >>>> >>>> >>>> Regards, >>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>> >>>>> Hi Frank, >>>> >>>>> >>>> >>>>> >>>> >>>>> On 11 July 2016 at 19:14, frank >>>> wrote: >>>> >>>>> Hi Dave, >>>> >>>>> >>>> >>>>> I re-run the test using bjacobi as the >>>> preconditioner on the coarse mesh of telescope. >>>> The Grid is 3072*256*768 and process mesh is >>>> 96*8*24. The petsc option file is attached. >>>> >>>>> I still got the "Out Of Memory" error. >>>> The error occurred before the linear solver >>>> finished one step. So I don't have the full >>>> info from ksp_view. The info from ksp_view_pre >>>> is attached. >>>> >>>>> >>>> >>>>> Okay - that is essentially useless (sorry) >>>> >>>>> >>>> >>>>> It seems to me that the error occurred >>>> when the decomposition was going to be changed. >>>> >>>>> >>>> >>>>> Based on what information? >>>> >>>>> Running with -info would give us more >>>> clues, but will create a ton of output. >>>> >>>>> Please try running the case which failed >>>> with -info >>>> >>>>> I had another test with a grid of >>>> 1536*128*384 and the same process mesh as >>>> above. There was no error. The ksp_view info is >>>> attached for comparison. >>>> >>>>> Thank you. >>>> >>>>> >>>> >>>>> >>>> >>>>> [3] Here is my crude estimate of your >>>> memory usage. >>>> >>>>> I'll target the biggest memory hogs only >>>> to get an order of magnitude estimate >>>> >>>>> >>>> >>>>> * The Fine grid operator contains >>>> 4223139840 non-zeros --> 1.8 GB per MPI rank >>>> assuming double precision. >>>> >>>>> The indices for the AIJ could amount to >>>> another 0.3 GB (assuming 32 bit integers) >>>> >>>>> >>>> >>>>> * You use 5 levels of coarsening, so the >>>> other operators should represent (collectively) >>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ >>>> 300 MB per MPI rank on the communicator with >>>> 18432 ranks. >>>> >>>>> The coarse grid should consume ~ 0.5 MB >>>> per MPI rank on the communicator with 18432 ranks. >>>> >>>>> >>>> >>>>> * You use a reduction factor of 64, >>>> making the new communicator with 288 MPI ranks. >>>> >>>>> PCTelescope will first gather a temporary >>>> matrix associated with your coarse level >>>> operator assuming a comm size of 288 living on >>>> the comm with size 18432. >>>> >>>>> This matrix will require approximately >>>> 0.5 * 64 = 32 MB per core on the 288 ranks. >>>> >>>>> This matrix is then used to form a new >>>> MPIAIJ matrix on the subcomm, thus require >>>> another 32 MB per rank. >>>> >>>>> The temporary matrix is now destroyed. >>>> >>>>> >>>> >>>>> * Because a DMDA is detected, a >>>> permutation matrix is assembled. >>>> >>>>> This requires 2 doubles per point in the >>>> DMDA. >>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 >>>> points. >>>> >>>>> Thus the permutation matrix will require >>>> < 1 MB per MPI rank on the sub-comm. >>>> >>>>> >>>> >>>>> * Lastly, the matrix is permuted. 
This >>>> uses MatPtAP(), but the resulting operator will >>>> have the same memory footprint as the >>>> unpermuted matrix (32 MB). At any stage in >>>> PCTelescope, only 2 operators of size 32 MB are >>>> held in memory when the DMDA is provided. >>>> >>>>> >>>> >>>>> From my rough estimates, the worst case >>>> memory foot print for any given core, given >>>> your options is approximately >>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB >>>> = 2465 MB >>>> >>>>> This is way below 8 GB. >>>> >>>>> >>>> >>>>> Note this estimate completely ignores: >>>> >>>>> (1) the memory required for the >>>> restriction operator, >>>> >>>>> (2) the potential growth in the number of >>>> non-zeros per row due to Galerkin coarsening (I >>>> wished -ksp_view_pre reported the output from >>>> MatView so we could see the number of non-zeros >>>> required by the coarse level operators) >>>> >>>>> (3) all temporary vectors required by the >>>> CG solver, and those required by the smoothers. >>>> >>>>> (4) internal memory allocated by MatPtAP >>>> >>>>> (5) memory associated with IS's used >>>> within PCTelescope >>>> >>>>> >>>> >>>>> So either I am completely off in my >>>> estimates, or you have not carefully estimated >>>> the memory usage of your application code. >>>> Hopefully others might examine/correct my rough >>>> estimates >>>> >>>>> >>>> >>>>> Since I don't have your code I cannot >>>> access the latter. >>>> >>>>> Since I don't have access to the same >>>> machine you are running on, I think we need to >>>> take a step back. >>>> >>>>> >>>> >>>>> [1] What machine are you running on? Send >>>> me a URL if its available >>>> >>>>> >>>> >>>>> [2] What discretization are you using? (I >>>> am guessing a scalar 7 point FD stencil) >>>> >>>>> If it's a 7 point FD stencil, we should >>>> be able to examine the memory usage of your >>>> solver configuration using a standard, light >>>> weight existing PETSc example, run on your >>>> machine at the same scale. >>>> >>>>> This would hopefully enable us to >>>> correctly evaluate the actual memory usage >>>> required by the solver configuration you are using. >>>> >>>>> >>>> >>>>> Thanks, >>>> >>>>> Dave >>>> >>>>> >>>> >>>>> >>>> >>>>> Frank >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>> >>>>>> >>>> >>>>>> On Saturday, 9 July 2016, frank >>>> wrote: >>>> >>>>>> Hi Barry and Dave, >>>> >>>>>> >>>> >>>>>> Thank both of you for the advice. >>>> >>>>>> >>>> >>>>>> @Barry >>>> >>>>>> I made a mistake in the file names in >>>> last email. I attached the correct files this time. >>>> >>>>>> For all the three tests, 'Telescope' is >>>> used as the coarse preconditioner. >>>> >>>>>> >>>> >>>>>> == Test1: Grid: 1536*128*384, >>>> Process Mesh: 48*4*12 >>>> >>>>>> Part of the memory usage: Vector 125 >>>> 124 3971904 0. >>>> >>>>>> Matrix 101 101 >>>> 9462372 0 >>>> >>>>>> >>>> >>>>>> == Test2: Grid: 1536*128*384, Process >>>> Mesh: 96*8*24 >>>> >>>>>> Part of the memory usage: Vector 125 >>>> 124 681672 0. >>>> >>>>>> Matrix 101 101 >>>> 1462180 0. >>>> >>>>>> >>>> >>>>>> In theory, the memory usage in Test1 >>>> should be 8 times of Test2. In my case, it is >>>> about 6 times. >>>> >>>>>> >>>> >>>>>> == Test3: Grid: 3072*256*768, Process >>>> Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>> >>>>>> Here I get the out of memory error. >>>> >>>>>> >>>> >>>>>> I tried to use -mg_coarse jacobi. In >>>> this way, I don't need to set >>>> -mg_coarse_ksp_type and -mg_coarse_pc_type >>>> explicitly, right? 
>>>> >>>>>> The linear solver didn't work in this >>>> case. Petsc output some errors. >>>> >>>>>> >>>> >>>>>> @Dave >>>> >>>>>> In test3, I use only one instance of >>>> 'Telescope'. On the coarse mesh of 'Telescope', >>>> I used LU as the preconditioner instead of SVD. >>>> >>>>>> If my set the levels correctly, then on >>>> the last coarse mesh of MG where it calls >>>> 'Telescope', the sub-domain per process is 2*2*2. >>>> >>>>>> On the last coarse mesh of 'Telescope', >>>> there is only one grid point per process. >>>> >>>>>> I still got the OOM error. The detailed >>>> petsc option file is attached. >>>> >>>>>> >>>> >>>>>> Do you understand the expected memory >>>> usage for the particular parallel LU >>>> implementation you are using? I don't >>>> (seriously). Replace LU with bjacobi and re-run >>>> this test. My point about solver debugging is >>>> still valid. >>>> >>>>>> >>>> >>>>>> And please send the result of KSPView so >>>> we can see what is actually used in the >>>> computations >>>> >>>>>> >>>> >>>>>> Thanks >>>> >>>>>> Dave >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank >>>> wrote: >>>> >>>>>> >>>> >>>>>> Hi Barry, >>>> >>>>>> >>>> >>>>>> Thank you for you advice. >>>> >>>>>> I tried three test. In the 1st test, the >>>> grid is 3072*256*768 and the process mesh is >>>> 96*8*24. >>>> >>>>>> The linear solver is 'cg' the >>>> preconditioner is 'mg' and 'telescope' is used >>>> as the preconditioner at the coarse mesh. >>>> >>>>>> The system gives me the "Out of Memory" >>>> error before the linear system is completely >>>> solved. >>>> >>>>>> The info from '-ksp_view_pre' is >>>> attached. I seems to me that the error occurs >>>> when it reaches the coarse mesh. >>>> >>>>>> >>>> >>>>>> The 2nd test uses a grid of 1536*128*384 >>>> and process mesh is 96*8*24. The 3rd >>>> test uses the same grid but a different >>>> process mesh 48*4*12. >>>> >>>>>> Are you sure this is right? The total >>>> matrix and vector memory usage goes from 2nd test >>>> >>>>>> Vector 384 383 >>>> 8,193,712 0. >>>> >>>>>> Matrix 103 103 >>>> 11,508,688 0. >>>> >>>>>> to 3rd test >>>> >>>>>> Vector 384 383 >>>> 1,590,520 0. >>>> >>>>>> Matrix 103 103 >>>> 3,508,664 0. >>>> >>>>>> that is the memory usage got smaller but >>>> if you have only 1/8th the processes and the >>>> same grid it should have gotten about 8 times >>>> bigger. Did you maybe cut the grid by a factor >>>> of 8 also? If so that still doesn't explain it >>>> because the memory usage changed by a factor of >>>> 5 something for the vectors and 3 something for >>>> the matrices. >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> The linear solver and petsc options in >>>> 2nd and 3rd tests are the same in 1st test. The >>>> linear solver works fine in both test. >>>> >>>>>> I attached the memory usage of the 2nd >>>> and 3rd tests. The memory info is from the >>>> option '-log_summary'. I tried to use >>>> '-momery_info' as you suggested, but in my case >>>> petsc treated it as an unused option. It output >>>> nothing about the memory. Do I need to add sth >>>> to my code so I can use '-memory_info'? >>>> >>>>>> Sorry, my mistake the option is >>>> -memory_view >>>> >>>>>> >>>> >>>>>> Can you run the one case with >>>> -memory_view and -mg_coarse jacobi -ksp_max_it >>>> 1 (just so it doesn't iterate forever) to see >>>> how much memory is used without the telescope? 
>>>> Also run case 2 the same way. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> In both tests the memory usage is not large. >>>> >>>>>> >>>> >>>>>> It seems to me that it might be the >>>> 'telescope' preconditioner that allocated a lot >>>> of memory and caused the error in the 1st test. >>>> >>>>>> Is there is a way to show how much >>>> memory it allocated? >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>> >>>>>> Frank, >>>> >>>>>> >>>> >>>>>> You can run with -ksp_view_pre to have >>>> it "view" the KSP before the solve so hopefully >>>> it gets that far. >>>> >>>>>> >>>> >>>>>> Please run the problem that does fit >>>> with -memory_info when the problem completes it >>>> will show the "high water mark" for PETSc >>>> allocated memory and total memory used. We >>>> first want to look at these numbers to see if >>>> it is using more memory than you expect. You >>>> could also run with say half the grid spacing >>>> to see how the memory usage scaled with the >>>> increase in grid points. Make the runs also >>>> with -log_view and send all the output from >>>> these options. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank >>>> wrote: >>>> >>>>>> >>>> >>>>>> Hi, >>>> >>>>>> >>>> >>>>>> I am using the CG ksp solver and >>>> Multigrid preconditioner to solve a linear >>>> system in parallel. >>>> >>>>>> I chose to use the 'Telescope' as the >>>> preconditioner on the coarse mesh for its good >>>> performance. >>>> >>>>>> The petsc options file is attached. >>>> >>>>>> >>>> >>>>>> The domain is a 3d box. >>>> >>>>>> It works well when the grid is >>>> 1536*128*384 and the process mesh is 96*8*24. >>>> When I double the size of grid and >>>> keep the same process mesh and petsc >>>> options, I get an "out of memory" error from >>>> the super-cluster I am using. >>>> >>>>>> Each process has access to at least 8G >>>> memory, which should be more than enough for my >>>> application. I am sure that all the other parts >>>> of my code( except the linear solver ) do not >>>> use much memory. So I doubt if there is >>>> something wrong with the linear solver. >>>> >>>>>> The error occurs before the linear >>>> system is completely solved so I don't have the >>>> info from ksp view. I am not able to re-produce >>>> the error with a smaller problem either. >>>> >>>>>> In addition, I tried to use the block >>>> jacobi as the preconditioner with the same grid >>>> and same decomposition. The linear solver runs >>>> extremely slow but there is no memory error. >>>> >>>>>> >>>> >>>>>> How can I diagnose what exactly cause >>>> the error? >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>> >>>>>> >>>> >>>>> >>>> >>>> >>>> >>> >>>> >>>> > >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 4 15:20:33 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 15:20:33 -0500 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Tue, Oct 4, 2016 at 3:09 PM, frank wrote: > Hi Dave, > > Thank you for the reply. 
> What do you mean by the "nested calls to KSPSolve"? > KSPSolve is called again after redistributing the computation. > I tried to call KSPSolve twice, but the the second solve converged in 0 > iteration. KSPSolve seems to remember the solution. How can I force both > solves start from the same initial guess? > Did you zero the solution vector between solves? VecSet(x, 0.0); Matt > Thank you. > > Frank > > > > On 10/04/2016 12:56 PM, Dave May wrote: > > > > On Tuesday, 4 October 2016, frank wrote: > >> Hi, >> This question is follow-up of the thread "Question about memory usage in >> Multigrid preconditioner". >> I used to have the "Out of Memory(OOM)" problem when using the >> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; >> -matptap_scalable" option did solve that problem. >> >> Then I test the scalability by solving a 3d poisson eqn for 1 step. I >> used one sub-communicator in all the tests. The difference between the >> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 >> the number of multigrid levels in the up/down solver. The function >> "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >> >> Test1: 512^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 512 8 4 / >> 3 6.2466 >> 4096 64 5 / >> 3 0.9361 >> 32768 64 4 / >> 3 4.8914 >> >> Test2: 1024^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 4096 64 5 / 4 >> 3.4139 >> 8192 128 5 / >> 4 2.4196 >> 16384 32 5 / 3 >> 5.4150 >> 32768 64 5 / >> 3 5.6067 >> 65536 128 5 / >> 3 6.5219 >> > > You have to be very careful how you interpret these numbers. Your solver > contains nested calls to KSPSolve, and unfortunately as a result the > numbers you report include setup time. This will remain true even if you > call KSPSetUp on the outermost KSP. > > Your email concerns scalability of the silver application, so let's focus > on that issue. > > The only way to clearly separate setup from solve time is to perform two > identical solves. The second solve will not require any setup. You should > monitor the second solve via a new PetscStage. > > This was what I did in the telescope paper. It was the only way to > understand the setup cost (and scaling) cf the solve time (and scaling). > > Thanks > Dave > > > >> I guess I didn't set the MG levels properly. What would be the efficient >> way to arrange the MG levels? >> Also which preconditionr at the coarse mesh of the 2nd communicator >> should I use to improve the performance? >> >> I attached the test code and the petsc options file for the 1024^3 cube >> with 32768 cores. >> >> Thank you. >> >> Regards, >> Frank >> >> >> >> >> >> >> On 09/15/2016 03:35 AM, Dave May wrote: >> >> HI all, >> >> I the only unexpected memory usage I can see is associated with the call >> to MatPtAP(). >> Here is something you can try immediately. >> Run your code with the additional options >> -matrap 0 -matptap_scalable >> >> I didn't realize this before, but the default behaviour of MatPtAP in >> parallel is actually to to explicitly form the transpose of P (e.g. >> assemble R = P^T) and then compute R.A.P. >> You don't want to do this. The option -matrap 0 resolves this issue. >> >> The implementation of P^T.A.P has two variants. >> The scalable implementation (with respect to memory usage) is selected >> via the second option -matptap_scalable. 
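A minimal sketch of the two-identical-solves pattern suggested above, assuming ksp, b and x are the outermost KSP and its right-hand-side and solution vectors (error checking omitted):

    PetscLogStage stage;
    PetscLogStageRegister("Second solve", &stage);

    /* First solve: pays all setup costs (MG hierarchy, Telescope repartitioning, ...). */
    KSPSolve(ksp, b, x);

    /* Reset the solution so the second solve starts from the same initial guess,
       as suggested above; this matters if the KSP is using a nonzero initial guess. */
    VecSet(x, 0.0);

    /* Second solve: setup is already done, so this stage measures only the solve. */
    PetscLogStagePush(stage);
    KSPSolve(ksp, b, x);
    PetscLogStagePop();

With -log_view, the second solve then shows up under its own stage, cleanly separated from the setup work that happens during the first solve.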
>> >> Try it out - I see a significant memory reduction using these options for >> particular mesh sizes / partitions. >> >> I've attached a cleaned up version of the code you sent me. >> There were a number of memory leaks and other issues. >> The main points being >> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >> * You should call PetscFinalize(), otherwise the option -log_summary >> (-log_view) will not display anything once the program has completed. >> >> >> Thanks, >> Dave >> >> >> On 15 September 2016 at 08:03, Hengjie Wang wrote: >> >>> Hi Dave, >>> >>> Sorry, I should have put more comment to explain the code. >>> The number of process in each dimension is the same: Px = Py=Pz=P. So is >>> the domain size. >>> So if the you want to run the code for a 512^3 grid points on 16^3 >>> cores, you need to set "-N 512 -P 16" in the command line. >>> I add more comments and also fix an error in the attached code. ( The >>> error only effects the accuracy of solution but not the memory usage. ) >>> >>> Thank you. >>> Frank >>> >>> >>> On 9/14/2016 9:05 PM, Dave May wrote: >>> >>> >>> >>> On Thursday, 15 September 2016, Dave May >>> wrote: >>> >>>> >>>> >>>> On Thursday, 15 September 2016, frank wrote: >>>> >>>>> Hi, >>>>> >>>>> I write a simple code to re-produce the error. I hope this can help to >>>>> diagnose the problem. >>>>> The code just solves a 3d poisson equation. >>>>> >>>> >>>> Why is the stencil width a runtime parameter?? And why is the default >>>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>> >>>> Was this choice made to mimic something in the real application code? >>>> >>> >>> Please ignore - I misunderstood your usage of the param set by -P >>> >>> >>>> >>>> >>>>> >>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * >>>>> 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>>> ksp solver works fine. >>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>> >>>>> Thank you. >>>>> Frank >>>>> >>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>>> it is not in file I sent you. I am sorry for the confusion. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>> >>>>>> >>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>> > >>>>>> > Hi Barry, >>>>>> > >>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>>> So I added both -ksp_view_pre and -ksp_view. >>>>>> >>>>>> But the options file you sent specifically does NOT list the >>>>>> -ksp_view_pre so how could it be from that? >>>>>> >>>>>> Sorry to be pedantic but I've spent too much time in the past >>>>>> trying to debug from incorrect information and want to make sure that the >>>>>> information I have is correct before thinking. Please recheck exactly what >>>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>>> >>>>>> Barry >>>>>> >>>>>> > >>>>>> > Frank >>>>>> > >>>>>> > >>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>>> in the 2 case but not the one? 
>>>>>> >> >>>>>> >> Barry >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>> >>> >>>>>> >>> Hi, >>>>>> >>> >>>>>> >>> I want to continue digging into the memory problem here. >>>>>> >>> I did find a work around in the past, which is to use less cores >>>>>> per node so that each core has 8G memory. However this is deficient and >>>>>> expensive. I hope to locate the place that uses the most memory. >>>>>> >>> >>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>> >>> Maximum (over computational time) process memory: total >>>>>> 7.0727e+08 >>>>>> >>> Current process memory: >>>>>> total 7.0727e+08 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>> 6.3908e+11 >>>>>> >>> Current space PetscMalloc()ed: >>>>>> total 1.8275e+09 >>>>>> >>> >>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>> >>> Maximum (over computational time) process memory: total >>>>>> 5.9431e+09 >>>>>> >>> Current process memory: >>>>>> total 5.9431e+09 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>> 5.3202e+12 >>>>>> >>> Current space PetscMalloc()ed: >>>>>> total 5.4844e+09 >>>>>> >>> >>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>>> the job during "KSPSolve". >>>>>> >>> >>>>>> >>> I attached the output of ksp_view( the third test's output is >>>>>> from ksp_view_pre ), memory_view and also the petsc options. >>>>>> >>> >>>>>> >>> In all the tests, each core can access about 2G memory. In test3, >>>>>> there are 4223139840 non-zeros in the matrix. This will consume about >>>>>> 1.74M, using double precision. Considering some extra memory used to store >>>>>> integer index, 2G memory should still be way enough. >>>>>> >>> >>>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>>> memory? >>>>>> >>> Thank you so much. >>>>>> >>> >>>>>> >>> BTW, there are 4 options remains unused and I don't understand >>>>>> why they are omitted: >>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>> >>> >>>>>> >>> >>>>>> >>> Regards, >>>>>> >>> Frank >>>>>> >>> >>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>> >>>> >>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>> >>>> Hi Dave, >>>>>> >>>> >>>>>> >>>> Sorry for the late reply. >>>>>> >>>> Thank you so much for your detailed reply. >>>>>> >>>> >>>>>> >>>> I have a question about the estimation of the memory usage. >>>>>> There are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>>> precision is used. So the memory per process is: >>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>>> >>>> >>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>>> apparently cannot convert between units correctly....) >>>>>> >>>> >>>>>> >>>> From the PETSc objects associated with the solver, It looks like >>>>>> it _should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities >>>>>> are: somewhere in your usage of PETSc you've introduced a memory leak; >>>>>> PETSc is doing a huge over allocation (e.g. as per our discussion of >>>>>> MatPtAP); or in your application code there are other objects you have >>>>>> forgotten to log the memory for. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> I am running this job on Bluewater >>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>> >>>> >>>>>> >>>> I thought so on both counts. >>>>>> >>>> >>>>>> >>>> I apologize that I made a stupid mistake in computing the memory >>>>>> per core. My settings render each core can access only 2G memory on average >>>>>> instead of 8G which I mentioned in previous email. I re-run the job with 8G >>>>>> memory per core on average and there is no "Out Of Memory" error. I would >>>>>> do more test to see if there is still some memory issue. >>>>>> >>>> >>>>>> >>>> Ok. I'd still like to know where the memory was being used since >>>>>> my estimates were off. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> Thanks, >>>>>> >>>> Dave >>>>>> >>>> >>>>>> >>>> Regards, >>>>>> >>>> Frank >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>> >>>>> Hi Frank, >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>> >>>>> Hi Dave, >>>>>> >>>>> >>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>>> 96*8*24. The petsc option file is attached. >>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred >>>>>> before the linear solver finished one step. So I don't have the full info >>>>>> from ksp_view. The info from ksp_view_pre is attached. >>>>>> >>>>> >>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>> >>>>> >>>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>>> was going to be changed. >>>>>> >>>>> >>>>>> >>>>> Based on what information? >>>>>> >>>>> Running with -info would give us more clues, but will create a >>>>>> ton of output. >>>>>> >>>>> Please try running the case which failed with -info >>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>>> for comparison. >>>>>> >>>>> Thank you. >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>>> magnitude estimate >>>>>> >>>>> >>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>>> GB per MPI rank assuming double precision. >>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB >>>>>> (assuming 32 bit integers) >>>>>> >>>>> >>>>>> >>>>> * You use 5 levels of coarsening, so the other operators should >>>>>> represent (collectively) >>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on >>>>>> the communicator with 18432 ranks. >>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>>> communicator with 18432 ranks. >>>>>> >>>>> >>>>>> >>>>> * You use a reduction factor of 64, making the new communicator >>>>>> with 288 MPI ranks. >>>>>> >>>>> PCTelescope will first gather a temporary matrix associated >>>>>> with your coarse level operator assuming a comm size of 288 living on the >>>>>> comm with size 18432. >>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per >>>>>> core on the 288 ranks. 
>>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>>> subcomm, thus require another 32 MB per rank. >>>>>> >>>>> The temporary matrix is now destroyed. >>>>>> >>>>> >>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on >>>>>> the sub-comm. >>>>>> >>>>> >>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>>> resulting operator will have the same memory footprint as the unpermuted >>>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>>> are held in memory when the DMDA is provided. >>>>>> >>>>> >>>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>>> any given core, given your options is approximately >>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>> >>>>> This is way below 8 GB. >>>>>> >>>>> >>>>>> >>>>> Note this estimate completely ignores: >>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>> >>>>> (2) the potential growth in the number of non-zeros per row due >>>>>> to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>>> MatView so we could see the number of non-zeros required by the coarse >>>>>> level operators) >>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>>> required by the smoothers. >>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>> >>>>> >>>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>>> carefully estimated the memory usage of your application code. Hopefully >>>>>> others might examine/correct my rough estimates >>>>>> >>>>> >>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>> >>>>> Since I don't have access to the same machine you are running >>>>>> on, I think we need to take a step back. >>>>>> >>>>> >>>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>>> available >>>>>> >>>>> >>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar >>>>>> 7 point FD stencil) >>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>>> memory usage of your solver configuration using a standard, light weight >>>>>> existing PETSc example, run on your machine at the same scale. >>>>>> >>>>> This would hopefully enable us to correctly evaluate the actual >>>>>> memory usage required by the solver configuration you are using. >>>>>> >>>>> >>>>>> >>>>> Thanks, >>>>>> >>>>> Dave >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> Frank >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>> >>>>>> Hi Barry and Dave, >>>>>> >>>>>> >>>>>> >>>>>> Thank both of you for the advice. >>>>>> >>>>>> >>>>>> >>>>>> @Barry >>>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>>> the correct files this time. >>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>>> preconditioner. >>>>>> >>>>>> >>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>> 3971904 0. 
>>>>>> >>>>>> Matrix 101 101 >>>>>> 9462372 0 >>>>>> >>>>>> >>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 681672 >>>>>> 0. >>>>>> >>>>>> Matrix 101 101 >>>>>> 1462180 0. >>>>>> >>>>>> >>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>>> Test2. In my case, it is about 6 times. >>>>>> >>>>>> >>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>>> Sub-domain per process: 32*32*32 >>>>>> >>>>>> Here I get the out of memory error. >>>>>> >>>>>> >>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to >>>>>> set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>>> errors. >>>>>> >>>>>> >>>>>> >>>>>> @Dave >>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the >>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh >>>>>> of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid >>>>>> point per process. >>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>>> attached. >>>>>> >>>>>> >>>>>> >>>>>> Do you understand the expected memory usage for the particular >>>>>> parallel LU implementation you are using? I don't (seriously). Replace LU >>>>>> with bjacobi and re-run this test. My point about solver debugging is still >>>>>> valid. >>>>>> >>>>>> >>>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>>> actually used in the computations >>>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thank you so much. >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> >>>>>> >>>>>> Thank you for you advice. >>>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>>> and the process mesh is 96*8*24. >>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>>> >>>>>> The system gives me the "Out of Memory" error before the >>>>>> linear system is completely solved. >>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>>> the error occurs when it reaches the coarse mesh. >>>>>> >>>>>> >>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>>> 96*8*24. The 3rd test uses the >>>>>> same grid but a different process mesh 48*4*12. >>>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>>> memory usage goes from 2nd test >>>>>> >>>>>> Vector 384 383 8,193,712 >>>>>> 0. >>>>>> >>>>>> Matrix 103 103 11,508,688 >>>>>> 0. >>>>>> >>>>>> to 3rd test >>>>>> >>>>>> Vector 384 383 1,590,520 0. >>>>>> >>>>>> Matrix 103 103 3,508,664 >>>>>> 0. >>>>>> >>>>>> that is the memory usage got smaller but if you have only >>>>>> 1/8th the processes and the same grid it should have gotten about 8 times >>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so that still >>>>>> doesn't explain it because the memory usage changed by a factor of 5 >>>>>> something for the vectors and 3 something for the matrices. 
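For reference, the factors mentioned here work out to roughly

    Vector: 8,193,712 / 1,590,520 ~= 5.2
    Matrix: 11,508,688 / 3,508,664 ~= 3.3

rather than the factor of 8 one would expect from using 1/8th as many processes on the same grid.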
>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>>> the same in 1st test. The linear solver works fine in both test. >>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>>> memory info is from the option '-log_summary'. I tried to use >>>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>>> my code so I can use '-memory_info'? >>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>> >>>>>> >>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>>> memory is used without the telescope? Also run case 2 the same way. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> In both tests the memory usage is not large. >>>>>> >>>>>> >>>>>> >>>>>> It seems to me that it might be the 'telescope' >>>>>> preconditioner that allocated a lot of memory and caused the error in the >>>>>> 1st test. >>>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>> >>>>>> Frank, >>>>>> >>>>>> >>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>>> before the solve so hopefully it gets that far. >>>>>> >>>>>> >>>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>>> when the problem completes it will show the "high water mark" for PETSc >>>>>> allocated memory and total memory used. We first want to look at these >>>>>> numbers to see if it is using more memory than you expect. You could also >>>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>>> the output from these options. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> >>>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>>> solve a linear system in parallel. >>>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>>> coarse mesh for its good performance. >>>>>> >>>>>> The petsc options file is attached. >>>>>> >>>>>> >>>>>> >>>>>> The domain is a 3d box. >>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>>> mesh is 96*8*24. When I double the size of grid and >>>>>> keep the same process mesh and petsc options, I >>>>>> get an "out of memory" error from the super-cluster I am using. >>>>>> >>>>>> Each process has access to at least 8G memory, which should be >>>>>> more than enough for my application. I am sure that all the other parts of >>>>>> my code( except the linear solver ) do not use much memory. So I doubt if >>>>>> there is something wrong with the linear solver. >>>>>> >>>>>> The error occurs before the linear system is completely solved >>>>>> so I don't have the info from ksp view. I am not able to re-produce the >>>>>> error with a smaller problem either. >>>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>>> runs extremely slow but there is no memory error. >>>>>> >>>>>> >>>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>>> >>>>>> Thank you so much. 
>>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>> _options.txt> >>>>>> >>>>>> >>>>>> >>>>> >>>>>> >>>> >>>>>> >>> >>>>> emory2.txt>>>>>> tions3.txt> >>>>>> > >>>>>> >>>>>> >>>>> >>>> >>> >>> >> >> > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hengjiew at uci.edu Tue Oct 4 15:26:09 2016 From: hengjiew at uci.edu (frank) Date: Tue, 4 Oct 2016 13:26:09 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On 10/04/2016 01:20 PM, Matthew Knepley wrote: > On Tue, Oct 4, 2016 at 3:09 PM, frank > wrote: > > Hi Dave, > > Thank you for the reply. > What do you mean by the "nested calls to KSPSolve"? > > > KSPSolve is called again after redistributing the computation. I am still confused. There is only one KSPSolve in my code. Do you mean KSPSolve is called again in the sub-communicator? If that's the case, even if I put two identical KSPSolve in the code, the sub-communicator is still going to call KSPSolve, right? > I tried to call KSPSolve twice, but the the second solve converged > in 0 iteration. KSPSolve seems to remember the solution. How can I > force both solves start from the same initial guess? > > > Did you zero the solution vector between solves? VecSet(x, 0.0); > > Matt > > Thank you. > > Frank > > > > On 10/04/2016 12:56 PM, Dave May wrote: >> >> >> On Tuesday, 4 October 2016, frank > > wrote: >> >> Hi, >> >> This question is follow-up of the thread "Question about >> memory usage in Multigrid preconditioner". >> I used to have the "Out of Memory(OOM)" problem when using >> the CG+Telescope MG solver with 32768 cores. Adding the >> "-matrap 0; -matptap_scalable" option did solve that problem. >> >> Then I test the scalability by solving a 3d poisson eqn for 1 >> step. I used one sub-communicator in all the tests. The >> difference between the petsc options in those tests are: 1 >> the pc_telescope_reduction_factor; 2 the number of multigrid >> levels in the up/down solver. The function "ksp_solve" is >> timed. It is kind of slow and doesn't scale at all. >> >> Test1: 512^3 grid points >> Core# telescope_reduction_factor MG levels# for >> up/down solver Time for KSPSolve (s) >> 512 8 4 / 3 6.2466 >> 4096 64 5 / 3 0.9361 >> 32768 64 4 / 3 4.8914 >> >> Test2: 1024^3 grid points >> Core# telescope_reduction_factor MG levels# for >> up/down solver Time for KSPSolve (s) >> 4096 64 5 / 4 3.4139 >> 8192 128 5 / 4 2.4196 >> 16384 32 5 / 3 5.4150 >> 32768 64 5 / 3 5.6067 >> 65536 128 5 / 3 6.5219 >> >> >> You have to be very careful how you interpret these numbers. Your >> solver contains nested calls to KSPSolve, and unfortunately as a >> result the numbers you report include setup time. This will >> remain true even if you call KSPSetUp on the outermost KSP. >> >> Your email concerns scalability of the silver application, so >> let's focus on that issue. >> >> The only way to clearly separate setup from solve time is >> to perform two identical solves. The second solve will not >> require any setup. 
You should monitor the second solve via a new >> PetscStage. >> >> This was what I did in the telescope paper. It was the only way >> to understand the setup cost (and scaling) cf the solve time (and >> scaling). >> >> Thanks >> Dave >> >> I guess I didn't set the MG levels properly. What would be >> the efficient way to arrange the MG levels? >> Also which preconditionr at the coarse mesh of the 2nd >> communicator should I use to improve the performance? >> >> I attached the test code and the petsc options file for the >> 1024^3 cube with 32768 cores. >> >> Thank you. >> >> Regards, >> Frank >> >> >> >> >> >> >> On 09/15/2016 03:35 AM, Dave May wrote: >>> HI all, >>> >>> I the only unexpected memory usage I can see is associated >>> with the call to MatPtAP(). >>> Here is something you can try immediately. >>> Run your code with the additional options >>> -matrap 0 -matptap_scalable >>> >>> I didn't realize this before, but the default behaviour of >>> MatPtAP in parallel is actually to to explicitly form the >>> transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >>> You don't want to do this. The option -matrap 0 resolves >>> this issue. >>> >>> The implementation of P^T.A.P has two variants. >>> The scalable implementation (with respect to memory usage) >>> is selected via the second option -matptap_scalable. >>> >>> Try it out - I see a significant memory reduction using >>> these options for particular mesh sizes / partitions. >>> >>> I've attached a cleaned up version of the code you sent me. >>> There were a number of memory leaks and other issues. >>> The main points being >>> * You should call DMDAVecGetArrayF90() before >>> VecAssembly{Begin,End} >>> * You should call PetscFinalize(), otherwise the option >>> -log_summary (-log_view) will not display anything once the >>> program has completed. >>> >>> >>> Thanks, >>> Dave >>> >>> >>> On 15 September 2016 at 08:03, Hengjie Wang >>> wrote: >>> >>> Hi Dave, >>> >>> Sorry, I should have put more comment to explain the code. >>> The number of process in each dimension is the same: Px >>> = Py=Pz=P. So is the domain size. >>> So if the you want to run the code for a 512^3 grid >>> points on 16^3 cores, you need to set "-N 512 -P 16" in >>> the command line. >>> I add more comments and also fix an error in the >>> attached code. ( The error only effects the accuracy of >>> solution but not the memory usage. ) >>> >>> Thank you. >>> Frank >>> >>> >>> On 9/14/2016 9:05 PM, Dave May wrote: >>>> >>>> >>>> On Thursday, 15 September 2016, Dave May >>>> wrote: >>>> >>>> >>>> >>>> On Thursday, 15 September 2016, frank >>>> wrote: >>>> >>>> Hi, >>>> >>>> I write a simple code to re-produce the error. >>>> I hope this can help to diagnose the problem. >>>> The code just solves a 3d poisson equation. >>>> >>>> >>>> Why is the stencil width a runtime parameter?? And >>>> why is the default value 2? For 7-pnt FD Laplace, >>>> you only need a stencil width of 1. >>>> >>>> Was this choice made to mimic something in the >>>> real application code? >>>> >>>> >>>> Please ignore - I misunderstood your usage of the param >>>> set by -P >>>> >>>> >>>> I run the code on a 1024^3 mesh. The process >>>> partition is 32 * 32 * 32. That's when I >>>> re-produce the OOM error. Each core has about >>>> 2G memory. >>>> I also run the code on a 512^3 mesh with 16 * >>>> 16 * 16 processes. The ksp solver works fine. >>>> I attached the code, ksp_view_pre's output and >>>> my petsc option file. >>>> >>>> Thank you. 
>>>> Frank >>>> >>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>> Hi Barry, >>>>> >>>>> I checked. On the supercomputer, I had the >>>>> option "-ksp_view_pre" but it is not in file I >>>>> sent you. I am sorry for the confusion. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> On Friday, September 9, 2016, Barry Smith >>>>> wrote: >>>>> >>>>> >>>>> > On Sep 9, 2016, at 3:11 PM, frank >>>>> wrote: >>>>> > >>>>> > Hi Barry, >>>>> > >>>>> > I think the first KSP view output is >>>>> from -ksp_view_pre. Before I submitted the >>>>> test, I was not sure whether there would >>>>> be OOM error or not. So I added both >>>>> -ksp_view_pre and -ksp_view. >>>>> >>>>> But the options file you sent >>>>> specifically does NOT list the >>>>> -ksp_view_pre so how could it be from that? >>>>> >>>>> Sorry to be pedantic but I've spent too >>>>> much time in the past trying to debug from >>>>> incorrect information and want to make >>>>> sure that the information I have is >>>>> correct before thinking. Please recheck >>>>> exactly what happened. Rerun with the >>>>> exact input file you emailed if that is >>>>> needed. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Frank >>>>> > >>>>> > >>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>> >> Why does ksp_view2.txt have two KSP >>>>> views in it while ksp_view1.txt has only >>>>> one KSPView in it? Did you run two >>>>> different solves in the 2 case but not the >>>>> one? >>>>> >> >>>>> >> Barry >>>>> >> >>>>> >> >>>>> >> >>>>> >>> On Sep 9, 2016, at 10:56 AM, frank >>>>> wrote: >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> I want to continue digging into the >>>>> memory problem here. >>>>> >>> I did find a work around in the past, >>>>> which is to use less cores per node so >>>>> that each core has 8G memory. However this >>>>> is deficient and expensive. I hope to >>>>> locate the place that uses the most memory. >>>>> >>> >>>>> >>> Here is a brief summary of the tests I >>>>> did in past: >>>>> >>>> Test1: Mesh 1536*128*384 | >>>>> Process Mesh 48*4*12 >>>>> >>> Maximum (over computational time) >>>>> process memory: total 7.0727e+08 >>>>> >>> Current process memory: total >>>>> 7.0727e+08 >>>>> >>> Maximum (over computational time) >>>>> space PetscMalloc()ed: total 6.3908e+11 >>>>> >>> Current space PetscMalloc()ed: >>>>> >>>>> total 1.8275e+09 >>>>> >>> >>>>> >>>> Test2: Mesh 1536*128*384 | >>>>> Process Mesh 96*8*24 >>>>> >>> Maximum (over computational time) >>>>> process memory: total 5.9431e+09 >>>>> >>> Current process memory: total >>>>> 5.9431e+09 >>>>> >>> Maximum (over computational time) >>>>> space PetscMalloc()ed: total 5.3202e+12 >>>>> >>> Current space PetscMalloc()ed: >>>>> >>>>> total 5.4844e+09 >>>>> >>> >>>>> >>>> Test3: Mesh 3072*256*768 | >>>>> Process Mesh 96*8*24 >>>>> >>> OOM( Out Of Memory ) killer of the >>>>> supercomputer terminated the job during >>>>> "KSPSolve". >>>>> >>> >>>>> >>> I attached the output of ksp_view( the >>>>> third test's output is from ksp_view_pre >>>>> ), memory_view and also the petsc options. >>>>> >>> >>>>> >>> In all the tests, each core can access >>>>> about 2G memory. In test3, there are >>>>> 4223139840 non-zeros in the matrix. This >>>>> will consume about 1.74M, using double >>>>> precision. Considering some extra memory >>>>> used to store integer index, 2G memory >>>>> should still be way enough. >>>>> >>> >>>>> >>> Is there a way to find out which part >>>>> of KSPSolve uses the most memory? >>>>> >>> Thank you so much. 
>>>>> >>> >>>>> >>> BTW, there are 4 options remains >>>>> unused and I don't understand why they are >>>>> omitted: >>>>> >>> >>>>> -mg_coarse_telescope_mg_coarse_ksp_type >>>>> value: preonly >>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type >>>>> value: bjacobi >>>>> >>> >>>>> -mg_coarse_telescope_mg_levels_ksp_max_it >>>>> value: 1 >>>>> >>> >>>>> -mg_coarse_telescope_mg_levels_ksp_type >>>>> value: richardson >>>>> >>> >>>>> >>> >>>>> >>> Regards, >>>>> >>> Frank >>>>> >>> >>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>> >>>> >>>>> >>>> On 14 July 2016 at 01:07, frank >>>>> wrote: >>>>> >>>> Hi Dave, >>>>> >>>> >>>>> >>>> Sorry for the late reply. >>>>> >>>> Thank you so much for your detailed >>>>> reply. >>>>> >>>> >>>>> >>>> I have a question about the >>>>> estimation of the memory usage. There are >>>>> 4223139840 allocated non-zeros and 18432 >>>>> MPI processes. Double precision is used. >>>>> So the memory per process is: >>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 >>>>> / 1024 = 1.74M ? >>>>> >>>> Did I do sth wrong here? Because this >>>>> seems too small. >>>>> >>>> >>>>> >>>> No - I totally f***ed it up. You are >>>>> correct. That'll teach me for fumbling >>>>> around with my iphone calculator and not >>>>> using my brain. (Note that to convert to >>>>> MB just divide by 1e6, not 1024^2 - >>>>> although I apparently cannot convert >>>>> between units correctly....) >>>>> >>>> >>>>> >>>> From the PETSc objects associated >>>>> with the solver, It looks like it _should_ >>>>> run with 2GB per MPI rank. Sorry for my >>>>> mistake. Possibilities are: somewhere in >>>>> your usage of PETSc you've introduced a >>>>> memory leak; PETSc is doing a huge over >>>>> allocation (e.g. as per our discussion of >>>>> MatPtAP); or in your application code >>>>> there are other objects you have forgotten >>>>> to log the memory for. >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> I am running this job on Bluewater >>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>> >>>> >>>>> >>>> I thought so on both counts. >>>>> >>>> >>>>> >>>> I apologize that I made a stupid >>>>> mistake in computing the memory per core. >>>>> My settings render each core can access >>>>> only 2G memory on average instead of 8G >>>>> which I mentioned in previous email. I >>>>> re-run the job with 8G memory per core on >>>>> average and there is no "Out Of Memory" >>>>> error. I would do more test to see if >>>>> there is still some memory issue. >>>>> >>>> >>>>> >>>> Ok. I'd still like to know where the >>>>> memory was being used since my estimates >>>>> were off. >>>>> >>>> >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Dave >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Frank >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>> >>>>> Hi Frank, >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 11 July 2016 at 19:14, frank >>>>> wrote: >>>>> >>>>> Hi Dave, >>>>> >>>>> >>>>> >>>>> I re-run the test using bjacobi as >>>>> the preconditioner on the coarse mesh of >>>>> telescope. The Grid is 3072*256*768 and >>>>> process mesh is 96*8*24. The petsc option >>>>> file is attached. >>>>> >>>>> I still got the "Out Of Memory" >>>>> error. The error occurred before the >>>>> linear solver finished one step. So I >>>>> don't have the full info from ksp_view. >>>>> The info from ksp_view_pre is attached. >>>>> >>>>> >>>>> >>>>> Okay - that is essentially useless >>>>> (sorry) >>>>> >>>>> >>>>> >>>>> It seems to me that the error >>>>> occurred when the decomposition was going >>>>> to be changed. 
>>>>> >>>>> >>>>> >>>>> Based on what information? >>>>> >>>>> Running with -info would give us >>>>> more clues, but will create a ton of output. >>>>> >>>>> Please try running the case which >>>>> failed with -info >>>>> >>>>> I had another test with a grid of >>>>> 1536*128*384 and the same process mesh as >>>>> above. There was no error. The ksp_view >>>>> info is attached for comparison. >>>>> >>>>> Thank you. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> [3] Here is my crude estimate of >>>>> your memory usage. >>>>> >>>>> I'll target the biggest memory hogs >>>>> only to get an order of magnitude estimate >>>>> >>>>> >>>>> >>>>> * The Fine grid operator contains >>>>> 4223139840 non-zeros --> 1.8 GB per MPI >>>>> rank assuming double precision. >>>>> >>>>> The indices for the AIJ could amount >>>>> to another 0.3 GB (assuming 32 bit integers) >>>>> >>>>> >>>>> >>>>> * You use 5 levels of coarsening, so >>>>> the other operators should represent >>>>> (collectively) >>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + >>>>> 2.1/8^4 ~ 300 MB per MPI rank on the >>>>> communicator with 18432 ranks. >>>>> >>>>> The coarse grid should consume ~ 0.5 >>>>> MB per MPI rank on the communicator with >>>>> 18432 ranks. >>>>> >>>>> >>>>> >>>>> * You use a reduction factor of 64, >>>>> making the new communicator with 288 MPI >>>>> ranks. >>>>> >>>>> PCTelescope will first gather a >>>>> temporary matrix associated with your >>>>> coarse level operator assuming a comm size >>>>> of 288 living on the comm with size 18432. >>>>> >>>>> This matrix will require >>>>> approximately 0.5 * 64 = 32 MB per core on >>>>> the 288 ranks. >>>>> >>>>> This matrix is then used to form a >>>>> new MPIAIJ matrix on the subcomm, thus >>>>> require another 32 MB per rank. >>>>> >>>>> The temporary matrix is now destroyed. >>>>> >>>>> >>>>> >>>>> * Because a DMDA is detected, a >>>>> permutation matrix is assembled. >>>>> >>>>> This requires 2 doubles per point in >>>>> the DMDA. >>>>> >>>>> Your coarse DMDA contains 92 x 16 x >>>>> 48 points. >>>>> >>>>> Thus the permutation matrix will >>>>> require < 1 MB per MPI rank on the sub-comm. >>>>> >>>>> >>>>> >>>>> * Lastly, the matrix is permuted. >>>>> This uses MatPtAP(), but the resulting >>>>> operator will have the same memory >>>>> footprint as the unpermuted matrix (32 >>>>> MB). At any stage in PCTelescope, only 2 >>>>> operators of size 32 MB are held in memory >>>>> when the DMDA is provided. >>>>> >>>>> >>>>> >>>>> From my rough estimates, the worst >>>>> case memory foot print for any given core, >>>>> given your options is approximately >>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 >>>>> MB = 2465 MB >>>>> >>>>> This is way below 8 GB. >>>>> >>>>> >>>>> >>>>> Note this estimate completely ignores: >>>>> >>>>> (1) the memory required for the >>>>> restriction operator, >>>>> >>>>> (2) the potential growth in the >>>>> number of non-zeros per row due to >>>>> Galerkin coarsening (I wished >>>>> -ksp_view_pre reported the output from >>>>> MatView so we could see the number of >>>>> non-zeros required by the coarse level >>>>> operators) >>>>> >>>>> (3) all temporary vectors required >>>>> by the CG solver, and those required by >>>>> the smoothers. >>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>> >>>>> (5) memory associated with IS's used >>>>> within PCTelescope >>>>> >>>>> >>>>> >>>>> So either I am completely off in my >>>>> estimates, or you have not carefully >>>>> estimated the memory usage of your >>>>> application code. 
Hopefully others might >>>>> examine/correct my rough estimates >>>>> >>>>> >>>>> >>>>> Since I don't have your code I >>>>> cannot access the latter. >>>>> >>>>> Since I don't have access to the >>>>> same machine you are running on, I think >>>>> we need to take a step back. >>>>> >>>>> >>>>> >>>>> [1] What machine are you running on? >>>>> Send me a URL if its available >>>>> >>>>> >>>>> >>>>> [2] What discretization are you >>>>> using? (I am guessing a scalar 7 point FD >>>>> stencil) >>>>> >>>>> If it's a 7 point FD stencil, we >>>>> should be able to examine the memory usage >>>>> of your solver configuration using a >>>>> standard, light weight existing PETSc >>>>> example, run on your machine at the same >>>>> scale. >>>>> >>>>> This would hopefully enable us to >>>>> correctly evaluate the actual memory usage >>>>> required by the solver configuration you >>>>> are using. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>> >>>>>> >>>>> >>>>>> On Saturday, 9 July 2016, frank >>>>> wrote: >>>>> >>>>>> Hi Barry and Dave, >>>>> >>>>>> >>>>> >>>>>> Thank both of you for the advice. >>>>> >>>>>> >>>>> >>>>>> @Barry >>>>> >>>>>> I made a mistake in the file names >>>>> in last email. I attached the correct >>>>> files this time. >>>>> >>>>>> For all the three tests, >>>>> 'Telescope' is used as the coarse >>>>> preconditioner. >>>>> >>>>>> >>>>> >>>>>> == Test1: Grid: 1536*128*384, >>>>> Process Mesh: 48*4*12 >>>>> >>>>>> Part of the memory usage: Vector >>>>> 125 124 3971904 0. >>>>> >>>>>> Matrix 101 101 9462372 0 >>>>> >>>>>> >>>>> >>>>>> == Test2: Grid: 1536*128*384, >>>>> Process Mesh: 96*8*24 >>>>> >>>>>> Part of the memory usage: Vector >>>>> 125 124 681672 0. >>>>> >>>>>> Matrix 101 101 1462180 0. >>>>> >>>>>> >>>>> >>>>>> In theory, the memory usage in >>>>> Test1 should be 8 times of Test2. In my >>>>> case, it is about 6 times. >>>>> >>>>>> >>>>> >>>>>> == Test3: Grid: 3072*256*768, >>>>> Process Mesh: 96*8*24. Sub-domain per >>>>> process: 32*32*32 >>>>> >>>>>> Here I get the out of memory error. >>>>> >>>>>> >>>>> >>>>>> I tried to use -mg_coarse jacobi. >>>>> In this way, I don't need to set >>>>> -mg_coarse_ksp_type and -mg_coarse_pc_type >>>>> explicitly, right? >>>>> >>>>>> The linear solver didn't work in >>>>> this case. Petsc output some errors. >>>>> >>>>>> >>>>> >>>>>> @Dave >>>>> >>>>>> In test3, I use only one instance >>>>> of 'Telescope'. On the coarse mesh of >>>>> 'Telescope', I used LU as the >>>>> preconditioner instead of SVD. >>>>> >>>>>> If my set the levels correctly, >>>>> then on the last coarse mesh of MG where >>>>> it calls 'Telescope', the sub-domain per >>>>> process is 2*2*2. >>>>> >>>>>> On the last coarse mesh of >>>>> 'Telescope', there is only one grid point >>>>> per process. >>>>> >>>>>> I still got the OOM error. The >>>>> detailed petsc option file is attached. >>>>> >>>>>> >>>>> >>>>>> Do you understand the expected >>>>> memory usage for the particular parallel >>>>> LU implementation you are using? I don't >>>>> (seriously). Replace LU with bjacobi and >>>>> re-run this test. My point about solver >>>>> debugging is still valid. >>>>> >>>>>> >>>>> >>>>>> And please send the result of >>>>> KSPView so we can see what is actually >>>>> used in the computations >>>>> >>>>>> >>>>> >>>>>> Thanks >>>>> >>>>>> Dave >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> Thank you so much. 
>>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith >>>>> wrote: >>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank >>>>> wrote: >>>>> >>>>>> >>>>> >>>>>> Hi Barry, >>>>> >>>>>> >>>>> >>>>>> Thank you for you advice. >>>>> >>>>>> I tried three test. In the 1st >>>>> test, the grid is 3072*256*768 and the >>>>> process mesh is 96*8*24. >>>>> >>>>>> The linear solver is 'cg' the >>>>> preconditioner is 'mg' and 'telescope' is >>>>> used as the preconditioner at the coarse mesh. >>>>> >>>>>> The system gives me the "Out of >>>>> Memory" error before the linear system is >>>>> completely solved. >>>>> >>>>>> The info from '-ksp_view_pre' is >>>>> attached. I seems to me that the error >>>>> occurs when it reaches the coarse mesh. >>>>> >>>>>> >>>>> >>>>>> The 2nd test uses a grid of >>>>> 1536*128*384 and process mesh is 96*8*24. >>>>> The 3rd test uses the same grid >>>>> but a different process mesh 48*4*12. >>>>> >>>>>> Are you sure this is right? The >>>>> total matrix and vector memory usage goes >>>>> from 2nd test >>>>> >>>>>> Vector 384 >>>>> 383 8,193,712 0. >>>>> >>>>>> Matrix 103 >>>>> 103 11,508,688 0. >>>>> >>>>>> to 3rd test >>>>> >>>>>> Vector 384 >>>>> 383 1,590,520 0. >>>>> >>>>>> Matrix 103 >>>>> 103 3,508,664 0. >>>>> >>>>>> that is the memory usage got >>>>> smaller but if you have only 1/8th the >>>>> processes and the same grid it should have >>>>> gotten about 8 times bigger. Did you maybe >>>>> cut the grid by a factor of 8 also? If so >>>>> that still doesn't explain it because the >>>>> memory usage changed by a factor of 5 >>>>> something for the vectors and 3 something >>>>> for the matrices. >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> The linear solver and petsc options >>>>> in 2nd and 3rd tests are the same in 1st >>>>> test. The linear solver works fine in both >>>>> test. >>>>> >>>>>> I attached the memory usage of the >>>>> 2nd and 3rd tests. The memory info is from >>>>> the option '-log_summary'. I tried to use >>>>> '-momery_info' as you suggested, but in my >>>>> case petsc treated it as an unused option. >>>>> It output nothing about the memory. Do I >>>>> need to add sth to my code so I can use >>>>> '-memory_info'? >>>>> >>>>>> Sorry, my mistake the option is >>>>> -memory_view >>>>> >>>>>> >>>>> >>>>>> Can you run the one case with >>>>> -memory_view and -mg_coarse jacobi >>>>> -ksp_max_it 1 (just so it doesn't iterate >>>>> forever) to see how much memory is used >>>>> without the telescope? Also run case 2 the >>>>> same way. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> In both tests the memory usage is >>>>> not large. >>>>> >>>>>> >>>>> >>>>>> It seems to me that it might be the >>>>> 'telescope' preconditioner that allocated >>>>> a lot of memory and caused the error in >>>>> the 1st test. >>>>> >>>>>> Is there is a way to show how much >>>>> memory it allocated? >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith >>>>> wrote: >>>>> >>>>>> Frank, >>>>> >>>>>> >>>>> >>>>>> You can run with -ksp_view_pre >>>>> to have it "view" the KSP before the solve >>>>> so hopefully it gets that far. >>>>> >>>>>> >>>>> >>>>>> Please run the problem that >>>>> does fit with -memory_info when the >>>>> problem completes it will show the "high >>>>> water mark" for PETSc allocated memory and >>>>> total memory used. We first want to look >>>>> at these numbers to see if it is using >>>>> more memory than you expect. 
You could >>>>> also run with say half the grid spacing to >>>>> see how the memory usage scaled with the >>>>> increase in grid points. Make the runs >>>>> also with -log_view and send all the >>>>> output from these options. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank >>>>> wrote: >>>>> >>>>>> >>>>> >>>>>> Hi, >>>>> >>>>>> >>>>> >>>>>> I am using the CG ksp solver and >>>>> Multigrid preconditioner to solve a linear >>>>> system in parallel. >>>>> >>>>>> I chose to use the 'Telescope' as >>>>> the preconditioner on the coarse mesh for >>>>> its good performance. >>>>> >>>>>> The petsc options file is attached. >>>>> >>>>>> >>>>> >>>>>> The domain is a 3d box. >>>>> >>>>>> It works well when the grid is >>>>> 1536*128*384 and the process mesh is >>>>> 96*8*24. When I double the size of grid >>>>> and keep the same process mesh and petsc >>>>> options, I get an "out of memory" error >>>>> from the super-cluster I am using. >>>>> >>>>>> Each process has access to at least >>>>> 8G memory, which should be more than >>>>> enough for my application. I am sure that >>>>> all the other parts of my code( except the >>>>> linear solver ) do not use much memory. So >>>>> I doubt if there is something wrong with >>>>> the linear solver. >>>>> >>>>>> The error occurs before the linear >>>>> system is completely solved so I don't >>>>> have the info from ksp view. I am not able >>>>> to re-produce the error with a smaller >>>>> problem either. >>>>> >>>>>> In addition, I tried to use the >>>>> block jacobi as the preconditioner with >>>>> the same grid and same decomposition. The >>>>> linear solver runs extremely slow but >>>>> there is no memory error. >>>>> >>>>>> >>>>> >>>>>> How can I diagnose what exactly >>>>> cause the error? >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>>> >>> >>>>> >>>>> > >>>>> >>>> >>> >>> >> > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 4 15:31:06 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 15:31:06 -0500 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Tue, Oct 4, 2016 at 3:26 PM, frank wrote: > > On 10/04/2016 01:20 PM, Matthew Knepley wrote: > > On Tue, Oct 4, 2016 at 3:09 PM, frank wrote: > >> Hi Dave, >> >> Thank you for the reply. >> What do you mean by the "nested calls to KSPSolve"? >> > > KSPSolve is called again after redistributing the computation. > > > I am still confused. There is only one KSPSolve in my code. > Thats right. You call it once, but it is called internally again. > Do you mean KSPSolve is called again in the sub-communicator? If that's > the case, even if I put two identical KSPSolve in the code, the > sub-communicator is still going to call KSPSolve, right? > Yes. Matt > > >> I tried to call KSPSolve twice, but the the second solve converged in 0 >> iteration. KSPSolve seems to remember the solution. 
How can I force both >> solves start from the same initial guess? >> > > Did you zero the solution vector between solves? VecSet(x, 0.0); > > Matt > > >> Thank you. >> >> Frank >> >> >> >> On 10/04/2016 12:56 PM, Dave May wrote: >> >> >> >> On Tuesday, 4 October 2016, frank wrote: >> >>> Hi, >>> This question is follow-up of the thread "Question about memory usage in >>> Multigrid preconditioner". >>> I used to have the "Out of Memory(OOM)" problem when using the >>> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; >>> -matptap_scalable" option did solve that problem. >>> >>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I >>> used one sub-communicator in all the tests. The difference between the >>> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 >>> the number of multigrid levels in the up/down solver. The function >>> "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >>> >>> Test1: 512^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down >>> solver Time for KSPSolve (s) >>> 512 8 4 / >>> 3 6.2466 >>> 4096 64 5 / >>> 3 0.9361 >>> 32768 64 4 / >>> 3 4.8914 >>> >>> Test2: 1024^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down >>> solver Time for KSPSolve (s) >>> 4096 64 5 / 4 >>> 3.4139 >>> 8192 128 5 / >>> 4 2.4196 >>> 16384 32 5 / 3 >>> 5.4150 >>> 32768 64 5 / >>> 3 5.6067 >>> 65536 128 5 / >>> 3 6.5219 >>> >> >> You have to be very careful how you interpret these numbers. Your solver >> contains nested calls to KSPSolve, and unfortunately as a result the >> numbers you report include setup time. This will remain true even if you >> call KSPSetUp on the outermost KSP. >> >> Your email concerns scalability of the silver application, so let's focus >> on that issue. >> >> The only way to clearly separate setup from solve time is to perform two >> identical solves. The second solve will not require any setup. You should >> monitor the second solve via a new PetscStage. >> >> This was what I did in the telescope paper. It was the only way to >> understand the setup cost (and scaling) cf the solve time (and scaling). >> >> Thanks >> Dave >> >> >> >>> I guess I didn't set the MG levels properly. What would be the efficient >>> way to arrange the MG levels? >>> Also which preconditionr at the coarse mesh of the 2nd communicator >>> should I use to improve the performance? >>> >>> I attached the test code and the petsc options file for the 1024^3 cube >>> with 32768 cores. >>> >>> Thank you. >>> >>> Regards, >>> Frank >>> >>> >>> >>> >>> >>> >>> On 09/15/2016 03:35 AM, Dave May wrote: >>> >>> HI all, >>> >>> I the only unexpected memory usage I can see is associated with the call >>> to MatPtAP(). >>> Here is something you can try immediately. >>> Run your code with the additional options >>> -matrap 0 -matptap_scalable >>> >>> I didn't realize this before, but the default behaviour of MatPtAP in >>> parallel is actually to to explicitly form the transpose of P (e.g. >>> assemble R = P^T) and then compute R.A.P. >>> You don't want to do this. The option -matrap 0 resolves this issue. >>> >>> The implementation of P^T.A.P has two variants. >>> The scalable implementation (with respect to memory usage) is selected >>> via the second option -matptap_scalable. >>> >>> Try it out - I see a significant memory reduction using these options >>> for particular mesh sizes / partitions. >>> >>> I've attached a cleaned up version of the code you sent me. 
>>> There were a number of memory leaks and other issues. >>> The main points being >>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>> * You should call PetscFinalize(), otherwise the option -log_summary >>> (-log_view) will not display anything once the program has completed. >>> >>> >>> Thanks, >>> Dave >>> >>> >>> On 15 September 2016 at 08:03, Hengjie Wang wrote: >>> >>>> Hi Dave, >>>> >>>> Sorry, I should have put more comment to explain the code. >>>> The number of process in each dimension is the same: Px = Py=Pz=P. So >>>> is the domain size. >>>> So if the you want to run the code for a 512^3 grid points on 16^3 >>>> cores, you need to set "-N 512 -P 16" in the command line. >>>> I add more comments and also fix an error in the attached code. ( The >>>> error only effects the accuracy of solution but not the memory usage. ) >>>> >>>> Thank you. >>>> Frank >>>> >>>> >>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>> >>>> >>>> >>>> On Thursday, 15 September 2016, Dave May >>>> wrote: >>>> >>>>> >>>>> >>>>> On Thursday, 15 September 2016, frank wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I write a simple code to re-produce the error. I hope this can help >>>>>> to diagnose the problem. >>>>>> The code just solves a 3d poisson equation. >>>>>> >>>>> >>>>> Why is the stencil width a runtime parameter?? And why is the default >>>>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>> >>>>> Was this choice made to mimic something in the real application code? >>>>> >>>> >>>> Please ignore - I misunderstood your usage of the param set by -P >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * >>>>>> 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>>>> ksp solver works fine. >>>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>>> >>>>>> Thank you. >>>>>> Frank >>>>>> >>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>>>> it is not in file I sent you. I am sorry for the confusion. >>>>>> >>>>>> Regards, >>>>>> Frank >>>>>> >>>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>>> > >>>>>>> > Hi Barry, >>>>>>> > >>>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>>>> So I added both -ksp_view_pre and -ksp_view. >>>>>>> >>>>>>> But the options file you sent specifically does NOT list the >>>>>>> -ksp_view_pre so how could it be from that? >>>>>>> >>>>>>> Sorry to be pedantic but I've spent too much time in the past >>>>>>> trying to debug from incorrect information and want to make sure that the >>>>>>> information I have is correct before thinking. Please recheck exactly what >>>>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> > >>>>>>> > Frank >>>>>>> > >>>>>>> > >>>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>>>> in the 2 case but not the one? 
>>>>>>> >> >>>>>>> >> Barry >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>>> >>> >>>>>>> >>> Hi, >>>>>>> >>> >>>>>>> >>> I want to continue digging into the memory problem here. >>>>>>> >>> I did find a work around in the past, which is to use less cores >>>>>>> per node so that each core has 8G memory. However this is deficient and >>>>>>> expensive. I hope to locate the place that uses the most memory. >>>>>>> >>> >>>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>> >>> Maximum (over computational time) process memory: >>>>>>> total 7.0727e+08 >>>>>>> >>> Current process memory: >>>>>>> total 7.0727e+08 >>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>> 6.3908e+11 >>>>>>> >>> Current space PetscMalloc()ed: >>>>>>> total 1.8275e+09 >>>>>>> >>> >>>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>> >>> Maximum (over computational time) process memory: >>>>>>> total 5.9431e+09 >>>>>>> >>> Current process memory: >>>>>>> total 5.9431e+09 >>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>> 5.3202e+12 >>>>>>> >>> Current space PetscMalloc()ed: >>>>>>> total 5.4844e+09 >>>>>>> >>> >>>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>>>> the job during "KSPSolve". >>>>>>> >>> >>>>>>> >>> I attached the output of ksp_view( the third test's output is >>>>>>> from ksp_view_pre ), memory_view and also the petsc options. >>>>>>> >>> >>>>>>> >>> In all the tests, each core can access about 2G memory. In >>>>>>> test3, there are 4223139840 non-zeros in the matrix. This will consume >>>>>>> about 1.74M, using double precision. Considering some extra memory used to >>>>>>> store integer index, 2G memory should still be way enough. >>>>>>> >>> >>>>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>>>> memory? >>>>>>> >>> Thank you so much. >>>>>>> >>> >>>>>>> >>> BTW, there are 4 options remains unused and I don't understand >>>>>>> why they are omitted: >>>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> Regards, >>>>>>> >>> Frank >>>>>>> >>> >>>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>> >>>> >>>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>>> >>>> Hi Dave, >>>>>>> >>>> >>>>>>> >>>> Sorry for the late reply. >>>>>>> >>>> Thank you so much for your detailed reply. >>>>>>> >>>> >>>>>>> >>>> I have a question about the estimation of the memory usage. >>>>>>> There are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>>>> precision is used. So the memory per process is: >>>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>>>> >>>> >>>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>>>> apparently cannot convert between units correctly....) 
>>>>>>> >>>> >>>>>>> >>>> From the PETSc objects associated with the solver, It looks >>>>>>> like it _should_ run with 2GB per MPI rank. Sorry for my mistake. >>>>>>> Possibilities are: somewhere in your usage of PETSc you've introduced a >>>>>>> memory leak; PETSc is doing a huge over allocation (e.g. as per our >>>>>>> discussion of MatPtAP); or in your application code there are other objects >>>>>>> you have forgotten to log the memory for. >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> I am running this job on Bluewater >>>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>>> >>>> >>>>>>> >>>> I thought so on both counts. >>>>>>> >>>> >>>>>>> >>>> I apologize that I made a stupid mistake in computing the >>>>>>> memory per core. My settings render each core can access only 2G memory on >>>>>>> average instead of 8G which I mentioned in previous email. I re-run the job >>>>>>> with 8G memory per core on average and there is no "Out Of Memory" error. I >>>>>>> would do more test to see if there is still some memory issue. >>>>>>> >>>> >>>>>>> >>>> Ok. I'd still like to know where the memory was being used >>>>>>> since my estimates were off. >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> Thanks, >>>>>>> >>>> Dave >>>>>>> >>>> >>>>>>> >>>> Regards, >>>>>>> >>>> Frank >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>> >>>>> Hi Frank, >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>>> >>>>> Hi Dave, >>>>>>> >>>>> >>>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>>>> 96*8*24. The petsc option file is attached. >>>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred >>>>>>> before the linear solver finished one step. So I don't have the full info >>>>>>> from ksp_view. The info from ksp_view_pre is attached. >>>>>>> >>>>> >>>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>>> >>>>> >>>>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>>>> was going to be changed. >>>>>>> >>>>> >>>>>>> >>>>> Based on what information? >>>>>>> >>>>> Running with -info would give us more clues, but will create a >>>>>>> ton of output. >>>>>>> >>>>> Please try running the case which failed with -info >>>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>>>> for comparison. >>>>>>> >>>>> Thank you. >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>>>> magnitude estimate >>>>>>> >>>>> >>>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>>>> GB per MPI rank assuming double precision. >>>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB >>>>>>> (assuming 32 bit integers) >>>>>>> >>>>> >>>>>>> >>>>> * You use 5 levels of coarsening, so the other operators >>>>>>> should represent (collectively) >>>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank >>>>>>> on the communicator with 18432 ranks. >>>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>>>> communicator with 18432 ranks. >>>>>>> >>>>> >>>>>>> >>>>> * You use a reduction factor of 64, making the new >>>>>>> communicator with 288 MPI ranks. 
>>>>>>> >>>>> PCTelescope will first gather a temporary matrix associated >>>>>>> with your coarse level operator assuming a comm size of 288 living on the >>>>>>> comm with size 18432. >>>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per >>>>>>> core on the 288 ranks. >>>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>>>> subcomm, thus require another 32 MB per rank. >>>>>>> >>>>> The temporary matrix is now destroyed. >>>>>>> >>>>> >>>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is >>>>>>> assembled. >>>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank >>>>>>> on the sub-comm. >>>>>>> >>>>> >>>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>>>> resulting operator will have the same memory footprint as the unpermuted >>>>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>>>> are held in memory when the DMDA is provided. >>>>>>> >>>>> >>>>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>>>> any given core, given your options is approximately >>>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>> >>>>> This is way below 8 GB. >>>>>>> >>>>> >>>>>>> >>>>> Note this estimate completely ignores: >>>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>>> >>>>> (2) the potential growth in the number of non-zeros per row >>>>>>> due to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>>>> MatView so we could see the number of non-zeros required by the coarse >>>>>>> level operators) >>>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>>>> required by the smoothers. >>>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>>> >>>>> >>>>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>>>> carefully estimated the memory usage of your application code. Hopefully >>>>>>> others might examine/correct my rough estimates >>>>>>> >>>>> >>>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>>> >>>>> Since I don't have access to the same machine you are running >>>>>>> on, I think we need to take a step back. >>>>>>> >>>>> >>>>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>>>> available >>>>>>> >>>>> >>>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar >>>>>>> 7 point FD stencil) >>>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>>>> memory usage of your solver configuration using a standard, light weight >>>>>>> existing PETSc example, run on your machine at the same scale. >>>>>>> >>>>> This would hopefully enable us to correctly evaluate the >>>>>>> actual memory usage required by the solver configuration you are using. >>>>>>> >>>>> >>>>>>> >>>>> Thanks, >>>>>>> >>>>> Dave >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> Frank >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>>> >>>>>> Hi Barry and Dave, >>>>>>> >>>>>> >>>>>>> >>>>>> Thank both of you for the advice. >>>>>>> >>>>>> >>>>>>> >>>>>> @Barry >>>>>>> >>>>>> I made a mistake in the file names in last email. 
I attached >>>>>>> the correct files this time. >>>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>>>> preconditioner. >>>>>>> >>>>>> >>>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>> 3971904 0. >>>>>>> >>>>>> Matrix 101 >>>>>>> 101 9462372 0 >>>>>>> >>>>>> >>>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>> 681672 0. >>>>>>> >>>>>> Matrix 101 >>>>>>> 101 1462180 0. >>>>>>> >>>>>> >>>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>>>> Test2. In my case, it is about 6 times. >>>>>>> >>>>>> >>>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>>>> Sub-domain per process: 32*32*32 >>>>>>> >>>>>> Here I get the out of memory error. >>>>>>> >>>>>> >>>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need >>>>>>> to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>>>> errors. >>>>>>> >>>>>> >>>>>>> >>>>>> @Dave >>>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the >>>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh >>>>>>> of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one >>>>>>> grid point per process. >>>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>>>> attached. >>>>>>> >>>>>> >>>>>>> >>>>>> Do you understand the expected memory usage for the >>>>>>> particular parallel LU implementation you are using? I don't (seriously). >>>>>>> Replace LU with bjacobi and re-run this test. My point about solver >>>>>>> debugging is still valid. >>>>>>> >>>>>> >>>>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>>>> actually used in the computations >>>>>>> >>>>>> >>>>>>> >>>>>> Thanks >>>>>>> >>>>>> Dave >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> Thank you so much. >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> Hi Barry, >>>>>>> >>>>>> >>>>>>> >>>>>> Thank you for you advice. >>>>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>>>> and the process mesh is 96*8*24. >>>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>>>> >>>>>> The system gives me the "Out of Memory" error before the >>>>>>> linear system is completely solved. >>>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>>>> the error occurs when it reaches the coarse mesh. >>>>>>> >>>>>> >>>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>>>> 96*8*24. The 3rd test uses the >>>>>>> same grid but a different process mesh 48*4*12. >>>>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>>>> memory usage goes from 2nd test >>>>>>> >>>>>> Vector 384 383 8,193,712 >>>>>>> 0. >>>>>>> >>>>>> Matrix 103 103 11,508,688 >>>>>>> 0. >>>>>>> >>>>>> to 3rd test >>>>>>> >>>>>> Vector 384 383 1,590,520 >>>>>>> 0. >>>>>>> >>>>>> Matrix 103 103 3,508,664 >>>>>>> 0. 
>>>>>>> >>>>>> that is the memory usage got smaller but if you have only >>>>>>> 1/8th the processes and the same grid it should have gotten about 8 times >>>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so that still >>>>>>> doesn't explain it because the memory usage changed by a factor of 5 >>>>>>> something for the vectors and 3 something for the matrices. >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>>>> the same in 1st test. The linear solver works fine in both test. >>>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>>>> memory info is from the option '-log_summary'. I tried to use >>>>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>>>> my code so I can use '-memory_info'? >>>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>>> >>>>>> >>>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>>>> memory is used without the telescope? Also run case 2 the same way. >>>>>>> >>>>>> >>>>>>> >>>>>> Barry >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> In both tests the memory usage is not large. >>>>>>> >>>>>> >>>>>>> >>>>>> It seems to me that it might be the 'telescope' >>>>>>> preconditioner that allocated a lot of memory and caused the error in the >>>>>>> 1st test. >>>>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>>> >>>>>> Frank, >>>>>>> >>>>>> >>>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>>>> before the solve so hopefully it gets that far. >>>>>>> >>>>>> >>>>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>>>> when the problem completes it will show the "high water mark" for PETSc >>>>>>> allocated memory and total memory used. We first want to look at these >>>>>>> numbers to see if it is using more memory than you expect. You could also >>>>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>>>> the output from these options. >>>>>>> >>>>>> >>>>>>> >>>>>> Barry >>>>>>> >>>>>> >>>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> Hi, >>>>>>> >>>>>> >>>>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>>>> solve a linear system in parallel. >>>>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>>>> coarse mesh for its good performance. >>>>>>> >>>>>> The petsc options file is attached. >>>>>>> >>>>>> >>>>>>> >>>>>> The domain is a 3d box. >>>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>>>> mesh is 96*8*24. When I double the size of grid and >>>>>>> keep the same process mesh and petsc options, I >>>>>>> get an "out of memory" error from the super-cluster I am using. >>>>>>> >>>>>> Each process has access to at least 8G memory, which should >>>>>>> be more than enough for my application. I am sure that all the other parts >>>>>>> of my code( except the linear solver ) do not use much memory. So I doubt >>>>>>> if there is something wrong with the linear solver. 
>>>>>>> >>>>>> The error occurs before the linear system is completely >>>>>>> solved so I don't have the info from ksp view. I am not able to re-produce >>>>>>> the error with a smaller problem either. >>>>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>>>> runs extremely slow but there is no memory error. >>>>>>> >>>>>> >>>>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>>>> >>>>>> Thank you so much. >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>> _options.txt> >>>>>>> >>>>>> >>>>>>> >>>>> >>>>>>> >>>> >>>>>>> >>> >>>>>> emory2.txt>>>>>>> tions3.txt> >>>>>>> > >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From sb020287 at gmail.com Tue Oct 4 21:02:12 2016 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Wed, 5 Oct 2016 10:02:12 +0800 Subject: [petsc-users] using DMDA with python Message-ID: Dear all, I want to write a solver for incompressible navier stokes using python and I want to use PETsc (particularly dmda & ksp) for this. May I know if this type of work is feasible/already done? I intend to run my solver in a cluster and so am slightly concerned about the performance if I use python with petsc. My deepest apologies if this mail of mine caused you any inconvenience. Somdeb -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 4 21:12:45 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Oct 2016 21:12:45 -0500 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay wrote: > Dear all, > I want to write a solver for incompressible navier stokes > using python and I want to use PETsc (particularly dmda & ksp) for this. > May I know if this type of work is feasible/already done? > How do you plan to discretize your system? DMDA supports only collocation discretizations, so some sort of penalty for pressure would have to be employed. Thanks, Matt > I intend to run my solver in a cluster and so am slightly > concerned about the performance if I use python with petsc. > My deepest apologies if this mail of mine caused you any > inconvenience. > > Somdeb > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From sb020287 at gmail.com Tue Oct 4 21:23:41 2016 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Wed, 5 Oct 2016 10:23:41 +0800 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: Hi again Sir, Thank you very much for the quick response. I am planning to implement a mustiphase algorithm on collocated grid. I already qrote a C code for 2d case, but it wasn't very generalized . So for the final version, I intend to use python as a script to interact with PETSc kernels. 
Somdeb On Wed, Oct 5, 2016 at 10:12 AM, Matthew Knepley wrote: > On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay > wrote: > >> Dear all, >> I want to write a solver for incompressible navier stokes >> using python and I want to use PETsc (particularly dmda & ksp) for this. >> May I know if this type of work is feasible/already done? >> > > How do you plan to discretize your system? DMDA supports only collocation > discretizations, so some sort of penalty for pressure would > have to be employed. > > Thanks, > > Matt > > >> I intend to run my solver in a cluster and so am slightly >> concerned about the performance if I use python with petsc. >> My deepest apologies if this mail of mine caused you any >> inconvenience. >> >> Somdeb >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sb020287 at gmail.com Tue Oct 4 21:47:53 2016 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Wed, 5 Oct 2016 10:47:53 +0800 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: Hi again, Please allow me to explain in detail here:- 1. I am using Zang's (jcp 1994) method for incompressible flow on generalized collocated grid. 2. The main difference lies on the calculation of the grid matrix, for which I am using Gaitonde et al (2002)'s work 3. I want to use python to set up the domain , grid(structured) and boundary/initial conditions. 4. I want petsc to a) decompose the domain with dmda b) use ksp for linear solver. I * have not* used petsc4py rigorously , so before trying his venture I wnt to know whether it is feasible or not, and if there is any example for similar work (so that I can copy their approach, to be precise) Have a very good day. Somdeb On Wed, Oct 5, 2016 at 10:23 AM, Somdeb Bandopadhyay wrote: > Hi again Sir, > Thank you very much for the quick response. I am planning to > implement a mustiphase algorithm on collocated grid. I already qrote a C > code for 2d case, but it wasn't very generalized . So for the final > version, I intend to use python as a script to interact with PETSc kernels. > > Somdeb > > > On Wed, Oct 5, 2016 at 10:12 AM, Matthew Knepley > wrote: > >> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay >> wrote: >> >>> Dear all, >>> I want to write a solver for incompressible navier stokes >>> using python and I want to use PETsc (particularly dmda & ksp) for this. >>> May I know if this type of work is feasible/already done? >>> >> >> How do you plan to discretize your system? DMDA supports only collocation >> discretizations, so some sort of penalty for pressure would >> have to be employed. >> >> Thanks, >> >> Matt >> >> >>> I intend to run my solver in a cluster and so am slightly >>> concerned about the performance if I use python with petsc. >>> My deepest apologies if this mail of mine caused you any >>> inconvenience. >>> >>> Somdeb >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ztdepyahoo at 163.com Wed Oct 5 04:02:03 2016 From: ztdepyahoo at 163.com (=?GBK?B?tqHAz8qm?=) Date: Wed, 5 Oct 2016 17:02:03 +0800 (CST) Subject: [petsc-users] How to broadcast a double value to all the nodes in the cluster with Petsc Message-ID: <34ad7036.45b0.15794140728.Coremail.ztdepyahoo@163.com> Dear professor: How to broadcast a double value to all the nodes in the cluster with Petsc -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Wed Oct 5 04:10:53 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Wed, 5 Oct 2016 11:10:53 +0200 Subject: [petsc-users] How to broadcast a double value to all the nodes in the cluster with Petsc In-Reply-To: <34ad7036.45b0.15794140728.Coremail.ztdepyahoo@163.com> References: <34ad7036.45b0.15794140728.Coremail.ztdepyahoo@163.com> Message-ID: PETSc, by design, does not wrap any of the existing functionality of MPI, so this would be accomplished with an MPI function like MPI_Bcast(). On Wed, Oct 5, 2016 at 11:02 AM, ??? wrote: > Dear professor: > How to broadcast a double value to all the nodes in the cluster with > Petsc > > > > > > > > > > > > > > > > > > > From cpraveen at gmail.com Wed Oct 5 07:54:14 2016 From: cpraveen at gmail.com (Praveen C) Date: Wed, 5 Oct 2016 18:24:14 +0530 Subject: [petsc-users] Vector with ghost values using DMDA Message-ID: Dear all I am using DMDA and create a vector with DMCreateGlobalVector However this does not have ghost values. How should I create vector if I want to access ghost values ? Thanks praveen -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 5 09:15:42 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Oct 2016 08:15:42 -0600 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: References: Message-ID: <87mviihofl.fsf@jedbrown.org> Praveen C writes: > Dear all > > I am using DMDA and create a vector with > > DMCreateGlobalVector > > > However this does not have ghost values. How should I create vector if I > want to access ghost values ? That's what local vectors are for. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From cpraveen at gmail.com Wed Oct 5 09:20:05 2016 From: cpraveen at gmail.com (Praveen C) Date: Wed, 5 Oct 2016 19:50:05 +0530 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: <87mviihofl.fsf@jedbrown.org> References: <87mviihofl.fsf@jedbrown.org> Message-ID: So I have to create a global vector AND a local vector using DMCreateLocalVector. Then I do DMGlobalToLocalBegin/End. Does this not lead to too much copying ? I see there is VecCreateGhost but no such thing for DMDA ? Best praveen PS: Would be nice if the reply-to was set to mailing list. I frequently forget to do Reply All. On Wed, Oct 5, 2016 at 7:45 PM, Jed Brown wrote: > Praveen C writes: > > > Dear all > > > > I am using DMDA and create a vector with > > > > DMCreateGlobalVector > > > > > > However this does not have ghost values. How should I create vector if I > > want to access ghost values ? > > That's what local vectors are for. > -------------- next part -------------- An HTML attachment was scrubbed... 
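To make the MPI_Bcast() suggestion above concrete, here is a minimal sketch for ztdepyahoo's broadcast question, assuming the double is known only on rank 0 of PETSC_COMM_WORLD; the value 3.14 is a placeholder and error checking is omitted:

#include <petscsys.h>   /* pulls in mpi.h and the PETSc runtime */

int main(int argc, char **argv)
{
  double      value = 0.0;
  PetscMPIInt rank;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  if (!rank) value = 3.14;                                /* only rank 0 knows the value */
  MPI_Bcast(&value, 1, MPI_DOUBLE, 0, PETSC_COMM_WORLD);  /* after this, every rank has it */
  PetscPrintf(PETSC_COMM_SELF, "[%d] value = %g\n", rank, value);
  PetscFinalize();
  return 0;
}

Since PETSc is initialized on top of MPI, plain MPI calls like this can be mixed freely with PETSc calls on the same communicator.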
URL: From hzhang at mcs.anl.gov Wed Oct 5 09:22:30 2016 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 5 Oct 2016 09:22:30 -0500 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: References: <87mviihofl.fsf@jedbrown.org> Message-ID: Praveen : DMGetLocalVector(). See petsc/src/snes/examples/tutorials/ex19.c Hong > So I have to create a global vector AND a local vector using > DMCreateLocalVector. > > Then I do DMGlobalToLocalBegin/End. Does this not lead to too much copying > ? I see there is VecCreateGhost but no such thing for DMDA ? > > Best > praveen > > PS: Would be nice if the reply-to was set to mailing list. I frequently > forget to do Reply All. > > On Wed, Oct 5, 2016 at 7:45 PM, Jed Brown wrote: > >> Praveen C writes: >> >> > Dear all >> > >> > I am using DMDA and create a vector with >> > >> > DMCreateGlobalVector >> > >> > >> > However this does not have ghost values. How should I create vector if I >> > want to access ghost values ? >> >> That's what local vectors are for. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Oct 5 09:24:33 2016 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 5 Oct 2016 09:24:33 -0500 Subject: [petsc-users] How to broadcast a double value to all the nodes in the cluster with Petsc In-Reply-To: <34ad7036.45b0.15794140728.Coremail.ztdepyahoo@163.com> References: <34ad7036.45b0.15794140728.Coremail.ztdepyahoo@163.com> Message-ID: ??? : > How to broadcast a double value to all the nodes in the cluster with > Petsc > MPI_Bcast(). Hong > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 5 09:28:21 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Oct 2016 08:28:21 -0600 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: References: <87mviihofl.fsf@jedbrown.org> Message-ID: <87h98qhnui.fsf@jedbrown.org> Praveen C writes: > So I have to create a global vector AND a local vector using > DMCreateLocalVector. > > Then I do DMGlobalToLocalBegin/End. Does this not lead to too much copying > ? It's typically more efficient -- the solver gets to work with contiguous vectors and doesn't have unused "ghost" storage in every vector of a Krylov space, for example. > I see there is VecCreateGhost but no such thing for DMDA ? That would necessitate non-contiguous indexing which would kill performance. Also note that the ghost points in a VecGhost would exist but be unused in every vector of a Krylov space, every stage of a Runge-Kutta method, etc. Much better to have one or two local vectors that you use while evaluating residuals. > PS: Would be nice if the reply-to was set to mailing list. I frequently > forget to do Reply All. Then make reply-all your default instead of asking to break (individual) reply for everyone else. http://www.unicom.com/pw/reply-to-harmful.html http://woozle.org/~neale/papers/reply-to-still-harmful.html -------------- next part -------------- A non-text attachment was scrubbed... 
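To make the local-vector pattern Hong and Jed describe concrete, a small sketch for a 2d DMDA with one degree of freedom; the function name is invented and it assumes the DMDA 'da' and the global vector 'gvec' were created elsewhere.

#include <petscdmda.h>

/* Read values of a global DMDA vector, including ghost points, on each process. */
PetscErrorCode UseGhostValues(DM da,Vec gvec)
{
  Vec            lvec;
  PetscScalar  **a;
  PetscInt       i,j,xs,ys,xm,ym;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMGetLocalVector(da,&lvec);CHKERRQ(ierr);              /* borrowed from the DM's pool */
  ierr = DMGlobalToLocalBegin(da,gvec,INSERT_VALUES,lvec);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da,gvec,INSERT_VALUES,lvec);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da,lvec,&a);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da,&xs,&ys,NULL,&xm,&ym,NULL);CHKERRQ(ierr);
  for (j=ys; j<ys+ym; j++) {
    for (i=xs; i<xs+xm; i++) {
      /* stencil neighbours such as a[j][i+1] or a[j+1][i] can be read here even
         when they live on a neighbouring process: those are the ghost values
         just scattered in by DMGlobalToLocalBegin/End */
    }
  }
  ierr = DMDAVecRestoreArray(da,lvec,&a);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da,&lvec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}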
Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From knepley at gmail.com Wed Oct 5 09:32:40 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Oct 2016 09:32:40 -0500 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: References: Message-ID: On Wed, Oct 5, 2016 at 7:54 AM, Praveen C wrote: > Dear all > > I am using DMDA and create a vector with > > DMCreateGlobalVector > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMCreateLocalVector.html Matt > However this does not have ghost values. How should I create vector if I > want to access ghost values ? > > Thanks > praveen > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 5 09:39:21 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Oct 2016 09:39:21 -0500 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: On Tue, Oct 4, 2016 at 9:47 PM, Somdeb Bandopadhyay wrote: > Hi again, > Please allow me to explain in detail here:- > > 1. I am using Zang's (jcp 1994) method for incompressible flow on > generalized collocated grid. > 2. The main difference lies on the calculation of the grid matrix, for > which I am using Gaitonde et al (2002)'s work > 3. I want to use python to set up the domain , grid(structured) and > boundary/initial conditions. > 4. I want petsc to a) decompose the domain with dmda b) use ksp for > linear solver. > > > I * have not* used petsc4py rigorously , so before trying his venture I > wnt to know whether it is feasible or not, and if there is any example for > similar work (so that I can copy their approach, to be precise) > It sounds like you want to use DMDA in the same way we suggest in the tutorials. In particular, Lisandro has a Poisson tutorial that does everything you want I believe (except multiple fields which is straightforward). Thanks, Matt > Have a very good day. > > Somdeb > > On Wed, Oct 5, 2016 at 10:23 AM, Somdeb Bandopadhyay > wrote: > >> Hi again Sir, >> Thank you very much for the quick response. I am planning to >> implement a mustiphase algorithm on collocated grid. I already qrote a C >> code for 2d case, but it wasn't very generalized . So for the final >> version, I intend to use python as a script to interact with PETSc kernels. >> >> Somdeb >> >> >> On Wed, Oct 5, 2016 at 10:12 AM, Matthew Knepley >> wrote: >> >>> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay >>> wrote: >>> >>>> Dear all, >>>> I want to write a solver for incompressible navier stokes >>>> using python and I want to use PETsc (particularly dmda & ksp) for this. >>>> May I know if this type of work is feasible/already done? >>>> >>> >>> How do you plan to discretize your system? DMDA supports only >>> collocation discretizations, so some sort of penalty for pressure would >>> have to be employed. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I intend to run my solver in a cluster and so am slightly >>>> concerned about the performance if I use python with petsc. >>>> My deepest apologies if this mail of mine caused you any >>>> inconvenience. >>>> >>>> Somdeb >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.tadeu at gmail.com Wed Oct 5 11:19:24 2016 From: e.tadeu at gmail.com (E. Tadeu) Date: Wed, 5 Oct 2016 13:19:24 -0300 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: Matt, Do you know if there is any example of solving Navier Stokes using a staggered approach by using a different DM object such as DMPlex? Thanks, Edson On Tue, Oct 4, 2016 at 11:12 PM, Matthew Knepley wrote: > On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay > wrote: > >> Dear all, >> I want to write a solver for incompressible navier stokes >> using python and I want to use PETsc (particularly dmda & ksp) for this. >> May I know if this type of work is feasible/already done? >> > > How do you plan to discretize your system? DMDA supports only collocation > discretizations, so some sort of penalty for pressure would > have to be employed. > > Thanks, > > Matt > > >> I intend to run my solver in a cluster and so am slightly >> concerned about the performance if I use python with petsc. >> My deepest apologies if this mail of mine caused you any >> inconvenience. >> >> Somdeb >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cpraveen at gmail.com Wed Oct 5 11:27:43 2016 From: cpraveen at gmail.com (Praveen C) Date: Wed, 5 Oct 2016 21:57:43 +0530 Subject: [petsc-users] Vector with ghost values using DMDA In-Reply-To: References: Message-ID: Thanks to all. Your answers were very helpful. Best praveen -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 5 12:49:46 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Oct 2016 12:49:46 -0500 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: On Wed, Oct 5, 2016 at 11:19 AM, E. Tadeu wrote: > Matt, > > Do you know if there is any example of solving Navier Stokes using a > staggered approach by using a different DM object such as DMPlex? > SNES ex62 can do P2/P1 Stokes, which is similar. Is that what you want to see? For real structured grid, staggered mesh stuff like MAC, I would just do this on a single DMDA, but think of it as being staggered, and expand my stencil as necessary. Thanks, Matt > > Thanks, > Edson > > > On Tue, Oct 4, 2016 at 11:12 PM, Matthew Knepley > wrote: > >> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay >> wrote: >> >>> Dear all, >>> I want to write a solver for incompressible navier stokes >>> using python and I want to use PETsc (particularly dmda & ksp) for this. >>> May I know if this type of work is feasible/already done? >>> >> >> How do you plan to discretize your system? DMDA supports only collocation >> discretizations, so some sort of penalty for pressure would >> have to be employed. >> >> Thanks, >> >> Matt >> >> >>> I intend to run my solver in a cluster and so am slightly >>> concerned about the performance if I use python with petsc. >>> My deepest apologies if this mail of mine caused you any >>> inconvenience. 
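For what it is worth, a hedged sketch of the "single DMDA interpreted as staggered" idea described above. The field layout (u on east faces, v on north faces, p at cell centres) is one conventional MAC choice made up for this illustration; PETSc itself attaches no meaning to the components, only the residual and Jacobian routines do.

#include <petscdmda.h>

int main(int argc,char **argv)
{
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* One DMDA with 3 unknowns per cell; stencil width 2 leaves room for the
     wider stencils a staggered interpretation can require. */
  ierr = DMDACreate2d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_BOX,
                      64,64,PETSC_DECIDE,PETSC_DECIDE,3,2,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = DMDASetFieldName(da,0,"x-velocity (east face)");CHKERRQ(ierr);
  ierr = DMDASetFieldName(da,1,"y-velocity (north face)");CHKERRQ(ierr);
  ierr = DMDASetFieldName(da,2,"pressure (cell centre)");CHKERRQ(ierr);
  /* ... then hand da to SNES or KSP as usual ... */
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}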
>>> >>> Somdeb >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Wed Oct 5 13:03:48 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 5 Oct 2016 19:03:48 +0100 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: On 5 October 2016 at 18:49, Matthew Knepley wrote: > On Wed, Oct 5, 2016 at 11:19 AM, E. Tadeu wrote: > >> Matt, >> >> Do you know if there is any example of solving Navier Stokes using a >> staggered approach by using a different DM object such as DMPlex? >> > > SNES ex62 can do P2/P1 Stokes, which is similar. Is that what you want to > see? > > For real structured grid, staggered mesh stuff like MAC, I would just do > this on a single DMDA, but think of it as being staggered, and expand my > stencil as necessary. > Following that up, for a DMDA example using a staggered grid, take a look at snes/ex30.c http://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex30.c.html Thanks, Dave > > Thanks, > > Matt > > >> >> Thanks, >> Edson >> >> >> On Tue, Oct 4, 2016 at 11:12 PM, Matthew Knepley >> wrote: >> >>> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay >>> wrote: >>> >>>> Dear all, >>>> I want to write a solver for incompressible navier stokes >>>> using python and I want to use PETsc (particularly dmda & ksp) for this. >>>> May I know if this type of work is feasible/already done? >>>> >>> >>> How do you plan to discretize your system? DMDA supports only >>> collocation discretizations, so some sort of penalty for pressure would >>> have to be employed. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I intend to run my solver in a cluster and so am slightly >>>> concerned about the performance if I use python with petsc. >>>> My deepest apologies if this mail of mine caused you any >>>> inconvenience. >>>> >>>> Somdeb >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From overholt at capesim.com Wed Oct 5 14:30:25 2016 From: overholt at capesim.com (Matthew Overholt) Date: Wed, 5 Oct 2016 15:30:25 -0400 Subject: [petsc-users] large PetscCommDuplicate overhead Message-ID: <004201d21f3e$ed31c120$c7954360$@capesim.com> Hi Petsc-Users, I am trying to understand an issue where PetscCommDuplicate() calls are taking an increasing percentage of time as I run a fixed-sized problem on more processes. I am using the FEM to solve the steady-state heat transfer equation (K.x = q) using a PC direct solver, like MUMPS. I am running on the NERSC Cray X30, which has two Xeon's per node with 12 cores each, and profiling the code using CrayPat sampling. 
On a typical problem (1E+6 finite elements), running on a single node: -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on process 1, but on the root it is less), and (for reference) 9% of total time is for MUMPS. -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on every process except the root, where it is <1%), and 9-10% of total time is for MUMPS. What is the large PetscCommDuplicate time connected to, an increasing number of messages (tags)? Would using fewer MatSetValues() and VecSetValues() calls (with longer message lengths) alleviate this? For reference, the PETSc calling sequence in the code is as follows. // Create the solution and RHS vectors ierr = VecCreate(petscData->mpicomm,&mesh->hpx); ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of equations; distribution to match mesh ierr = VecSetFromOptions(mesh->hpx); // allow run time options ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector // Create the stiffnexx matrix ierr = MatCreate(petscData->mpicomm,&K); ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); ierr = MatSetType(K,MATAIJ); // default sparse type // Do preallocation ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); ierr = MatSetUp(K); // Create and set up the KSP context as a PreConditioner Only (Direct) Solution ierr = KSPCreate(petscData->mpicomm,&ksp); ierr = KSPSetOperators(ksp,K,K); ierr = KSPSetType(ksp,KSPPREONLY); // Set the temperature vector ierr = VecSet(mesh->hpx,mesh->Tmin); // Set the default PC method as MUMPS ierr = KSPGetPC(ksp,&pc); // extract the preconditioner ierr = PCSetType(pc,PCLU); // set pc options ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); ierr = KSPSetFromOptions(ksp); // Set the values for the K matrix and q vector // which involves a lot of these calls ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // 1 call per matrix row (equation) ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per element ierr = VecAssemblyBegin(q); ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); ierr = VecAssemblyEnd(q); ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); // Solve ////////////////////////////////////// ierr = KSPSolve(ksp,q,mesh->hpx); ... *Note that the code evenly divides the finite elements over the total number of processors, and I am using ghosting of the FE vertices vector to handle the vertices that are needed on more than 1 process. Thanks in advance for your help, Matt Overholt CapeSym, Inc. --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 5 14:44:16 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Oct 2016 14:44:16 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <004201d21f3e$ed31c120$c7954360$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> Message-ID: On Wed, Oct 5, 2016 at 2:30 PM, Matthew Overholt wrote: > Hi Petsc-Users, > > > > I am trying to understand an issue where PetscCommDuplicate() calls are > taking an increasing percentage of time as I run a fixed-sized problem on > more processes. 
> > > > I am using the FEM to solve the steady-state heat transfer equation (K.x = > q) using a PC direct solver, like MUMPS. > > > > I am running on the NERSC Cray X30, which has two Xeon's per node with 12 > cores each, and profiling the code using CrayPat sampling. > > > > On a typical problem (1E+6 finite elements), running on a single node: > > -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on > process 1, but on the root it is less), and (for reference) 9% of total > time is for MUMPS. > > -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on > every process except the root, where it is <1%), and 9-10% of total time is > for MUMPS. > > > > What is the large PetscCommDuplicate time connected to, an increasing > number of messages (tags)? Would using fewer MatSetValues() and > VecSetValues() calls (with longer message lengths) alleviate this? > 1) I am skeptical of the result. Can you do a run with direct measurement rather than sampling? 2) Can we see the output of -log_view ? 3) Are you sure you configured with --with-debugging=0 ? 4) If this result is true, it could be coming from bad behavior with PetscSpinlock on this machine. We need to see configure.log Thanks, Matt > For reference, the PETSc calling sequence in the code is as follows. > > // Create the solution and RHS vectors > > ierr = VecCreate(petscData->mpicomm,&mesh->hpx); > > ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); > > ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of > equations; distribution to match mesh > > ierr = VecSetFromOptions(mesh->hpx); // allow run time options > > ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector > > // Create the stiffnexx matrix > > ierr = MatCreate(petscData->mpicomm,&K); > > ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); > > ierr = MatSetType(K,MATAIJ); // default sparse type > > // Do preallocation > > ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); > > ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); > > ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > > ierr = MatSetUp(K); > > // Create and set up the KSP context as a PreConditioner Only (Direct) > Solution > > ierr = KSPCreate(petscData->mpicomm,&ksp); > > ierr = KSPSetOperators(ksp,K,K); > > ierr = KSPSetType(ksp,KSPPREONLY); > > // Set the temperature vector > > ierr = VecSet(mesh->hpx,mesh->Tmin); > > // Set the default PC method as MUMPS > > ierr = KSPGetPC(ksp,&pc); // extract the preconditioner > > ierr = PCSetType(pc,PCLU); // set pc options > > ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); > > ierr = KSPSetFromOptions(ksp); > > > > // Set the values for the K matrix and q vector > > // which involves a lot of these calls > > ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); > // 1 call per matrix row (equation) > > ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per > element > > ierr = VecAssemblyBegin(q); > > ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); > > ierr = VecAssemblyEnd(q); > > ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); > > > > // Solve ////////////////////////////////////// > > ierr = KSPSolve(ksp,q,mesh->hpx); > > ... > > *Note that the code evenly divides the finite elements over the total > number of processors, and I am using ghosting of the FE vertices vector to > handle the vertices that are needed on more than 1 process. > > > > Thanks in advance for your help, > > Matt Overholt > > CapeSym, Inc. > > > Virus-free. 
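On the direct-measurement and -log_view points above, a hedged sketch of how the calling sequence in the original mail could be bracketed into logging stages, so that PETSc's own timings separate assembly from the solve; the stage names are invented for the example.

PetscLogStage  stageAssembly,stageSolve;
PetscErrorCode ierr;

ierr = PetscLogStageRegister("Assembly",&stageAssembly);CHKERRQ(ierr);
ierr = PetscLogStageRegister("Solve",&stageSolve);CHKERRQ(ierr);

ierr = PetscLogStagePush(stageAssembly);CHKERRQ(ierr);
/* ... the MatSetValues()/VecSetValues() loops and the assembly Begin/End calls ... */
ierr = PetscLogStagePop();CHKERRQ(ierr);

ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
/* ... KSPSolve(ksp,q,mesh->hpx) ... */
ierr = PetscLogStagePop();CHKERRQ(ierr);

Running with -log_view then reports time, flops and message counts per event and per stage, without relying on sampling.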
> www.avast.com > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 5 15:41:53 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 5 Oct 2016 15:41:53 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <004201d21f3e$ed31c120$c7954360$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> Message-ID: <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> > On Oct 5, 2016, at 2:30 PM, Matthew Overholt wrote: > > Hi Petsc-Users, > > I am trying to understand an issue where PetscCommDuplicate() calls are taking an increasing percentage of time as I run a fixed-sized problem on more processes. > > I am using the FEM to solve the steady-state heat transfer equation (K.x = q) using a PC direct solver, like MUMPS. > > I am running on the NERSC Cray X30, which has two Xeon's per node with 12 cores each, and profiling the code using CrayPat sampling. > > On a typical problem (1E+6 finite elements), running on a single node: > -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on process 1, but on the root it is less), and (for reference) 9% of total time is for MUMPS. > -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on every process except the root, where it is <1%), and 9-10% of total time is for MUMPS. What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, you are just giving its time for comparison? > > What is the large PetscCommDuplicate time connected to, an increasing number of messages (tags)? Would using fewer MatSetValues() and VecSetValues() calls (with longer message lengths) alleviate this? No PetscCommDuplicate won't increate with more messages or calls to XXXSetValues(). PetscCommDuplicate() is only called essentially on the creation of new PETSc objects. It should also be fast since it basically needs to do just a MPI_Attr_get(). With more processes but the same problem size and code there should be pretty much the same number of objects created. PetscSpinlockLock() does nothing if you are not using threads so it won't take any time. Is there a way to see where it is spending its time inside the PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. Barry > > For reference, the PETSc calling sequence in the code is as follows. 
> // Create the solution and RHS vectors > ierr = VecCreate(petscData->mpicomm,&mesh->hpx); > ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); > ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of equations; distribution to match mesh > ierr = VecSetFromOptions(mesh->hpx); // allow run time options > ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector > // Create the stiffnexx matrix > ierr = MatCreate(petscData->mpicomm,&K); > ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); > ierr = MatSetType(K,MATAIJ); // default sparse type > // Do preallocation > ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); > ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); > ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > ierr = MatSetUp(K); > // Create and set up the KSP context as a PreConditioner Only (Direct) Solution > ierr = KSPCreate(petscData->mpicomm,&ksp); > ierr = KSPSetOperators(ksp,K,K); > ierr = KSPSetType(ksp,KSPPREONLY); > // Set the temperature vector > ierr = VecSet(mesh->hpx,mesh->Tmin); > // Set the default PC method as MUMPS > ierr = KSPGetPC(ksp,&pc); // extract the preconditioner > ierr = PCSetType(pc,PCLU); // set pc options > ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); > ierr = KSPSetFromOptions(ksp); > > // Set the values for the K matrix and q vector > // which involves a lot of these calls > ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // 1 call per matrix row (equation) > ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per element > ierr = VecAssemblyBegin(q); > ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); > ierr = VecAssemblyEnd(q); > ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); > > // Solve ////////////////////////////////////// > ierr = KSPSolve(ksp,q,mesh->hpx); > ... > *Note that the code evenly divides the finite elements over the total number of processors, and I am using ghosting of the FE vertices vector to handle the vertices that are needed on more than 1 process. > > Thanks in advance for your help, > Matt Overholt > CapeSym, Inc. > > Virus-free. www.avast.com From hengjiew at uci.edu Wed Oct 5 19:57:18 2016 From: hengjiew at uci.edu (frank) Date: Wed, 5 Oct 2016 17:57:18 -0700 Subject: [petsc-users] create global vector in latest version of petsc Message-ID: Hi, I update petsc to the latest version by pulling from the repo. Then I find one of my old code, which worked before, output errors now. After debugging, I find that the error is caused by "DMCreateGlobalVector". I attach a short program which can re-produce the error. This program works well with an older version of petsc. I also attach the script I used to configure petsc. The error message is below. Did I miss something in the installation ? Thank you. 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- 2 [0]PETSC ERROR: Null argument, when expecting valid pointer 3 [0]PETSC ERROR: Null Object: Parameter # 2 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --dow nload-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c Regards, Frank -------------- next part -------------- A non-text attachment was scrubbed... Name: test_vec.f90 Type: text/x-fortran Size: 815 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gnu-dbg-32idx.py Type: text/x-python Size: 914 bytes Desc: not available URL: From knepley at gmail.com Wed Oct 5 20:08:42 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Oct 2016 20:08:42 -0500 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: References: Message-ID: On Wed, Oct 5, 2016 at 7:57 PM, frank wrote: > Hi, > > I update petsc to the latest version by pulling from the repo. Then I find > one of my old code, which worked before, output errors now. > After debugging, I find that the error is caused by "DMCreateGlobalVector". > I attach a short program which can re-produce the error. This program > works well with an older version of petsc. > I also attach the script I used to configure petsc. > First, did you reconfigure after pulling? If not, please do this, rebuild, and try again. Thanks, Matt > The error message is below. Did I miss something in the installation ? > Thank you. > > 1 [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > 2 [0]PETSC ERROR: Null argument, when expecting valid pointer > 3 [0]PETSC ERROR: Null Object: Parameter # 2 > 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d > ocumentation/faq.html for trouble shooting. > 5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 > GIT Date: 2016-10-05 10:56:19 -0500 > 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx > named kolmog1 by frank Wed Oct 5 17:40:07 2016 > 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " > --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 > --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " > --dow nload-parmetis="1 " --download-superlu_dist="1 " > --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx > 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in > /home/frank/petsc/src/vec/vec/interface/vector.c > 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in > /home/frank/petsc/src/dm/impls/da/dadist.c > 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in > /home/frank/petsc/src/dm/interface/dm.c > > > Regards, > Frank > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hengjiew at uci.edu Wed Oct 5 20:23:22 2016 From: hengjiew at uci.edu (Hengjie Wang) Date: Wed, 5 Oct 2016 18:23:22 -0700 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: References: Message-ID: Hi, I did. I am using GNU compiler 5.4.0. I don't know if this matters. Thank you Frank On 10/5/2016 6:08 PM, Matthew Knepley wrote: > On Wed, Oct 5, 2016 at 7:57 PM, frank > wrote: > > Hi, > > I update petsc to the latest version by pulling from the repo. > Then I find one of my old code, which worked before, output errors > now. > After debugging, I find that the error is caused by > "DMCreateGlobalVector". > I attach a short program which can re-produce the error. This > program works well with an older version of petsc. > I also attach the script I used to configure petsc. > > > First, did you reconfigure after pulling? If not, please do this, > rebuild, and try again. > > Thanks, > > Matt > > The error message is below. Did I miss something in the > installation ? Thank you. > > 1 [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > 2 [0]PETSC ERROR: Null argument, when expecting valid pointer > 3 [0]PETSC ERROR: Null Object: Parameter # 2 > 4 [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting. > 5 [0]PETSC ERROR: Petsc Development GIT revision: > v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 > 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a > gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 > 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " > --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 > --with-mpi-compilers="1 " --download-blacs="1 " > --download-metis="1 " --dow nload-parmetis="1 " > --download-superlu_dist="1 " --download-hypre=1 > PETSC_ARCH=gnu-dbg-32idx > 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in > /home/frank/petsc/src/vec/vec/interface/vector.c > 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in > /home/frank/petsc/src/dm/impls/da/dadist.c > 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in > /home/frank/petsc/src/dm/interface/dm.c > > > Regards, > Frank > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 5 20:57:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 5 Oct 2016 20:57:39 -0500 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: References: Message-ID: <1FFE2C62-EA4B-4D6D-A538-4D32ACF35CA7@mcs.anl.gov> PETSc fortran programs should always end with .F90 not .f90 can you try again with that name? The capital F is important. Barry > On Oct 5, 2016, at 7:57 PM, frank wrote: > > Hi, > > I update petsc to the latest version by pulling from the repo. Then I find one of my old code, which worked before, output errors now. > After debugging, I find that the error is caused by "DMCreateGlobalVector". > I attach a short program which can re-produce the error. This program works well with an older version of petsc. > I also attach the script I used to configure petsc. > > The error message is below. Did I miss something in the installation ? Thank you. 
> > 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > 2 [0]PETSC ERROR: Null argument, when expecting valid pointer > 3 [0]PETSC ERROR: Null Object: Parameter # 2 > 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > 5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 > 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 > 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --dow nload-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx > 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c > 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c > 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c > > > Regards, > Frank > > > > From hengjiew at uci.edu Wed Oct 5 21:11:27 2016 From: hengjiew at uci.edu (Hengjie Wang) Date: Wed, 5 Oct 2016 19:11:27 -0700 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: <1FFE2C62-EA4B-4D6D-A538-4D32ACF35CA7@mcs.anl.gov> References: <1FFE2C62-EA4B-4D6D-A538-4D32ACF35CA7@mcs.anl.gov> Message-ID: Hi, I just tried .F90. It had the error. I attached the full error log. Thank you. Frank On 10/5/2016 6:57 PM, Barry Smith wrote: > PETSc fortran programs should always end with .F90 not .f90 can you try again with that name? The capital F is important. > > Barry > >> On Oct 5, 2016, at 7:57 PM, frank wrote: >> >> Hi, >> >> I update petsc to the latest version by pulling from the repo. Then I find one of my old code, which worked before, output errors now. >> After debugging, I find that the error is caused by "DMCreateGlobalVector". >> I attach a short program which can re-produce the error. This program works well with an older version of petsc. >> I also attach the script I used to configure petsc. >> >> The error message is below. Did I miss something in the installation ? Thank you. >> >> 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> 2 [0]PETSC ERROR: Null argument, when expecting valid pointer >> 3 [0]PETSC ERROR: Null Object: Parameter # 2 >> 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> 5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 >> 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 >> 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --dow nload-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx >> 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c >> 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c >> 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c >> >> >> Regards, >> Frank >> >> >> >> -------------- next part -------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Object: Parameter # 2 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Null argument, when expecting valid pointer [2]PETSC ERROR: Null Object: Parameter # 2 [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [2]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [2]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [2]PETSC ERROR: [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [2]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Null argument, when expecting valid pointer [3]PETSC ERROR: Null Object: Parameter # 2 [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[3]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [3]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [3]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [3]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [3]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [3]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c [4]PETSC ERROR: [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [6]PETSC ERROR: Null argument, when expecting valid pointer [6]PETSC ERROR: Null Object: Parameter # 2 [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. --------------------- Error Message -------------------------------------------------------------- [4]PETSC ERROR: Null argument, when expecting valid pointer [4]PETSC ERROR: Null Object: Parameter # 2 [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [4]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [4]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [4]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [4]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [4]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [4]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c [6]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [6]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [6]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [6]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [6]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [6]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c [5]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Null argument, when expecting valid pointer [1]PETSC ERROR: Null Object: Parameter # 2 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [1]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [1]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [1]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [1]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [1]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c --------------------- Error Message -------------------------------------------------------------- [5]PETSC ERROR: Null argument, when expecting valid pointer [5]PETSC ERROR: Null Object: Parameter # 2 [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [5]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [5]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [5]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [5]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [5]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [5]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [7]PETSC ERROR: Null argument, when expecting valid pointer [7]PETSC ERROR: Null Object: Parameter # 2 [7]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [7]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 [7]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 18:58:44 2016 [7]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --download-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx [7]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c [7]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c [7]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c From mirzadeh at gmail.com Wed Oct 5 21:21:45 2016 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Wed, 5 Oct 2016 22:21:45 -0400 Subject: [petsc-users] issue with NullSpaceRemove in parallel Message-ID: Hi folks, I am trying to track down a bug that is sometimes triggered when solving a singular system (poisson+neumann). It only seems to happen in parallel and halfway through the run. 
I can provide detailed information about the actual problem, but the error message I get boils down to this: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Scalar value must be same on all processes, argument # 2 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 [0]PETSC ERROR: ./two_fluid_2d on a linux named bazantserver1 by mohammad Wed Oct 5 21:14:47 2016 [0]PETSC ERROR: Configure options PETSC_ARCH=linux --prefix=/usr/local --with-clanguage=cxx --with-c-support --with-shared-libraries --download-hypre --download-metis --download-parmetis --download-ml --download-superlu_dist COPTFLAGS=" -O3 -march=native" CXXOPTFLAGS=" -O3 -march=native" FOPTFLAGS=" -O3 -march=native" [0]PETSC ERROR: #1 VecShift() line 1480 in /tmp/petsc-3.6.3/src/vec/vec/utils/vinv.c [0]PETSC ERROR: #2 MatNullSpaceRemove() line 348 in /tmp/petsc-3.6.3/src/mat/interface/matnull.c [0]PETSC ERROR: #3 KSP_RemoveNullSpace() line 207 in /tmp/petsc-3.6.3/include/petsc/private/kspimpl.h [0]PETSC ERROR: #4 KSP_PCApply() line 243 in /tmp/petsc-3.6.3/include/petsc/private/kspimpl.h [0]PETSC ERROR: #5 KSPInitialResidual() line 63 in /tmp/petsc-3.6.3/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #6 KSPSolve_BCGS() line 50 in /tmp/petsc-3.6.3/src/ksp/ksp/impls/bcgs/bcgs.c [0]PETSC ERROR: #7 KSPSolve() line 604 in /tmp/petsc-3.6.3/src/ksp/ksp/interface/itfunc.c I understand this is somewhat vague question, but any idea what could cause this sort of problem? This was on 2 processors. The same code runs fine on a single processor. Also the solution seems to converge fine on previous iterations, e.g. this is the convergence info from the last iteration before the code breaks: 0 KSP preconditioned resid norm 6.814085878146e+01 true resid norm 2.885308600701e+00 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.067319980814e-01 true resid norm 8.480307326867e-02 ||r(i)||/||b|| 2.939133555699e-02 2 KSP preconditioned resid norm 1.526405979843e-03 true resid norm 1.125228519827e-03 ||r(i)||/||b|| 3.899855008762e-04 3 KSP preconditioned resid norm 2.199423175998e-05 true resid norm 4.232832916628e-05 ||r(i)||/||b|| 1.467029528695e-05 4 KSP preconditioned resid norm 5.382291463582e-07 true resid norm 8.438732856334e-07 ||r(i)||/||b|| 2.924724535283e-07 5 KSP preconditioned resid norm 9.495525177398e-09 true resid norm 1.408250768598e-08 ||r(i)||/||b|| 4.880763077669e-09 6 KSP preconditioned resid norm 9.249233376169e-11 true resid norm 2.795840275267e-10 ||r(i)||/||b|| 9.689917655907e-11 7 KSP preconditioned resid norm 1.138293762641e-12 true resid norm 2.559058680281e-12 ||r(i)||/||b|| 8.869272006674e-13 Also, if it matters, this is using hypre as PC and bcgs as KSP. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 5 22:18:15 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 5 Oct 2016 22:18:15 -0500 Subject: [petsc-users] issue with NullSpaceRemove in parallel In-Reply-To: References: Message-ID: <5BD3E1A6-0F72-431E-A11C-5D9B762DC194@mcs.anl.gov> The message "Scalar value must be same on all processes, argument # 2" comes up often when a Nan or Inf as gotten into the computation. 
The IEEE standard for floating point operations defines that Nan != Nan; I recommend running again with -fp_trap this should cause the code to stop with an error message as soon as the Nan or Inf is generated. Barry > On Oct 5, 2016, at 9:21 PM, Mohammad Mirzadeh wrote: > > Hi folks, > > I am trying to track down a bug that is sometimes triggered when solving a singular system (poisson+neumann). It only seems to happen in parallel and halfway through the run. I can provide detailed information about the actual problem, but the error message I get boils down to this: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 2 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [0]PETSC ERROR: ./two_fluid_2d on a linux named bazantserver1 by mohammad Wed Oct 5 21:14:47 2016 > [0]PETSC ERROR: Configure options PETSC_ARCH=linux --prefix=/usr/local --with-clanguage=cxx --with-c-support --with-shared-libraries --download-hypre --download-metis --download-parmetis --download-ml --download-superlu_dist COPTFLAGS=" -O3 -march=native" CXXOPTFLAGS=" -O3 -march=native" FOPTFLAGS=" -O3 -march=native" > [0]PETSC ERROR: #1 VecShift() line 1480 in /tmp/petsc-3.6.3/src/vec/vec/utils/vinv.c > [0]PETSC ERROR: #2 MatNullSpaceRemove() line 348 in /tmp/petsc-3.6.3/src/mat/interface/matnull.c > [0]PETSC ERROR: #3 KSP_RemoveNullSpace() line 207 in /tmp/petsc-3.6.3/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #4 KSP_PCApply() line 243 in /tmp/petsc-3.6.3/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #5 KSPInitialResidual() line 63 in /tmp/petsc-3.6.3/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #6 KSPSolve_BCGS() line 50 in /tmp/petsc-3.6.3/src/ksp/ksp/impls/bcgs/bcgs.c > [0]PETSC ERROR: #7 KSPSolve() line 604 in /tmp/petsc-3.6.3/src/ksp/ksp/interface/itfunc.c > > I understand this is somewhat vague question, but any idea what could cause this sort of problem? This was on 2 processors. The same code runs fine on a single processor. Also the solution seems to converge fine on previous iterations, e.g. this is the convergence info from the last iteration before the code breaks: > > 0 KSP preconditioned resid norm 6.814085878146e+01 true resid norm 2.885308600701e+00 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 3.067319980814e-01 true resid norm 8.480307326867e-02 ||r(i)||/||b|| 2.939133555699e-02 > 2 KSP preconditioned resid norm 1.526405979843e-03 true resid norm 1.125228519827e-03 ||r(i)||/||b|| 3.899855008762e-04 > 3 KSP preconditioned resid norm 2.199423175998e-05 true resid norm 4.232832916628e-05 ||r(i)||/||b|| 1.467029528695e-05 > 4 KSP preconditioned resid norm 5.382291463582e-07 true resid norm 8.438732856334e-07 ||r(i)||/||b|| 2.924724535283e-07 > 5 KSP preconditioned resid norm 9.495525177398e-09 true resid norm 1.408250768598e-08 ||r(i)||/||b|| 4.880763077669e-09 > 6 KSP preconditioned resid norm 9.249233376169e-11 true resid norm 2.795840275267e-10 ||r(i)||/||b|| 9.689917655907e-11 > 7 KSP preconditioned resid norm 1.138293762641e-12 true resid norm 2.559058680281e-12 ||r(i)||/||b|| 8.869272006674e-13 > > Also, if it matters, this is using hypre as PC and bcgs as KSP. 
> > Thanks From bsmith at mcs.anl.gov Wed Oct 5 22:50:59 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 5 Oct 2016 22:50:59 -0500 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: References: <1FFE2C62-EA4B-4D6D-A538-4D32ACF35CA7@mcs.anl.gov> Message-ID: <39ED9CA4-2490-4462-8F44-7787D1AEE048@mcs.anl.gov> Sorry, as indicated in http://www.mcs.anl.gov/petsc/documentation/changes/dev.html in order to get the previous behavior of DMDACreate3d() you need to follow it with the two lines DMSetFromOptions(da); DMSetUp(da); Barry > On Oct 5, 2016, at 9:11 PM, Hengjie Wang wrote: > > Hi, > > I just tried .F90. It had the error. I attached the full error log. > > Thank you. > > Frank > > > On 10/5/2016 6:57 PM, Barry Smith wrote: >> PETSc fortran programs should always end with .F90 not .f90 can you try again with that name? The capital F is important. >> >> Barry >> >>> On Oct 5, 2016, at 7:57 PM, frank wrote: >>> >>> Hi, >>> >>> I update petsc to the latest version by pulling from the repo. Then I find one of my old code, which worked before, output errors now. >>> After debugging, I find that the error is caused by "DMCreateGlobalVector". >>> I attach a short program which can re-produce the error. This program works well with an older version of petsc. >>> I also attach the script I used to configure petsc. >>> >>> The error message is below. Did I miss something in the installation ? Thank you. >>> >>> 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> 2 [0]PETSC ERROR: Null argument, when expecting valid pointer >>> 3 [0]PETSC ERROR: Null Object: Parameter # 2 >>> 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> 5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 >>> 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 >>> 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --dow nload-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx >>> 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c >>> 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c >>> 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c >>> >>> >>> Regards, >>> Frank >>> >>> >>> >>> > > From hengjiew at uci.edu Wed Oct 5 23:08:26 2016 From: hengjiew at uci.edu (Hengjie Wang) Date: Wed, 5 Oct 2016 21:08:26 -0700 Subject: [petsc-users] create global vector in latest version of petsc In-Reply-To: <39ED9CA4-2490-4462-8F44-7787D1AEE048@mcs.anl.gov> References: <1FFE2C62-EA4B-4D6D-A538-4D32ACF35CA7@mcs.anl.gov> <39ED9CA4-2490-4462-8F44-7787D1AEE048@mcs.anl.gov> Message-ID: <05945afe-9e6e-7cbe-e3c6-5781082cdc04@uci.edu> Hi, There is no error now. Thank you so much. Frank On 10/5/2016 8:50 PM, Barry Smith wrote: > Sorry, as indicated in http://www.mcs.anl.gov/petsc/documentation/changes/dev.html in order to get the previous behavior of > DMDACreate3d() you need to follow it with the two lines > > DMSetFromOptions(da); > DMSetUp(da); > > Barry > > > >> On Oct 5, 2016, at 9:11 PM, Hengjie Wang wrote: >> >> Hi, >> >> I just tried .F90. 
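Spelled out as a hedged C sketch (the Fortran calls are the same apart from the trailing error argument), the ordering described above for the development version; the grid dimensions are placeholders.

DM             da;
Vec            g;
PetscErrorCode ierr;

ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                    DMDA_STENCIL_STAR,32,32,32,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                    1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
ierr = DMSetFromOptions(da);CHKERRQ(ierr);   /* now required after DMDACreate3d() */
ierr = DMSetUp(da);CHKERRQ(ierr);            /* must come before DMCreateGlobalVector() */
ierr = DMCreateGlobalVector(da,&g);CHKERRQ(ierr);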
It had the error. I attached the full error log. >> >> Thank you. >> >> Frank >> >> >> On 10/5/2016 6:57 PM, Barry Smith wrote: >>> PETSc fortran programs should always end with .F90 not .f90 can you try again with that name? The capital F is important. >>> >>> Barry >>> >>>> On Oct 5, 2016, at 7:57 PM, frank wrote: >>>> >>>> Hi, >>>> >>>> I update petsc to the latest version by pulling from the repo. Then I find one of my old code, which worked before, output errors now. >>>> After debugging, I find that the error is caused by "DMCreateGlobalVector". >>>> I attach a short program which can re-produce the error. This program works well with an older version of petsc. >>>> I also attach the script I used to configure petsc. >>>> >>>> The error message is below. Did I miss something in the installation ? Thank you. >>>> >>>> 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> 2 [0]PETSC ERROR: Null argument, when expecting valid pointer >>>> 3 [0]PETSC ERROR: Null Object: Parameter # 2 >>>> 4 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>> 5 [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1571-g7fc5cb5 GIT Date: 2016-10-05 10:56:19 -0500 >>>> 6 [0]PETSC ERROR: [2]PETSC ERROR: ./test_ksp.exe on a gnu-dbg-32idx named kolmog1 by frank Wed Oct 5 17:40:07 2016 >>>> 7 [0]PETSC ERROR: Configure options --known-mpi-shared="0 " --known-memcmp-ok --with-debugging="1 " --with-shared-libraries=0 --with-mpi-compilers="1 " --download-blacs="1 " --download-metis="1 " --dow nload-parmetis="1 " --download-superlu_dist="1 " --download-hypre=1 PETSC_ARCH=gnu-dbg-32idx >>>> 8 [0]PETSC ERROR: #1 VecSetLocalToGlobalMapping() line 83 in /home/frank/petsc/src/vec/vec/interface/vector.c >>>> 9 [0]PETSC ERROR: #2 DMCreateGlobalVector_DA() line 45 in /home/frank/petsc/src/dm/impls/da/dadist.c >>>> 10 [0]PETSC ERROR: #3 DMCreateGlobalVector() line 880 in /home/frank/petsc/src/dm/interface/dm.c >>>> >>>> >>>> Regards, >>>> Frank >>>> >>>> >>>> >>>> >> From sb020287 at gmail.com Thu Oct 6 01:09:31 2016 From: sb020287 at gmail.com (Somdeb Bandopadhyay) Date: Thu, 6 Oct 2016 14:09:31 +0800 Subject: [petsc-users] using DMDA with python In-Reply-To: References: Message-ID: Thanks alot for all of your suggestions. I think I have a better insight about the direction now. On Thu, Oct 6, 2016 at 2:03 AM, Dave May wrote: > > > On 5 October 2016 at 18:49, Matthew Knepley wrote: > >> On Wed, Oct 5, 2016 at 11:19 AM, E. Tadeu wrote: >> >>> Matt, >>> >>> Do you know if there is any example of solving Navier Stokes using a >>> staggered approach by using a different DM object such as DMPlex? >>> >> >> SNES ex62 can do P2/P1 Stokes, which is similar. Is that what you want to >> see? >> >> For real structured grid, staggered mesh stuff like MAC, I would just do >> this on a single DMDA, but think of it as being staggered, and expand my >> stencil as necessary. 
>> > > Following that up, for a DMDA example using a staggered grid, take a look > at snes/ex30.c > > http://www.mcs.anl.gov/petsc/petsc-current/src/snes/ > examples/tutorials/ex30.c.html > > Thanks, > Dave > > >> >> Thanks, >> >> Matt >> >> >>> >>> Thanks, >>> Edson >>> >>> >>> On Tue, Oct 4, 2016 at 11:12 PM, Matthew Knepley >>> wrote: >>> >>>> On Tue, Oct 4, 2016 at 9:02 PM, Somdeb Bandopadhyay >>> > wrote: >>>> >>>>> Dear all, >>>>> I want to write a solver for incompressible navier stokes >>>>> using python and I want to use PETsc (particularly dmda & ksp) for this. >>>>> May I know if this type of work is feasible/already done? >>>>> >>>> >>>> How do you plan to discretize your system? DMDA supports only >>>> collocation discretizations, so some sort of penalty for pressure would >>>> have to be employed. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> I intend to run my solver in a cluster and so am slightly >>>>> concerned about the performance if I use python with petsc. >>>>> My deepest apologies if this mail of mine caused you any >>>>> inconvenience. >>>>> >>>>> Somdeb >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snakexf at gmail.com Thu Oct 6 02:12:25 2016 From: snakexf at gmail.com (Feng Xing) Date: Thu, 6 Oct 2016 09:12:25 +0200 Subject: [petsc-users] petsc init in class constructor Message-ID: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> Hello everyone, I would like to write a c++ class which solve a linear system with petsc (code as following). Petsc is used only in this class. So I call MPI_Init in main.cpp, but PetscInitialize and PetscFinalise are in constructor/destructor of class. I am wondering if this way is safe? class solvepetsc{ solvepetsc(int argc, char** argv){ PetscInitialize(&argc, &argv, NULL, NULL); }; ~solvepetsc(){ PetscFinalize(); }; // ... } Thanks a lot and best regards, Feng Xing Postdoc INRIA France From rupp at iue.tuwien.ac.at Thu Oct 6 03:39:59 2016 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Thu, 6 Oct 2016 10:39:59 +0200 Subject: [petsc-users] petsc init in class constructor In-Reply-To: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> References: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> Message-ID: Hi, > I would like to write a c++ class which solve a linear system with petsc (code as following). > Petsc is used only in this class. So I call MPI_Init in main.cpp, but PetscInitialize and PetscFinalise are in constructor/destructor of class. > I am wondering if this way is safe? > > class solvepetsc{ > > solvepetsc(int argc, char** argv){ > PetscInitialize(&argc, &argv, NULL, NULL); > }; > > ~solvepetsc(){ > PetscFinalize(); > }; > > // ... > } well, you could do it this way, but the devil's in the detail: You should check the return value of PetscInitialize() and PetscFinalize() for error checking. Now, how do you plan to deal with an error in PetscInitialize()? You can't throw an exception, because that will terminate your program immediately. Delaying the error checks is likely to be fragile. 
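A minimal sketch of the alternative being suggested, with initialization kept in main() where the return codes can be checked directly (error handling deliberately simplified; the solver class then only assumes PETSc is already up and never calls PetscInitialize/PetscFinalize itself):

   #include <petscsys.h>

   int main(int argc, char **argv)
   {
     PetscErrorCode ierr;

     MPI_Init(&argc, &argv);
     ierr = PetscInitialize(&argc, &argv, NULL, NULL);
     if (ierr) return ierr;            /* cannot rely on CHKERRQ before init succeeds */
     {
       /* construct and use the solver object inside this scope, so that its
          destructor runs before PetscFinalize()/MPI_Finalize() */
     }
     ierr = PetscFinalize();
     MPI_Finalize();                   /* needed here because MPI was initialized by the
                                          application, so PetscFinalize() leaves it alone */
     return ierr;
   }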
Another question: How do you instances of solvepetsc do you expect? If you have more than one instance in use at the same time, you will run into problems due to an oversubscription of the internal global variables (logging, etc.). Long story short: I'd encourage you not to initialize PETSc in a constructor, unless you know exactly what you are doing and can control the possible side effects. Best regards, Karli From stefano.zampini at gmail.com Thu Oct 6 03:42:45 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 6 Oct 2016 11:42:45 +0300 Subject: [petsc-users] petsc init in class constructor In-Reply-To: References: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> Message-ID: > > Long story short: I'd encourage you not to initialize PETSc in a constructor, unless you know exactly what you are doing and can control the possible side effects. > completely agree on that. From stefano.zampini at gmail.com Thu Oct 6 03:44:38 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 6 Oct 2016 11:44:38 +0300 Subject: [petsc-users] petsc init in class constructor In-Reply-To: References: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> Message-ID: <72E8E4ED-6BD6-4D60-B7AF-C1B3C9BC3D50@gmail.com> Sorry, I realized I replied to Feng and not to the list. Below is my previous email. Feng, Your class is fine as long as the constructor of your solver is called AFTER MPI_Init, and the destructor BEFORE MPI_Finalize. This means that you cannot have something like int main (int argc, char**argv) { MPI_Init(); solverpetsc S(argc,argv); MPI_Finalize(); return 0; } as the destructor for S will be called after MPI_Finalize(). If you call PetscInitialize before MPI_Init, PETSc will call it for you. But then, it will also call MPI_Finalize during PetscFinalize(). so, the following code will abort at any_MPI_call() int main (int argc, char**argv) { solverpetsc *S; S = new solverpetsc(argc,argv); ?. ?. delete S; any_MPI_call(); return 0; } On Oct 6, 2016, at 11:42 AM, Stefano Zampini wrote: >> >> Long story short: I'd encourage you not to initialize PETSc in a constructor, unless you know exactly what you are doing and can control the possible side effects. >> > > completely agree on that. > From snakexf at gmail.com Thu Oct 6 04:03:44 2016 From: snakexf at gmail.com (Feng Xing) Date: Thu, 6 Oct 2016 11:03:44 +0200 Subject: [petsc-users] petsc init in class constructor In-Reply-To: <72E8E4ED-6BD6-4D60-B7AF-C1B3C9BC3D50@gmail.com> References: <1520E543-8F17-4634-8747-9796E58DBF03@gmail.com> <72E8E4ED-6BD6-4D60-B7AF-C1B3C9BC3D50@gmail.com> Message-ID: Hello Stefano and Karl, Thanks a lot for your detailed replies. I understand now it is not a good idea to put petscinit in constructor. :-) Best reagards, Feng > On 06 Oct 2016, at 10:44, Stefano Zampini wrote: > > Sorry, > I realized I replied to Feng and not to the list. Below is my previous email. > > Feng, > > Your class is fine as long as the constructor of your solver is called AFTER MPI_Init, and the destructor BEFORE MPI_Finalize. > This means that you cannot have something like > > int main (int argc, char**argv) > { > MPI_Init(); > solverpetsc S(argc,argv); > MPI_Finalize(); > return 0; > } > > as the destructor for S will be called after MPI_Finalize(). > > If you call PetscInitialize before MPI_Init, PETSc will call it for you. But then, it will also call MPI_Finalize during PetscFinalize(). 
> > so, the following code will abort at any_MPI_call() > > int main (int argc, char**argv) > { > solverpetsc *S; > > S = new solverpetsc(argc,argv); > ?. > > ?. > delete S; > any_MPI_call(); > return 0; > } > On Oct 6, 2016, at 11:42 AM, Stefano Zampini wrote: > >>> >>> Long story short: I'd encourage you not to initialize PETSc in a constructor, unless you know exactly what you are doing and can control the possible side effects. >>> >> >> completely agree on that. >> > From ibarletta at inogs.it Thu Oct 6 04:22:43 2016 From: ibarletta at inogs.it (Ivano Barletta) Date: Thu, 6 Oct 2016 11:22:43 +0200 Subject: [petsc-users] Using Petsc with Finite Elements Domain Decomposition In-Reply-To: References: Message-ID: Hello everyone Recently I resumed the task of nesting Petsc into this fem ocean model, for the solution of a linear system I followed your suggestions and "almost" everything works. The problem raised during a run with 4 CPUs, when i got this error 3:[3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- 3:[3]PETSC ERROR: Petsc has generated inconsistent data 3:[3]PETSC ERROR: Negative MPI source! 3:[3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 3:[3]PETSC ERROR: Petsc Release Version 3.7.1, May, 15, 2016 3:[3]PETSC ERROR: /users/home/ib04116/shympi_last4/fem3d/shympi ^A on a linux-gnu-intel named n243.cluster.net by ib04116 Thu Oct 6 10:37:01 2016 3:[3]PETSC ERROR: Configure options CFLAGS=-I/users/home/opt/netcdf/netcdf-4.2.1.1/include -I/users/home/opt/szip/szip-2.1/include -I/users/home/opt/hdf5/hdf5-1.8.10-patch1/include -I/usr/include -I/users/home/opt/netcdf/netcdf-4.3/include -I/users/home/opt/hdf5/hdf5-1.8.11/include FFLAGS=-xHost -no-prec-div -O3 -I/users/home/opt/netcdf/netcdf-4.2.1.1/include -I/users/home/opt/netcdf/netcdf-4.3/include LDFLAGS=-L/users/home/opt/netcdf/netcdf-4.2.1.1/lib -lnetcdff -L/users/home/opt/szip/szip-2.1/lib -L/users/home/opt/hdf5/hdf5-1.8.10-patch1/lib -L/users/home/opt/netcdf/netcdf-4.2.1.1/lib -L/usr/lib64/ -lz -lnetcdf -lnetcdf -lgpfs -L/users/home/opt/netcdf/netcdf-4.3/lib -L/users/home/opt/hdf5/hdf5-1.8.11/lib -L/users/home/opt/netcdf/netcdf-4.3/lib -lcurl --PETSC_ARCH=linux-gnu-intel --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-mpiexec=mpirun --with-blas-lapack-dir=/users/home/opt/intel/composer_xe_2013/mkl --with-scalapack-lib="-L/users/home/opt/intel/composer_xe_2013/mkl//lib/intel64 -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64" --with-scalapack-include=/users/home/opt/intel/composer_xe_2013/mkl/include --download-metis --download-parmetis --download-mumps --download-superlu 3:[3]PETSC ERROR: #1 MatStashScatterGetMesg_Ref() line 692 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c 3:[3]PETSC ERROR: #2 MatStashScatterGetMesg_Private() line 663 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c 3:[3]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 713 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/impls/aij/mpi/mpiaij.c 3:[3]PETSC ERROR: #4 MatAssemblyEnd() line 5187 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/interface/matrix.c The code is in fortran and the Petsc version is 3.7.1 This error looks quite strange to me, because it doesn't happen always in the same situation. The model goes through several time steps, but this error is not raised always at the same time. It has happened at the fourth, for example, at the fifth time step. 
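For context on where that stack trace ends: MatAssemblyEnd() is the point at which entries stashed for off-process rows are shipped to their owners, so the surrounding pattern normally looks like the generic C sketch below (the ownership loop and values are placeholders, not Ivano's code):

   PetscInt i, rstart, rend;

   ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
   for (i = rstart; i < rend; i++) {
     /* entries may also be set for rows owned by other ranks; those go into
        the stash and are exchanged during assembly */
     ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
   }
   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The assembly calls are collective, so every rank on the communicator has to reach the same MatAssemblyBegin/MatAssemblyEnd pair the same number of times; a mismatch there tends to surface as confusing MPI errors inside the stash exchange.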
What is even more odd is that once the run of the model (720 time steps) was completed without any error. What I do to solve the linear system for each time step is the following: call petsc_solve( ..arguments..) subroutine petsc_solve(..args) call PetscInitialize(PETSC_NULL_CHARACTER) call MatCreate ... ... call KSPSolve(...) call XXXDestroy() call PetscFinalize end subroutine Do you think that calling PetscInitialize and PetscFinalize several times might cause problems? I guess Petsc use the same communicator of the model, which is MPI_COMM_WORLD It don't have hints to troubleshoot this, since is not a reproducible error and I don't know where to look to sort it out. Have you got any suggestion? Thanks in advance Ivano 2016-07-13 5:16 GMT+02:00 Barry Smith : > > > On Jul 12, 2016, at 4:13 AM, Matthew Knepley wrote: > > > > On Tue, Jul 12, 2016 at 3:35 AM, Ivano Barletta > wrote: > > Dear Petsc users > > > > my aim is to parallelize the solution of a linear > > system into a finite elements > > ocean model. > > > > The model has been almost entirely parallelized, with > > a partitioning of the domain made element-wise through > > the use of Zoltan libraries, so the subdomains > > share the nodes lying on the edges. > > > > The linear system includes node-to-node dependencies > > so my guess is that I need to create an halo surrounding > > each subdomain, to allow connections of edge nodes with > > neighbour subdomains ones > > > > Apart from that, my question is if Petsc accept a > > previously made partitioning (maybe taking into account of halo) > > using the data structures coming out of it > > > > Has anybody of you ever faced a similar problem? > > > > If all you want to do is construct a PETSc Mat and Vec for the linear > system, > > just give PETSc the non-overlapping partition to create those objects. > You > > can input values on off-process partitions automatically using > MatSetValues() > > and VecSetValues(). > > Note that by just using the VecSetValues() and MatSetValues() PETSc will > manage all the halo business needed by the linear algebra system solver > automatically. You don't need to provide any halo information to PETSc. It > is really straightforward. > > Barry > > > > > Thanks, > > > > Matt > > > > Thanks in advance > > Ivano > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kyungjun.choi92 at gmail.com Thu Oct 6 09:23:48 2016 From: kyungjun.choi92 at gmail.com (Choi Kyungjun) Date: Thu, 6 Oct 2016 23:23:48 +0900 Subject: [petsc-users] Question about using MatSNESMFWPSetComputeNormU In-Reply-To: <64FDE88F-897A-48C0-89E4-214DB4619DAF@mcs.anl.gov> References: <04A37A4A-66DA-461A-983A-B031EFD0183D@mcs.anl.gov> <88CE2F93-938B-46D8-BEC5-7287E2430353@mcs.anl.gov> <64FDE88F-897A-48C0-89E4-214DB4619DAF@mcs.anl.gov> Message-ID: Dear Matt. Thank you very much for your help. I'm currently working on PETSc library into 2-D compressible Euler / NS equation solver, especially for convergence of steady state problem. I adjusted my flow code as you told me, using snes_mf command line option, but I have a question about SNESsetFunction, especially function evaluation routine. My command line option goes like this, as you told me. 
*-snes_mf -pc_type none -snes_view -snes_monitor -ksp_monitor -snes_converged_reason -ksp_converged_reason* I remember that if I use snes_mf option, the matrix-free method is applied with computing Jacobian like below. [image: ?? ??? 1](captured from Petsc Manual p.113) But I computed Jacobian with function evaluation routine, with SNESSetFunction(snes, r, FormPetscResidual, userctx, ier). I referred to my reference code which computes Jacobian like below. F'(u) a = -F(u) - (volume)/dt *a This is just reverse calculation of equation, not matrix-free form. This is done at the function evaluation routine (FormPetscResidual). *I want to ask how I can use the REAL matrix-free form.* I'll attach my flow code and computation log below. Thank you so much every time for your sincere help. Kyungjun. *================================================================================* *(This is the flow code, and vectors are already created.)* *call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier)* *call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier)* *call SNESSetFromOptions(Mixt%snes, ier)* *================================================================================* *(This is function evaluation routine)* *subroutine FormPetscResidual(snes, x, f, Collect, ier)* type(t_Collect), intent(inout) :: Collect SNES :: snes Vec :: x, f integer :: ier, counter, iCell, iVar, temp integer :: ndof real(8), allocatable :: CVar(:,:) real(8), allocatable :: PVar(:,:) PetscScalar, pointer :: xx_v(:) PetscScalar, pointer :: ff_v(:) ! Set degree of freedom of this system. ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell ! Backup the original values for cv to local array CVar allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) allocate( xx_v(1:ndof) ) allocate( ff_v(1:ndof) ) xx_v(:) = 0d0 ff_v(:) = 0d0 ! Backup the original values for cv and pv do iCell = 1, Collect%pGrid%nCell do iVar = 0, Collect%pMixt%nCVar-1 CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) end do end do ! Copy the input argument vector x to array value xx_v call VecGetArrayReadF90(x, xx_v, ier) call VecGetArrayF90(f, ff_v, ier) ! Compute copy the given vector into Mixt%cv and check for validity counter = 0 do iCell = 1, Collect%pGrid%nCell do iVar = 0, Collect%pMixt%nCVar-1 counter = counter + 1 Collect%pMixt%cv(iVar,iCell) = xx_v(counter) end do end do ! Update primitive variables with input x vector to compute residual call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) ! Compute the residual call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) --> where update residual of cell ! Copy the residual array into the PETSc vector counter = 0 do iCell = 1, Collect%pGrid%nCell do iVar = 0, Collect%pMixt%nCVar-1 counter = counter + 1 * ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) )* end do end do ! Restore conservative variables do iCell = 1, Collect%pGrid%nCell do iVar = 0, Collect%pMixt%nCVar-1 Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) end do end do call VecRestoreArrayReadF90(x, xx_v, ier) call VecRestoreArrayF90(f, ff_v, ier) deallocate(CVar) deallocate(PVar) *end subroutine* *================================================================================* *Computation log* [image: ?? ??? 
2] 2016-08-19 21:14 GMT+09:00 Barry Smith : > > It looks like the SNESView() you have below was called before you ever > did a solve, hence it prints the message "information may be incomplete". > Note also zero function evaluations have been done in the SNESSolve, if the > solve had been called it should be great than 0. > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > > This is also why it prints > > The compute h routine has not yet been set > > The information about the h routine won't be printed until after an actual > solve is done and the "compute h" function is set. > > Barry > > Note you can call MatMFFDSetType() to control the "compute h" function > that is used. > > > > > On Aug 19, 2016, at 12:04 AM, ??? wrote: > > > > Dear Barry and Matt. > > > > Thank you very much for helping me up all night. (in my time) > > > > And sorry for not asking with sufficient source code condition or my > circumstances. (also with poor English.) > > > > > > I just want to make sure that the options of my code is well applied. > > > > I'm trying to use GMRES with matrix-free method. I'd like to solve 2-D > euler equation without preconditioning matrix, for now. > > > > > > 1) I'm still curious whether my snes context is using MF jacobian. ( > just like -snes_mf command line option) > > > > 2) And mind if I ask you that whether I applied petsc functions properly? > > > > I'll check out ex5 for applying command line options. > > > > > > I'll attach my petsc flow code and option log by SNESView() below. > > ------------------------------------------------------------ > -------------------------------------------------------- > > - petsc flow code > > ------------------------------------------------------------ > -------------------------------------------------------- > > > > ndof = Mixt%nCVar * Grid%nCell > > > > call VecCreateMPIWIthArray(PETSC_COMM_WORLD, Mixt%nCVar, ndof, > PETSC_DECIDE, Mixt%cv, Mixt%x, ier) > > call VecDuplicate(Mixt%x, Mixt%r, ier) > > call VecSet(Mixt%r, zero, ier) > > > > call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier) > > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) > > call MatCreateSNESMF(Mixt%snes, Mixt%A, ier) > > > > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, > Collect, ier) > > call SNESSetFromOptions(Mixt%snes, ier) > > > > call SNESGetKSP(Mixt%snes, ksp, ier) > > call KSPSetType(ksp, KSPGMRES, ier) > > call KSPGetPC(ksp, pc, ier) > > call PCSetType(pc, PCNONE, ier) > > call KSPSetInitialGuessNonzero(ksp, PETSC_TRUE, ier) > > call KSPGMRESSetRestart(ksp, 30, ier) > > call KSPGMRESSetPreAllocation(ksp, ier) > > > > > > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) > > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, > Collect, ier) > > > > call SNESSolve(Mixt%snes, PETSC_NULL_OBJECT, Mixt%x, ier) > > > > stop ( for temporary ) > > > > > > ------------------------------------------------------------ > -------------------------------------------------------- > > subroutine FormPetscResidual(snes, x, f, Collect, ier) > > type(t_Collect), intent(inout) :: Collect > > > > SNES :: snes > > Vec :: x, f > > integer :: ier, counter, iCell, iVar, temp > > integer :: ndof > > real(8), allocatable :: CVar(:,:) > > real(8), allocatable :: PVar(:,:) > > PetscScalar, pointer :: xx_v(:) > > PetscScalar, pointer :: ff_v(:) > > > > ! Set degree of freedom of this system. 
> > ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell > > > > ! Backup the original values for cv to local array CVar > > allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) > > allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) > > allocate( xx_v(1:ndof) ) > > allocate( ff_v(1:ndof) ) > > xx_v(:) = 0d0 > > ff_v(:) = 0d0 > > > > ! Backup the original values for cv and pv > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) > > PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) > > end do > > end do > > > > ! Copy the input argument vector x to array value xx_v > > call VecGetArrayReadF90(x, xx_v, ier) > > call VecGetArrayF90(f, ff_v, ier) > > > > ! Compute copy the given vector into Mixt%cv and check for validity > > counter = 0 > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > counter = counter + 1 > > Collect%pMixt%cv(iVar,iCell) = xx_v(counter) > > end do > > end do > > > > ! Update primitive variables with input x vector to compute residual > > call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) > > > > > > ! Compute the residual > > call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) --> > where update residual of cell > > > > ! Copy the residual array into the PETSc vector > > counter = 0 > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > counter = counter + 1 > > > > ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + > Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( > Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) ) > > end do > > end do > > > > ! Restore conservative variables > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) > > Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) > > end do > > end do > > > > call VecRestoreArrayReadF90(x, xx_v, ier) > > call VecRestoreArrayF90(f, ff_v, ier) > > > > deallocate(CVar) > > deallocate(PVar) > > ------------------------------------------------------------ > -------------------------------------------------------- > > > > > > ------------------------------------------------------------ > -------------------------------------------------------- > > - option log > > ------------------------------------------------------------ > -------------------------------------------------------- > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=1, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-32, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=10000 > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using nonzero initial guess > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: none > > PC has not been set up so information may be incomplete > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mffd > > rows=11616, cols=11616 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > The compute h routine has not yet been set > > > > > > Sincerely, > > > > Kyungjun > > > > > > 2016-08-19 13:00 GMT+09:00 Barry Smith : > > > > > On Aug 18, 2016, at 10:28 PM, ??? wrote: > > > > > > Dear Matt. > > > > > > I didn't use the command line options because it looked not working. > > > > > > I called SNESSetFromOptions(snes, ier) in my source code, > > > > > > but options like -snes_mf or -snes_monitor doesn't look working. > > > > "doesn't work" is not useful to help us figure out what has gone > wrong. You need to show us EXACTLY what you did by sending the code you > compiled and the command line options you ran and all the output include > full error messages. Without the information we simply do not have enough > information to even begin to guess why it "doesn't work". > > > > Barry > > > > > > > > > > > > > Is there anything that I should consider more? > > > > > > > > > 2016-08-19 4:47 GMT+09:00 Matthew Knepley : > > > On Thu, Aug 18, 2016 at 2:44 PM, ??? > wrote: > > > Is there a part that you considered this as finite-difference > approximation? > > > I thought I used matrix-free method with MatCreateSNESMF() function > > > > > > You did not tell the SNES to use a MF Jacobian, you just made a Mat > object. This is why > > > we encourage people to use the command line. Everything is setup > correctly and in order. > > > Why would you choose not to. This creates long rounds of email. > > > > > > Matt > > > > > > Also I used > > > - call PCSetType(pc, PCNONE, ier) --> so the pc type shows 'none' at > the log > > > > > > > > > I didn't use any of command line options. > > > > > > > > > Kyungjun > > > > > > 2016-08-19 4:27 GMT+09:00 Barry Smith : > > > > > > You can't use that Jacobian function SNESComputeJacobianDefault > with matrix free, it tries to compute the matrix entries and stick them > into the matrix. You can use MatMFFDComputeJacobian > > > > > > > On Aug 18, 2016, at 2:03 PM, ??? wrote: > > > > > > > > I got stuck at FormJacobian stage. > > > > > > > > - call SNESComputeJacobianDefault(snes, v, J, pJ, FormResidual, > ier) --> J & pJ are same with A matrix-free matrix (input argument) > > > > > > > > > > > > > > > > with these kind of messages.. > > > > > > > > [0]PETSC ERROR: No support for this operation for this object type > > > > [0]PETSC ERROR: Mat type mffd > > > > > > > > > > > > > > > > Guess it's because I used A matrix-free matrix (which is mffd type) > into pJ position. > > > > > > > > Is there any solution for this kind of situation? > > > > > > > > > > > > 2016-08-19 2:05 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 12:04 PM, ??? > wrote: > > > > Then in order not to use preconditioner, > > > > > > > > is it ok if I just put A matrix-free matrix (made from > MatCreateSNESMF()) into the place where preA should be? > > > > > > > > Yes, but again the solve will likely perform very poorly. 
> > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > The flow goes like this > > > > - call SNESCreate > > > > - call SNESSetFunction(snes, r, FormResidual, userctx, ier) > > > > - call MatCreateSNESMF(snes, A, ier) > > > > - call SNESSetJacobian(snes, A, A, FormJacobian, userctx, ier) > > > > - call SNESSetFromOptions() > > > > > > > > - call SNESGetKSP(snes, ksp, ier) > > > > - call KSPSetType(ksp, KSPGMRES, ier) > > > > - call KSPGetPC(ksp, pc, ier) > > > > - call PCSetType(pc, PCNONE, ier) > > > > - call KSPGMRESSetRestart(ksp, 30, ier) > > > > > > > > - call SNESSolve() > > > > . > > > > . > > > > > > > > > > > > and inside the FormJacobian routine > > > > - call SNESComputeJacobian(snes, v, J, pJ, userctx, ier) --> J and > pJ must be pointed with A and A. > > > > > > > > > > > > > > > > Thank you again, > > > > > > > > Kyungjun. > > > > > > > > 2016-08-19 1:44 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 11:42 AM, ??? > wrote: > > > > Thanks for your helpful answers. > > > > > > > > Here's another question... > > > > > > > > As I read some example PETSc codes, I noticed that there should be a > preconditioning matrix (e.g. approx. jacobian matrix) when using > MatCreateSNESMF(). > > > > > > > > I mean, > > > > after calling MatCreateSNESMF(snes, A, ier), > > > > there should be another matrix preA(preconditioning matrix) to use > SNESSetJacobian(snes, A, preA, FormJacobian, ctx, ier). > > > > > > > > > > > > 1) Is there any way that I can use matrix-free method without making > preconditioning matrix? > > > > > > > > Don't use a preconditioner. As you might expect, this does not often > work out well. > > > > > > > > 2) I have a reference code, and the code adopts > > > > > > > > MatFDColoringCreate() > > > > and finally uses > > > > SNESComputeJacobianDefaultColor() at FormJacobian stage. > > > > > > > > But I can't see the inside of the fdcolor and I'm curious of this > mechanism. Can you explain this very briefly or tell me an example code > that I can refer to. ( I think none of PETSc example code is using > fdcolor..) > > > > > > > > This is the default, so there is no need for all that code. We use > naive graph 2-coloring. I think there might be a review article by Alex > Pothen about that. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > > > > Best, > > > > > > > > Kyungjun. > > > > > > > > 2016-08-19 0:54 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 10:39 AM, ??? > wrote: > > > > 1) I wanna know the difference between applying option with command > line and within source code. > > > > From my experience, command line option helps set other default > settings that I didn't applied, I guess. > > > > > > > > The command line arguments are applied to an object when > *SetFromOptions() is called, so in this case > > > > you want SNESSetFromOptions() on the solver. There should be no > difference from using the API. > > > > > > > > 2) I made a matrix-free matrix with MatCreateSNESMF function, and > every time I check my snes context with SNESView, > > > > > > > > Mat Object: 1 MPI processes > > > > type: mffd > > > > rows=11616, cols=11616 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > The compute h routine has not yet been set > > > > > > > > at the end of line shows there's no routine for computing h value. > > > > I used MatMFFDWPSetComputeNormU function, but it didn't work I think. > > > > Is it ok if I leave the h value that way? 
Or should I have to set h > computing routine? > > > > > > > > I am guessing you are calling the function on a different object > from the one that is viewed here. > > > > However, there will always be a default function for computing h. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Kyungjun. > > > > > > > > 2016-08-18 23:18 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 8:35 AM, ??? > wrote: > > > > Hi, I'm trying to set my SNES matrix-free with Walker & Pernice way > of computing h value. > > > > > > > > I found above command (MatSNESMFWPSetComputeNormU) but my fortran > compiler couldn't fine any reference of that command. > > > > > > > > I checked Petsc changes log, but there weren't any mentions about > that command. > > > > > > > > Should I have to include another specific header file? > > > > > > > > We have this function > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatMFFDWPSetComputeNormU.html > > > > > > > > but I would recommend using the command line option > > > > > > > > -mat_mffd_compute_normu > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Thank you always. > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 22943 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 222628 bytes Desc: not available URL: From knepley at gmail.com Thu Oct 6 09:47:12 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Oct 2016 09:47:12 -0500 Subject: [petsc-users] Question about using MatSNESMFWPSetComputeNormU In-Reply-To: References: <04A37A4A-66DA-461A-983A-B031EFD0183D@mcs.anl.gov> <88CE2F93-938B-46D8-BEC5-7287E2430353@mcs.anl.gov> <64FDE88F-897A-48C0-89E4-214DB4619DAF@mcs.anl.gov> Message-ID: On Thu, Oct 6, 2016 at 9:23 AM, Choi Kyungjun wrote: > Dear Matt. > > Thank you very much for your help. > > I'm currently working on PETSc library into 2-D compressible Euler / NS > equation solver, especially for convergence of steady state problem. 
> > I adjusted my flow code as you told me, using snes_mf command line > option, but I have a question about SNESsetFunction, especially function > evaluation routine. > > > My command line option goes like this, as you told me. > > *-snes_mf -pc_type none -snes_view -snes_monitor -ksp_monitor > -snes_converged_reason -ksp_converged_reason* > > > I remember that if I use snes_mf option, the matrix-free method is applied > with computing Jacobian like below. > More precisely, it computes the action of the Jacobian on a vector a. > > [image: ?? ??? 1](captured from Petsc Manual p.113) > > But I computed Jacobian with function evaluation routine, with > SNESSetFunction(snes, r, FormPetscResidual, userctx, ier). > > I referred to my reference code which computes Jacobian like below. > Again, this is the action of the Jacobian, and this formula makes no sense to me. I also do not understand the two sentences above. > F'(u) a = -F(u) - (volume)/dt *a > > This is just reverse calculation of equation, not matrix-free form. This > is done at the function evaluation routine (FormPetscResidual). > > > *I want to ask how I can use the REAL matrix-free form.* > I do not know what you mean here. You are using the MF form is you give -snes_mf. Thanks, Matt > I'll attach my flow code and computation log below. > > > Thank you so much every time for your sincere help. > > Kyungjun. > > > > *================================================================================* > *(This is the flow code, and vectors are already created.)* > > *call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier)* > > *call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier)* > > *call SNESSetFromOptions(Mixt%snes, ier)* > > > *================================================================================* > *(This is function evaluation routine)* > > *subroutine FormPetscResidual(snes, x, f, Collect, ier)* > type(t_Collect), intent(inout) :: Collect > > SNES :: snes > Vec :: x, f > integer :: ier, counter, iCell, iVar, temp > integer :: ndof > real(8), allocatable :: CVar(:,:) > real(8), allocatable :: PVar(:,:) > PetscScalar, pointer :: xx_v(:) > PetscScalar, pointer :: ff_v(:) > > ! Set degree of freedom of this system. > ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell > > ! Backup the original values for cv to local array CVar > allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) > allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) > allocate( xx_v(1:ndof) ) > allocate( ff_v(1:ndof) ) > xx_v(:) = 0d0 > ff_v(:) = 0d0 > > ! Backup the original values for cv and pv > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) > PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) > end do > end do > > ! Copy the input argument vector x to array value xx_v > call VecGetArrayReadF90(x, xx_v, ier) > call VecGetArrayF90(f, ff_v, ier) > > ! Compute copy the given vector into Mixt%cv and check for validity > counter = 0 > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > counter = counter + 1 > Collect%pMixt%cv(iVar,iCell) = xx_v(counter) > end do > end do > > ! Update primitive variables with input x vector to compute residual > call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) > > > ! Compute the residual > call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) --> > where update residual of cell > > ! 
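For reference, with -snes_mf no Jacobian is ever formed; only its action on a vector a is approximated from extra residual evaluations, roughly J(u) a ~ (F(u + h*a) - F(u)) / h, where F is exactly the function registered with SNESSetFunction and h is chosen internally. A minimal C sketch of the corresponding setup (vector creation and the FormFunction routine are assumed to exist; this mirrors the calls already quoted in this thread rather than adding anything new):

   SNES snes;

   ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
   ierr = SNESSetFunction(snes, r, FormFunction, userctx);CHKERRQ(ierr);
   ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);  /* with -snes_mf the Jacobian becomes a
                                                      matrix-free (mffd) Mat whose MatMult
                                                      differences FormFunction as above */
   ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);

The choice of h can be controlled with MatMFFDSetType() or -mat_mffd_type wp (Walker-Pernice) / ds; the differencing itself is done inside the mffd Mat, never in the user residual routine.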
Copy the residual array into the PETSc vector > counter = 0 > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > counter = counter + 1 > > * ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + > Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( > Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) )* > end do > end do > > ! Restore conservative variables > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) > Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) > end do > end do > > call VecRestoreArrayReadF90(x, xx_v, ier) > call VecRestoreArrayF90(f, ff_v, ier) > > deallocate(CVar) > deallocate(PVar) > > *end subroutine* > > > > *================================================================================* > *Computation log* > [image: ?? ??? 2] > > > > > > 2016-08-19 21:14 GMT+09:00 Barry Smith : > >> >> It looks like the SNESView() you have below was called before you ever >> did a solve, hence it prints the message "information may be incomplete". >> Note also zero function evaluations have been done in the SNESSolve, if the >> solve had been called it should be great than 0. >> >> SNES Object: 1 MPI processes >> type: newtonls >> SNES has not been set up so information may be incomplete >> >> This is also why it prints >> >> The compute h routine has not yet been set >> >> The information about the h routine won't be printed until after an >> actual solve is done and the "compute h" function is set. >> >> Barry >> >> Note you can call MatMFFDSetType() to control the "compute h" function >> that is used. >> >> >> >> > On Aug 19, 2016, at 12:04 AM, ??? wrote: >> > >> > Dear Barry and Matt. >> > >> > Thank you very much for helping me up all night. (in my time) >> > >> > And sorry for not asking with sufficient source code condition or my >> circumstances. (also with poor English.) >> > >> > >> > I just want to make sure that the options of my code is well applied. >> > >> > I'm trying to use GMRES with matrix-free method. I'd like to solve 2-D >> euler equation without preconditioning matrix, for now. >> > >> > >> > 1) I'm still curious whether my snes context is using MF jacobian. ( >> just like -snes_mf command line option) >> > >> > 2) And mind if I ask you that whether I applied petsc functions >> properly? >> > >> > I'll check out ex5 for applying command line options. >> > >> > >> > I'll attach my petsc flow code and option log by SNESView() below. 
>> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > - petsc flow code >> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > >> > ndof = Mixt%nCVar * Grid%nCell >> > >> > call VecCreateMPIWIthArray(PETSC_COMM_WORLD, Mixt%nCVar, ndof, >> PETSC_DECIDE, Mixt%cv, Mixt%x, ier) >> > call VecDuplicate(Mixt%x, Mixt%r, ier) >> > call VecSet(Mixt%r, zero, ier) >> > >> > call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier) >> > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) >> > call MatCreateSNESMF(Mixt%snes, Mixt%A, ier) >> > >> > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, >> Collect, ier) >> > call SNESSetFromOptions(Mixt%snes, ier) >> > >> > call SNESGetKSP(Mixt%snes, ksp, ier) >> > call KSPSetType(ksp, KSPGMRES, ier) >> > call KSPGetPC(ksp, pc, ier) >> > call PCSetType(pc, PCNONE, ier) >> > call KSPSetInitialGuessNonzero(ksp, PETSC_TRUE, ier) >> > call KSPGMRESSetRestart(ksp, 30, ier) >> > call KSPGMRESSetPreAllocation(ksp, ier) >> > >> > >> > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) >> > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, >> Collect, ier) >> > >> > call SNESSolve(Mixt%snes, PETSC_NULL_OBJECT, Mixt%x, ier) >> > >> > stop ( for temporary ) >> > >> > >> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > subroutine FormPetscResidual(snes, x, f, Collect, ier) >> > type(t_Collect), intent(inout) :: Collect >> > >> > SNES :: snes >> > Vec :: x, f >> > integer :: ier, counter, iCell, iVar, temp >> > integer :: ndof >> > real(8), allocatable :: CVar(:,:) >> > real(8), allocatable :: PVar(:,:) >> > PetscScalar, pointer :: xx_v(:) >> > PetscScalar, pointer :: ff_v(:) >> > >> > ! Set degree of freedom of this system. >> > ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell >> > >> > ! Backup the original values for cv to local array CVar >> > allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) >> > allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) >> > allocate( xx_v(1:ndof) ) >> > allocate( ff_v(1:ndof) ) >> > xx_v(:) = 0d0 >> > ff_v(:) = 0d0 >> > >> > ! Backup the original values for cv and pv >> > do iCell = 1, Collect%pGrid%nCell >> > do iVar = 0, Collect%pMixt%nCVar-1 >> > CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) >> > PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) >> > end do >> > end do >> > >> > ! Copy the input argument vector x to array value xx_v >> > call VecGetArrayReadF90(x, xx_v, ier) >> > call VecGetArrayF90(f, ff_v, ier) >> > >> > ! Compute copy the given vector into Mixt%cv and check for validity >> > counter = 0 >> > do iCell = 1, Collect%pGrid%nCell >> > do iVar = 0, Collect%pMixt%nCVar-1 >> > counter = counter + 1 >> > Collect%pMixt%cv(iVar,iCell) = xx_v(counter) >> > end do >> > end do >> > >> > ! Update primitive variables with input x vector to compute residual >> > call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) >> > >> > >> > ! Compute the residual >> > call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) >> --> where update residual of cell >> > >> > ! 
Copy the residual array into the PETSc vector >> > counter = 0 >> > do iCell = 1, Collect%pGrid%nCell >> > do iVar = 0, Collect%pMixt%nCVar-1 >> > counter = counter + 1 >> > >> > ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + >> Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( >> Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) ) >> > end do >> > end do >> > >> > ! Restore conservative variables >> > do iCell = 1, Collect%pGrid%nCell >> > do iVar = 0, Collect%pMixt%nCVar-1 >> > Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) >> > Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) >> > end do >> > end do >> > >> > call VecRestoreArrayReadF90(x, xx_v, ier) >> > call VecRestoreArrayF90(f, ff_v, ier) >> > >> > deallocate(CVar) >> > deallocate(PVar) >> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > >> > >> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > - option log >> > ------------------------------------------------------------ >> -------------------------------------------------------- >> > SNES Object: 1 MPI processes >> > type: newtonls >> > SNES has not been set up so information may be incomplete >> > maximum iterations=1, maximum function evaluations=10000 >> > tolerances: relative=1e-08, absolute=1e-32, solution=1e-08 >> > total number of linear solver iterations=0 >> > total number of function evaluations=0 >> > norm schedule ALWAYS >> > SNESLineSearch Object: 1 MPI processes >> > type: bt >> > interpolation: cubic >> > alpha=1.000000e-04 >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> lambda=1.000000e-08 >> > maximum iterations=40 >> > KSP Object: 1 MPI processes >> > type: gmres >> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> > GMRES: happy breakdown tolerance 1e-30 >> > maximum iterations=10000 >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. >> > left preconditioning >> > using nonzero initial guess >> > using DEFAULT norm type for convergence test >> > PC Object: 1 MPI processes >> > type: none >> > PC has not been set up so information may be incomplete >> > linear system matrix = precond matrix: >> > Mat Object: 1 MPI processes >> > type: mffd >> > rows=11616, cols=11616 >> > Matrix-free approximation: >> > err=1.49012e-08 (relative error in function evaluation) >> > The compute h routine has not yet been set >> > >> > >> > Sincerely, >> > >> > Kyungjun >> > >> > >> > 2016-08-19 13:00 GMT+09:00 Barry Smith : >> > >> > > On Aug 18, 2016, at 10:28 PM, ??? wrote: >> > > >> > > Dear Matt. >> > > >> > > I didn't use the command line options because it looked not working. >> > > >> > > I called SNESSetFromOptions(snes, ier) in my source code, >> > > >> > > but options like -snes_mf or -snes_monitor doesn't look working. >> > >> > "doesn't work" is not useful to help us figure out what has gone >> wrong. You need to show us EXACTLY what you did by sending the code you >> compiled and the command line options you ran and all the output include >> full error messages. Without the information we simply do not have enough >> information to even begin to guess why it "doesn't work". >> > >> > Barry >> > >> > >> > > >> > > >> > > Is there anything that I should consider more? 
>> > > >> > > >> > > 2016-08-19 4:47 GMT+09:00 Matthew Knepley : >> > > On Thu, Aug 18, 2016 at 2:44 PM, ??? >> wrote: >> > > Is there a part that you considered this as finite-difference >> approximation? >> > > I thought I used matrix-free method with MatCreateSNESMF() function >> > > >> > > You did not tell the SNES to use a MF Jacobian, you just made a Mat >> object. This is why >> > > we encourage people to use the command line. Everything is setup >> correctly and in order. >> > > Why would you choose not to. This creates long rounds of email. >> > > >> > > Matt >> > > >> > > Also I used >> > > - call PCSetType(pc, PCNONE, ier) --> so the pc type shows 'none' at >> the log >> > > >> > > >> > > I didn't use any of command line options. >> > > >> > > >> > > Kyungjun >> > > >> > > 2016-08-19 4:27 GMT+09:00 Barry Smith : >> > > >> > > You can't use that Jacobian function SNESComputeJacobianDefault >> with matrix free, it tries to compute the matrix entries and stick them >> into the matrix. You can use MatMFFDComputeJacobian >> > > >> > > > On Aug 18, 2016, at 2:03 PM, ??? wrote: >> > > > >> > > > I got stuck at FormJacobian stage. >> > > > >> > > > - call SNESComputeJacobianDefault(snes, v, J, pJ, FormResidual, >> ier) --> J & pJ are same with A matrix-free matrix (input argument) >> > > > >> > > > >> > > > >> > > > with these kind of messages.. >> > > > >> > > > [0]PETSC ERROR: No support for this operation for this object type >> > > > [0]PETSC ERROR: Mat type mffd >> > > > >> > > > >> > > > >> > > > Guess it's because I used A matrix-free matrix (which is mffd type) >> into pJ position. >> > > > >> > > > Is there any solution for this kind of situation? >> > > > >> > > > >> > > > 2016-08-19 2:05 GMT+09:00 Matthew Knepley : >> > > > On Thu, Aug 18, 2016 at 12:04 PM, ??? >> wrote: >> > > > Then in order not to use preconditioner, >> > > > >> > > > is it ok if I just put A matrix-free matrix (made from >> MatCreateSNESMF()) into the place where preA should be? >> > > > >> > > > Yes, but again the solve will likely perform very poorly. >> > > > >> > > > Thanks, >> > > > >> > > > Matt >> > > > >> > > > The flow goes like this >> > > > - call SNESCreate >> > > > - call SNESSetFunction(snes, r, FormResidual, userctx, ier) >> > > > - call MatCreateSNESMF(snes, A, ier) >> > > > - call SNESSetJacobian(snes, A, A, FormJacobian, userctx, ier) >> > > > - call SNESSetFromOptions() >> > > > >> > > > - call SNESGetKSP(snes, ksp, ier) >> > > > - call KSPSetType(ksp, KSPGMRES, ier) >> > > > - call KSPGetPC(ksp, pc, ier) >> > > > - call PCSetType(pc, PCNONE, ier) >> > > > - call KSPGMRESSetRestart(ksp, 30, ier) >> > > > >> > > > - call SNESSolve() >> > > > . >> > > > . >> > > > >> > > > >> > > > and inside the FormJacobian routine >> > > > - call SNESComputeJacobian(snes, v, J, pJ, userctx, ier) --> J and >> pJ must be pointed with A and A. >> > > > >> > > > >> > > > >> > > > Thank you again, >> > > > >> > > > Kyungjun. >> > > > >> > > > 2016-08-19 1:44 GMT+09:00 Matthew Knepley : >> > > > On Thu, Aug 18, 2016 at 11:42 AM, ??? >> wrote: >> > > > Thanks for your helpful answers. >> > > > >> > > > Here's another question... >> > > > >> > > > As I read some example PETSc codes, I noticed that there should be >> a preconditioning matrix (e.g. approx. jacobian matrix) when using >> MatCreateSNESMF(). 
>> > > > >> > > > I mean, >> > > > after calling MatCreateSNESMF(snes, A, ier), >> > > > there should be another matrix preA(preconditioning matrix) to use >> SNESSetJacobian(snes, A, preA, FormJacobian, ctx, ier). >> > > > >> > > > >> > > > 1) Is there any way that I can use matrix-free method without >> making preconditioning matrix? >> > > > >> > > > Don't use a preconditioner. As you might expect, this does not >> often work out well. >> > > > >> > > > 2) I have a reference code, and the code adopts >> > > > >> > > > MatFDColoringCreate() >> > > > and finally uses >> > > > SNESComputeJacobianDefaultColor() at FormJacobian stage. >> > > > >> > > > But I can't see the inside of the fdcolor and I'm curious of this >> mechanism. Can you explain this very briefly or tell me an example code >> that I can refer to. ( I think none of PETSc example code is using >> fdcolor..) >> > > > >> > > > This is the default, so there is no need for all that code. We use >> naive graph 2-coloring. I think there might be a review article by Alex >> Pothen about that. >> > > > >> > > > Thanks, >> > > > >> > > > Matt >> > > > >> > > > >> > > > Best, >> > > > >> > > > Kyungjun. >> > > > >> > > > 2016-08-19 0:54 GMT+09:00 Matthew Knepley : >> > > > On Thu, Aug 18, 2016 at 10:39 AM, ??? >> wrote: >> > > > 1) I wanna know the difference between applying option with command >> line and within source code. >> > > > From my experience, command line option helps set other default >> settings that I didn't applied, I guess. >> > > > >> > > > The command line arguments are applied to an object when >> *SetFromOptions() is called, so in this case >> > > > you want SNESSetFromOptions() on the solver. There should be no >> difference from using the API. >> > > > >> > > > 2) I made a matrix-free matrix with MatCreateSNESMF function, and >> every time I check my snes context with SNESView, >> > > > >> > > > Mat Object: 1 MPI processes >> > > > type: mffd >> > > > rows=11616, cols=11616 >> > > > Matrix-free approximation: >> > > > err=1.49012e-08 (relative error in function evaluation) >> > > > The compute h routine has not yet been set >> > > > >> > > > at the end of line shows there's no routine for computing h value. >> > > > I used MatMFFDWPSetComputeNormU function, but it didn't work I >> think. >> > > > Is it ok if I leave the h value that way? Or should I have to set h >> computing routine? >> > > > >> > > > I am guessing you are calling the function on a different object >> from the one that is viewed here. >> > > > However, there will always be a default function for computing h. >> > > > >> > > > Thanks, >> > > > >> > > > Matt >> > > > >> > > > Kyungjun. >> > > > >> > > > 2016-08-18 23:18 GMT+09:00 Matthew Knepley : >> > > > On Thu, Aug 18, 2016 at 8:35 AM, ??? >> wrote: >> > > > Hi, I'm trying to set my SNES matrix-free with Walker & Pernice way >> of computing h value. >> > > > >> > > > I found above command (MatSNESMFWPSetComputeNormU) but my fortran >> compiler couldn't fine any reference of that command. >> > > > >> > > > I checked Petsc changes log, but there weren't any mentions about >> that command. >> > > > >> > > > Should I have to include another specific header file? 
>> > > > >> > > > We have this function >> > > > >> > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages >> /Mat/MatMFFDWPSetComputeNormU.html >> > > > >> > > > but I would recommend using the command line option >> > > > >> > > > -mat_mffd_compute_normu >> > > > >> > > > Thanks, >> > > > >> > > > Matt >> > > > >> > > > Thank you always. >> > > > >> > > > >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > > -- Norbert Wiener >> > > > >> > > >> > > >> > > >> > > >> > > >> > > -- >> > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > -- Norbert Wiener >> > > >> > >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 22943 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 222628 bytes Desc: not available URL: From bourdin at lsu.edu Thu Oct 6 09:51:38 2016 From: bourdin at lsu.edu (Blaise A Bourdin) Date: Thu, 6 Oct 2016 14:51:38 +0000 Subject: [petsc-users] printing snes prefix with monitor Message-ID: <54DDF861-53D7-4561-A6EB-E43B89C9DDBB@lsu.edu> Hi, I have a problem with 2 nested snes (i.e. SNESComputeFunction for snes1 involves a SNESSolve for snes2). Each snes has a different prefix. The problem is that the SESMonitor won?t print the SNES prefix, so that making sense of output can be a bit tricky? Is there a simple way to have each snes monitor display the prefix of the snes it refers to? Alternatively, where in the source code is the residual printed during snessolve? Blaise -- Department of Mathematics and Center for Computation & Technology Louisiana State University, Baton Rouge, LA 70803, USA Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 http://www.math.lsu.edu/~bourdin From knepley at gmail.com Thu Oct 6 09:55:09 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Oct 2016 09:55:09 -0500 Subject: [petsc-users] printing snes prefix with monitor In-Reply-To: <54DDF861-53D7-4561-A6EB-E43B89C9DDBB@lsu.edu> References: <54DDF861-53D7-4561-A6EB-E43B89C9DDBB@lsu.edu> Message-ID: On Thu, Oct 6, 2016 at 9:51 AM, Blaise A Bourdin wrote: > Hi, > > I have a problem with 2 nested snes (i.e. 
SNESComputeFunction for snes1 > involves a SNESSolve for snes2). > Each snes has a different prefix. The problem is that the SESMonitor won?t > print the SNES prefix, so that making sense of output can be a bit tricky? > Is there a simple way to have each snes monitor display the prefix of the > snes it refers to? Alternatively, where in the source code is the residual > printed during snessolve? It would be nice to have a mode that put the prefix on the monitor line. What we currently do is indent the subsolve. I normally make the tab level 1 greater than the enclosing solve http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscObjectGetTabLevel.html#PetscObjectGetTabLevel http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscObjectSetTabLevel.html Thanks, Matt > Blaise > -- > Department of Mathematics and Center for Computation & Technology > Louisiana State University, Baton Rouge, LA 70803, USA > Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 http://www.math.lsu.edu/~ > bourdin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From overholt at capesim.com Thu Oct 6 10:45:17 2016 From: overholt at capesim.com (Matthew Overholt) Date: Thu, 6 Oct 2016 11:45:17 -0400 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> Message-ID: <001801d21fe8$a3e67970$ebb36c50$@capesim.com> Matthew and Barry, 1) I did a direct measurement of PetscCommDuplicate() time by tracing just that call (using CrayPat), and confirmed the sampling results. For 8 processes (n=8), tracing counted a total of 101 calls, taking ~0 time on the root process but taking 11.78 seconds (6.3% of 188 total seconds) on each of the other 7 processes. For 16 processes (n=16, still only 1 node), tracing counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 total seconds) on every process except the root. 2) Copied below is a section of the log view for the first two solutions for n=2, which shows the same calls as for n=8. (I can send the entire log files if desired.) In each case I count about 44 PCD calls per process during initialization and meshing, 7 calls during setup, 9 calls for the first solution, then 3 calls for each subsequent solution (fixed-point iteration), and 3 calls to write out the solution, for 75 total. 3) I would expect that the administrators of this machine have configured PETSc appropriately. I am using their current default install, which is 3.7.2. https://www.nersc.gov/users/software/programming-libraries/math-libraries/pe tsc/ 4) Yes, I just gave the MUMPS time as a comparison. 5) As to where it is spending time, perhaps the timing results in the log files will be helpful. The "Solution took ..." printouts give the total solution time for that iteration, the others are incremental times. (As an aside, I have been wondering why the solution times do not scale well with process count, even though that work is entirely done in parallel PETSc routines.) 
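For the nested-solver monitor question above (one SNESSolve inside another), a short sketch of the tab-level approach Matt describes: the inner solver's monitor lines are indented one level deeper than the outer solver's. Here outer and inner are placeholder names for the two SNES objects, not names from the thread:

  PetscInt tab;

  ierr = PetscObjectGetTabLevel((PetscObject)outer,&tab);CHKERRQ(ierr);
  ierr = PetscObjectSetTabLevel((PetscObject)inner,tab+1);CHKERRQ(ierr);
  /* or in one call: PetscObjectIncrementTabLevel((PetscObject)inner,(PetscObject)outer,1); */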
Thanks, Matt Overholt ********** -log_view -info results for n=2 : the first solution and subsequent fixed-point iteration *********** [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374779 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374779 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374781 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 Matrix setup took 0.108 s [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374779 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374781 KSP PC setup took 0.079 s [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] MatStashScatterBegin_Ref(): No of messages: 0 [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [1] MatStashScatterBegin_Ref(): No of messages: 1 [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: 1050106 unneeded,15128672 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 599214) < 0.6. Do not use CompressedRow routines. [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: 1237634 unneeded,15545404 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 621594) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. Not using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] VecScatterCreate(): General case: MPI to Seq [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: 5257543 unneeded,136718 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 89 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: 5464978 unneeded,136718 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 490 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. 
K and q SetValues took 26.426 s [0] PCSetUp(): Setting up PC for first time [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [0] VecScatterCreate(): Special case: processor zero gets entire parallel vector, rest get none ** Max-trans not allowed because matrix is distributed [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] VecScatterCreate(): General case: Seq to MPI [1] VecScatterCreate(): General case: Seq to MPI Solution took 102.21 s NL iteration 0: delta = 32.0488 67.6279. Error delta calc took 0.045 s Node and Element temps update took 0.017 s [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] MatStashScatterBegin_Ref(): No of messages: 0 [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [1] MatStashScatterBegin_Ref(): No of messages: 1 [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: 0 unneeded,15128672 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 599214) < 0.6. Do not use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: 0 unneeded,15545404 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 621594) < 0.6. Do not use CompressedRow routines. 
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: 0 unneeded,136718 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: 0 unneeded,136718 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. K and q SetValues took 2.366 s [0] PCSetUp(): Setting up PC with same nonzero pattern [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374780 [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374782 [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] VecScatterCreate(): General case: Seq to MPI [1] VecScatterCreate(): General case: Seq to MPI Solution took 82.156 s -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Wednesday, October 05, 2016 4:42 PM To: overholt at capesim.com Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] large PetscCommDuplicate overhead > On Oct 5, 2016, at 2:30 PM, Matthew Overholt wrote: > > Hi Petsc-Users, > > I am trying to understand an issue where PetscCommDuplicate() calls are taking an increasing percentage of time as I run a fixed-sized problem on more processes. > > I am using the FEM to solve the steady-state heat transfer equation (K.x = q) using a PC direct solver, like MUMPS. > > I am running on the NERSC Cray X30, which has two Xeon's per node with 12 cores each, and profiling the code using CrayPat sampling. > > On a typical problem (1E+6 finite elements), running on a single node: > -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on process 1, but on the root it is less), and (for reference) 9% of total time is for MUMPS. > -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on every process except the root, where it is <1%), and 9-10% of total time is for MUMPS. What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, you are just giving its time for comparison? > > What is the large PetscCommDuplicate time connected to, an increasing number of messages (tags)? Would using fewer MatSetValues() and VecSetValues() calls (with longer message lengths) alleviate this? No PetscCommDuplicate won't increate with more messages or calls to XXXSetValues(). PetscCommDuplicate() is only called essentially on the creation of new PETSc objects. It should also be fast since it basically needs to do just a MPI_Attr_get(). With more processes but the same problem size and code there should be pretty much the same number of objects created. PetscSpinlockLock() does nothing if you are not using threads so it won't take any time. Is there a way to see where it is spending its time inside the PetscCommDuplicate()? 
Perhaps the Cray MPI_Attr_get() has issues. Barry > > For reference, the PETSc calling sequence in the code is as follows. > // Create the solution and RHS vectors > ierr = VecCreate(petscData->mpicomm,&mesh->hpx); > ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); > ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of equations; distribution to match mesh > ierr = VecSetFromOptions(mesh->hpx); // allow run time options > ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector > // Create the stiffnexx matrix > ierr = MatCreate(petscData->mpicomm,&K); > ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); > ierr = MatSetType(K,MATAIJ); // default sparse type > // Do preallocation > ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); > ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); > ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > ierr = MatSetUp(K); > // Create and set up the KSP context as a PreConditioner Only (Direct) Solution > ierr = KSPCreate(petscData->mpicomm,&ksp); > ierr = KSPSetOperators(ksp,K,K); > ierr = KSPSetType(ksp,KSPPREONLY); > // Set the temperature vector > ierr = VecSet(mesh->hpx,mesh->Tmin); > // Set the default PC method as MUMPS > ierr = KSPGetPC(ksp,&pc); // extract the preconditioner > ierr = PCSetType(pc,PCLU); // set pc options > ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); > ierr = KSPSetFromOptions(ksp); > > // Set the values for the K matrix and q vector > // which involves a lot of these calls > ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // 1 call per matrix row (equation) > ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per element > ierr = VecAssemblyBegin(q); > ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); > ierr = VecAssemblyEnd(q); > ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); > > // Solve ////////////////////////////////////// > ierr = KSPSolve(ksp,q,mesh->hpx); > ... > *Note that the code evenly divides the finite elements over the total number of processors, and I am using ghosting of the FE vertices vector to handle the vertices that are needed on more than 1 process. > > Thanks in advance for your help, > Matt Overholt > CapeSym, Inc. > > Virus-free. www.avast.com --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus From bsmith at mcs.anl.gov Thu Oct 6 10:45:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Oct 2016 10:45:39 -0500 Subject: [petsc-users] Using Petsc with Finite Elements Domain Decomposition In-Reply-To: References: Message-ID: It is almost surely some subtle memory corruption somewhere use http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind to track it down. > On Oct 6, 2016, at 4:22 AM, Ivano Barletta wrote: > > Hello everyone > > Recently I resumed the task of nesting Petsc into > this fem ocean model, for the solution of a linear system > > I followed your suggestions and "almost" everything works. > > The problem raised during a run with 4 CPUs, when i got this > error > > 3:[3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > 3:[3]PETSC ERROR: Petsc has generated inconsistent data > 3:[3]PETSC ERROR: Negative MPI source! > 3:[3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> 3:[3]PETSC ERROR: Petsc Release Version 3.7.1, May, 15, 2016 > 3:[3]PETSC ERROR: /users/home/ib04116/shympi_last4/fem3d/shympi ^A on a linux-gnu-intel named n243.cluster.net by ib04116 Thu Oct 6 10:37:01 2016 > 3:[3]PETSC ERROR: Configure options CFLAGS=-I/users/home/opt/netcdf/netcdf-4.2.1.1/include -I/users/home/opt/szip/szip-2.1/include -I/users/home/opt/hdf5/hdf5-1.8.10-patch1/include -I/usr/include -I/users/home/opt/netcdf/netcdf-4.3/include -I/users/home/opt/hdf5/hdf5-1.8.11/include FFLAGS=-xHost -no-prec-div -O3 -I/users/home/opt/netcdf/netcdf-4.2.1.1/include -I/users/home/opt/netcdf/netcdf-4.3/include LDFLAGS=-L/users/home/opt/netcdf/netcdf-4.2.1.1/lib -lnetcdff -L/users/home/opt/szip/szip-2.1/lib -L/users/home/opt/hdf5/hdf5-1.8.10-patch1/lib -L/users/home/opt/netcdf/netcdf-4.2.1.1/lib -L/usr/lib64/ -lz -lnetcdf -lnetcdf -lgpfs -L/users/home/opt/netcdf/netcdf-4.3/lib -L/users/home/opt/hdf5/hdf5-1.8.11/lib -L/users/home/opt/netcdf/netcdf-4.3/lib -lcurl --PETSC_ARCH=linux-gnu-intel --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-mpiexec=mpirun --with-blas-lapack-dir=/users/home/opt/intel/composer_xe_2013/mkl --with-scalapack-lib="-L/users/home/opt/intel/composer_xe_2013/mkl//lib/intel64 -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64" --with-scalapack-include=/users/home/opt/intel/composer_xe_2013/mkl/include --download-metis --download-parmetis --download-mumps --download-superlu > 3:[3]PETSC ERROR: #1 MatStashScatterGetMesg_Ref() line 692 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c > 3:[3]PETSC ERROR: #2 MatStashScatterGetMesg_Private() line 663 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c > 3:[3]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 713 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/impls/aij/mpi/mpiaij.c > 3:[3]PETSC ERROR: #4 MatAssemblyEnd() line 5187 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/interface/matrix.c > > The code is in fortran and the Petsc version is 3.7.1 > > This error looks quite strange to me, because it doesn't happen always in the same > situation. The model goes through several time steps, but this error is not > raised always at the same time. It has happened at the fourth, for example, at > the fifth time step. What is even more odd is that once the run of the model (720 time steps) > was completed without any error. > > What I do to solve the linear system for each time step is the following: > > call petsc_solve( ..arguments..) > > subroutine petsc_solve(..args) > call PetscInitialize(PETSC_NULL_CHARACTER) > > call MatCreate > ... > ... > call KSPSolve(...) > > call XXXDestroy() > call PetscFinalize > end subroutine > > Do you think that calling PetscInitialize and PetscFinalize > several times might cause problems? I guess Petsc use > the same communicator of the model, which is MPI_COMM_WORLD > > It don't have hints to troubleshoot this, since is not a > reproducible error and I don't know where to look to > sort it out. > > Have you got any suggestion? > > Thanks in advance > > Ivano > > > > > 2016-07-13 5:16 GMT+02:00 Barry Smith : > > > On Jul 12, 2016, at 4:13 AM, Matthew Knepley wrote: > > > > On Tue, Jul 12, 2016 at 3:35 AM, Ivano Barletta wrote: > > Dear Petsc users > > > > my aim is to parallelize the solution of a linear > > system into a finite elements > > ocean model. 
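For comparison with the per-time-step petsc_solve() structure described above, a rough Fortran sketch of the more common arrangement follows: PetscInitialize()/PetscFinalize() are called once per run and the Mat/Vec/KSP objects are created once and reused across time steps. This is only a general usage sketch with a toy diagonal matrix (all names and sizes are illustrative), not a diagnosis of the error above; Barry's valgrind suggestion stands:

      program init_once
      implicit none
#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscvec.h>
#include <petsc/finclude/petscmat.h>
#include <petsc/finclude/petscksp.h>
      Mat            A
      Vec            x,b
      KSP            ksp
      PetscErrorCode ierr
      PetscInt       n,i,istart,iend,ione,row(1),col(1)
      PetscInt       step,nsteps
      PetscScalar    v(1),one

      call PetscInitialize(PETSC_NULL_CHARACTER,ierr) ! once per run
      n = 100
      ione = 1
      one  = 1.0
      call MatCreate(PETSC_COMM_WORLD,A,ierr) ! objects created once
      call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n,ierr)
      call MatSetFromOptions(A,ierr)
      call MatSetUp(A,ierr)
      call MatCreateVecs(A,x,b,ierr)
      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
      call KSPSetFromOptions(ksp,ierr)

      nsteps = 10
      do step = 1,nsteps
        ! (re)fill A and b for this step; here just a toy diagonal
        call MatGetOwnershipRange(A,istart,iend,ierr)
        do i = istart,iend-1
          row(1) = i
          col(1) = i
          v(1)   = 2.0
          call MatSetValues(A,ione,row,ione,col,v,INSERT_VALUES,ierr)
        end do
        call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
        call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)
        call VecSet(b,one,ierr)
        call KSPSetOperators(ksp,A,A,ierr)
        call KSPSolve(ksp,b,x,ierr) ! solve every step
      end do

      call KSPDestroy(ksp,ierr) ! destroy once, at the end
      call VecDestroy(x,ierr)
      call VecDestroy(b,ierr)
      call MatDestroy(A,ierr)
      call PetscFinalize(ierr) ! once per run
      end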
> > > > The model has been almost entirely parallelized, with > > a partitioning of the domain made element-wise through > > the use of Zoltan libraries, so the subdomains > > share the nodes lying on the edges. > > > > The linear system includes node-to-node dependencies > > so my guess is that I need to create an halo surrounding > > each subdomain, to allow connections of edge nodes with > > neighbour subdomains ones > > > > Apart from that, my question is if Petsc accept a > > previously made partitioning (maybe taking into account of halo) > > using the data structures coming out of it > > > > Has anybody of you ever faced a similar problem? > > > > If all you want to do is construct a PETSc Mat and Vec for the linear system, > > just give PETSc the non-overlapping partition to create those objects. You > > can input values on off-process partitions automatically using MatSetValues() > > and VecSetValues(). > > Note that by just using the VecSetValues() and MatSetValues() PETSc will manage all the halo business needed by the linear algebra system solver automatically. You don't need to provide any halo information to PETSc. It is really straightforward. > > Barry > > > > > Thanks, > > > > Matt > > > > Thanks in advance > > Ivano > > > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > From bsmith at mcs.anl.gov Thu Oct 6 10:55:13 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Oct 2016 10:55:13 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <001801d21fe8$a3e67970$ebb36c50$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> Message-ID: <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> Matt, Thanks for this information. It sure looks like there is something seriously wrong with the MPI_Attr_get() on the cray for non-root process. Does any PETSc developer have access to such a machine? We need to write a test program that just calls MPI_Attr_get a bunch of times (no PETSc) to see if we can reproduce the problem and report it to Cray. Barry On Oct 6, 2016, at 10:45 AM, Matthew Overholt wrote: > > > Matthew and Barry, > > 1) I did a direct measurement of PetscCommDuplicate() time by tracing just > that call (using CrayPat), and confirmed the sampling results. For 8 > processes (n=8), tracing counted a total of 101 calls, taking ~0 time on the > root process but taking 11.78 seconds (6.3% of 188 total seconds) on each of > the other 7 processes. For 16 processes (n=16, still only 1 node), tracing > counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 total > seconds) on every process except the root. > > 2) Copied below is a section of the log view for the first two solutions for > n=2, which shows the same calls as for n=8. (I can send the entire log files > if desired.) In each case I count about 44 PCD calls per process during > initialization and meshing, 7 calls during setup, 9 calls for the first > solution, then 3 calls for each subsequent solution (fixed-point iteration), > and 3 calls to write out the solution, for 75 total. > > 3) I would expect that the administrators of this machine have configured > PETSc appropriately. I am using their current default install, which is > 3.7.2. 
> https://www.nersc.gov/users/software/programming-libraries/math-libraries/pe > tsc/ > > 4) Yes, I just gave the MUMPS time as a comparison. > > 5) As to where it is spending time, perhaps the timing results in the log > files will be helpful. The "Solution took ..." printouts give the total > solution time for that iteration, the others are incremental times. (As an > aside, I have been wondering why the solution times do not scale well with > process count, even though that work is entirely done in parallel PETSc > routines.) > > Thanks, > Matt Overholt > > > ********** -log_view -info results for n=2 : the first solution and > subsequent fixed-point iteration *********** > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > Matrix setup took 0.108 s > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > KSP PC setup took 0.079 s > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: > 1050106 unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. > [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not using > Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: > 1237634 unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. 
Not using > Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: MPI to Seq > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: > 5257543 unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 89 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: > 5464978 unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 490 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 26.426 s > [0] PCSetUp(): Setting up PC for first time > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] VecScatterCreate(): Special case: processor zero gets entire parallel > vector, rest get none > ** Max-trans not allowed because matrix is distributed > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is > unchanged > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: Seq to MPI > [1] VecScatterCreate(): General case: Seq to MPI > Solution took 102.21 s > > NL iteration 0: delta = 32.0488 67.6279. 
> Error delta calc took 0.045 s > Node and Element temps update took 0.017 s > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: 0 > unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: 0 > unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: 0 > unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: 0 > unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 2.366 s > [0] PCSetUp(): Setting up PC with same nonzero pattern > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: Seq to MPI > [1] VecScatterCreate(): General case: Seq to MPI > Solution took 82.156 s > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Wednesday, October 05, 2016 4:42 PM > To: overholt at capesim.com > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > >> On Oct 5, 2016, at 2:30 PM, Matthew Overholt wrote: >> >> Hi Petsc-Users, >> >> I am trying to understand an issue where PetscCommDuplicate() calls are > taking an increasing percentage of time as I run a fixed-sized problem on > more processes. >> >> I am using the FEM to solve the steady-state heat transfer equation (K.x = > q) using a PC direct solver, like MUMPS. 
>> >> I am running on the NERSC Cray X30, which has two Xeon's per node with 12 > cores each, and profiling the code using CrayPat sampling. >> >> On a typical problem (1E+6 finite elements), running on a single node: >> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on > process 1, but on the root it is less), and (for reference) 9% of total time > is for MUMPS. >> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on > every process except the root, where it is <1%), and 9-10% of total time is > for MUMPS. > > What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, you > are just giving its time for comparison? > >> >> What is the large PetscCommDuplicate time connected to, an increasing > number of messages (tags)? Would using fewer MatSetValues() and > VecSetValues() calls (with longer message lengths) alleviate this? > > No PetscCommDuplicate won't increate with more messages or calls to > XXXSetValues(). PetscCommDuplicate() is only called essentially on the > creation of new PETSc objects. It should also be fast since it basically > needs to do just a MPI_Attr_get(). With more processes but the same problem > size and code there should be pretty much the same number of objects > created. > > PetscSpinlockLock() does nothing if you are not using threads so it won't > take any time. > > Is there a way to see where it is spending its time inside the > PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. > > Barry > > > > > > >> >> For reference, the PETSc calling sequence in the code is as follows. >> // Create the solution and RHS vectors >> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); >> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); >> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of > equations; distribution to match mesh >> ierr = VecSetFromOptions(mesh->hpx); // allow run time options >> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector >> // Create the stiffnexx matrix >> ierr = MatCreate(petscData->mpicomm,&K); >> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); >> ierr = MatSetType(K,MATAIJ); // default sparse type >> // Do preallocation >> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); >> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); >> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> ierr = MatSetUp(K); >> // Create and set up the KSP context as a PreConditioner Only (Direct) > Solution >> ierr = KSPCreate(petscData->mpicomm,&ksp); >> ierr = KSPSetOperators(ksp,K,K); >> ierr = KSPSetType(ksp,KSPPREONLY); >> // Set the temperature vector >> ierr = VecSet(mesh->hpx,mesh->Tmin); >> // Set the default PC method as MUMPS >> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner >> ierr = PCSetType(pc,PCLU); // set pc options >> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >> ierr = KSPSetFromOptions(ksp); >> >> // Set the values for the K matrix and q vector >> // which involves a lot of these calls >> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // > 1 call per matrix row (equation) >> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per > element >> ierr = VecAssemblyBegin(q); >> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); >> ierr = VecAssemblyEnd(q); >> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); >> >> // Solve ////////////////////////////////////// >> ierr = KSPSolve(ksp,q,mesh->hpx); >> ... 
>> *Note that the code evenly divides the finite elements over the total > number of processors, and I am using ghosting of the FE vertices vector to > handle the vertices that are needed on more than 1 process. >> >> Thanks in advance for your help, >> Matt Overholt >> CapeSym, Inc. >> >> Virus-free. www.avast.com > > > --- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > From mirzadeh at gmail.com Thu Oct 6 10:56:27 2016 From: mirzadeh at gmail.com (Mohammad Mirzadeh) Date: Thu, 6 Oct 2016 11:56:27 -0400 Subject: [petsc-users] issue with NullSpaceRemove in parallel In-Reply-To: <5BD3E1A6-0F72-431E-A11C-5D9B762DC194@mcs.anl.gov> References: <5BD3E1A6-0F72-431E-A11C-5D9B762DC194@mcs.anl.gov> Message-ID: Thanks Barry. That seems to have fixed it; I had a NAN somewhere in the RHS. On Wed, Oct 5, 2016 at 11:18 PM, Barry Smith wrote: > > The message "Scalar value must be same on all processes, argument # 2" > comes up often when a Nan or Inf as gotten into the computation. The IEEE > standard for floating point operations defines that Nan != Nan; > > I recommend running again with -fp_trap this should cause the code to > stop with an error message as soon as the Nan or Inf is generated. > > Barry > > > > > > On Oct 5, 2016, at 9:21 PM, Mohammad Mirzadeh > wrote: > > > > Hi folks, > > > > I am trying to track down a bug that is sometimes triggered when solving > a singular system (poisson+neumann). It only seems to happen in parallel > and halfway through the run. I can provide detailed information about the > actual problem, but the error message I get boils down to this: > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Invalid argument > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 2 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > > [0]PETSC ERROR: ./two_fluid_2d on a linux named bazantserver1 by > mohammad Wed Oct 5 21:14:47 2016 > > [0]PETSC ERROR: Configure options PETSC_ARCH=linux --prefix=/usr/local > --with-clanguage=cxx --with-c-support --with-shared-libraries > --download-hypre --download-metis --download-parmetis --download-ml > --download-superlu_dist COPTFLAGS=" -O3 -march=native" CXXOPTFLAGS=" -O3 > -march=native" FOPTFLAGS=" -O3 -march=native" > > [0]PETSC ERROR: #1 VecShift() line 1480 in /tmp/petsc-3.6.3/src/vec/vec/ > utils/vinv.c > > [0]PETSC ERROR: #2 MatNullSpaceRemove() line 348 in > /tmp/petsc-3.6.3/src/mat/interface/matnull.c > > [0]PETSC ERROR: #3 KSP_RemoveNullSpace() line 207 in > /tmp/petsc-3.6.3/include/petsc/private/kspimpl.h > > [0]PETSC ERROR: #4 KSP_PCApply() line 243 in /tmp/petsc-3.6.3/include/ > petsc/private/kspimpl.h > > [0]PETSC ERROR: #5 KSPInitialResidual() line 63 in > /tmp/petsc-3.6.3/src/ksp/ksp/interface/itres.c > > [0]PETSC ERROR: #6 KSPSolve_BCGS() line 50 in > /tmp/petsc-3.6.3/src/ksp/ksp/impls/bcgs/bcgs.c > > [0]PETSC ERROR: #7 KSPSolve() line 604 in /tmp/petsc-3.6.3/src/ksp/ksp/ > interface/itfunc.c > > > > I understand this is somewhat vague question, but any idea what could > cause this sort of problem? This was on 2 processors. The same code runs > fine on a single processor. Also the solution seems to converge fine on > previous iterations, e.g. 
this is the convergence info from the last > iteration before the code breaks: > > > > 0 KSP preconditioned resid norm 6.814085878146e+01 true resid norm > 2.885308600701e+00 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 3.067319980814e-01 true resid norm > 8.480307326867e-02 ||r(i)||/||b|| 2.939133555699e-02 > > 2 KSP preconditioned resid norm 1.526405979843e-03 true resid norm > 1.125228519827e-03 ||r(i)||/||b|| 3.899855008762e-04 > > 3 KSP preconditioned resid norm 2.199423175998e-05 true resid norm > 4.232832916628e-05 ||r(i)||/||b|| 1.467029528695e-05 > > 4 KSP preconditioned resid norm 5.382291463582e-07 true resid norm > 8.438732856334e-07 ||r(i)||/||b|| 2.924724535283e-07 > > 5 KSP preconditioned resid norm 9.495525177398e-09 true resid norm > 1.408250768598e-08 ||r(i)||/||b|| 4.880763077669e-09 > > 6 KSP preconditioned resid norm 9.249233376169e-11 true resid norm > 2.795840275267e-10 ||r(i)||/||b|| 9.689917655907e-11 > > 7 KSP preconditioned resid norm 1.138293762641e-12 true resid norm > 2.559058680281e-12 ||r(i)||/||b|| 8.869272006674e-13 > > > > Also, if it matters, this is using hypre as PC and bcgs as KSP. > > > > Thanks > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 6 11:03:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Oct 2016 11:03:47 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> Message-ID: On Thu, Oct 6, 2016 at 10:55 AM, Barry Smith wrote: > > Matt, > > Thanks for this information. It sure looks like there is something > seriously wrong with the MPI_Attr_get() on the cray for non-root process. > Does any PETSc developer have access to such a machine? We need to write a > test program that just calls MPI_Attr_get a bunch of times (no PETSc) to > see if we can reproduce the problem and report it to Cray. > Barry, if you write it, we can give it to Patrick Sanan to run. Thanks, Matt > Barry > > > > On Oct 6, 2016, at 10:45 AM, Matthew Overholt > wrote: > > > > > > Matthew and Barry, > > > > 1) I did a direct measurement of PetscCommDuplicate() time by tracing > just > > that call (using CrayPat), and confirmed the sampling results. For 8 > > processes (n=8), tracing counted a total of 101 calls, taking ~0 time on > the > > root process but taking 11.78 seconds (6.3% of 188 total seconds) on > each of > > the other 7 processes. For 16 processes (n=16, still only 1 node), > tracing > > counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 > total > > seconds) on every process except the root. > > > > 2) Copied below is a section of the log view for the first two solutions > for > > n=2, which shows the same calls as for n=8. (I can send the entire log > files > > if desired.) In each case I count about 44 PCD calls per process during > > initialization and meshing, 7 calls during setup, 9 calls for the first > > solution, then 3 calls for each subsequent solution (fixed-point > iteration), > > and 3 calls to write out the solution, for 75 total. > > > > 3) I would expect that the administrators of this machine have configured > > PETSc appropriately. I am using their current default install, which is > > 3.7.2. 
> > https://www.nersc.gov/users/software/programming- > libraries/math-libraries/pe > > tsc/ > > > > 4) Yes, I just gave the MUMPS time as a comparison. > > > > 5) As to where it is spending time, perhaps the timing results in the log > > files will be helpful. The "Solution took ..." printouts give the total > > solution time for that iteration, the others are incremental times. (As > an > > aside, I have been wondering why the solution times do not scale well > with > > process count, even though that work is entirely done in parallel PETSc > > routines.) > > > > Thanks, > > Matt Overholt > > > > > > ********** -log_view -info results for n=2 : the first solution and > > subsequent fixed-point iteration *********** > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > > -2080374779 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > > -2080374779 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > > -2080374781 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > Matrix setup took 0.108 s > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > > -2080374779 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > > -2080374781 > > KSP PC setup took 0.079 s > > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > > [0] MatStashScatterBegin_Ref(): No of messages: 0 > > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > > [1] MatStashScatterBegin_Ref(): No of messages: 1 > > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: > > 1050106 unneeded,15128672 used > > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > > 599214) < 0.6. Do not use CompressedRow routines. > > [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not > using > > Inode routines > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: > > 1237634 unneeded,15545404 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > > 621594) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. 
Not > using > > Inode routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: MPI to Seq > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: > > 5257543 unneeded,136718 used > > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 89 > > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: > > 5464978 unneeded,136718 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is > 490 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > > K and q SetValues took 26.426 s > > [0] PCSetUp(): Setting up PC for first time > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [0] VecScatterCreate(): Special case: processor zero gets entire parallel > > vector, rest get none > > ** Max-trans not allowed because matrix is distributed > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is > > unchanged > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: Seq to MPI > > [1] 
VecScatterCreate(): General case: Seq to MPI > > Solution took 102.21 s > > > > NL iteration 0: delta = 32.0488 67.6279. > > Error delta calc took 0.045 s > > Node and Element temps update took 0.017 s > > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > > [0] MatStashScatterBegin_Ref(): No of messages: 0 > > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > > [1] MatStashScatterBegin_Ref(): No of messages: 1 > > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage > space: 0 > > unneeded,15128672 used > > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > > 599214) < 0.6. Do not use CompressedRow routines. > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage > space: 0 > > unneeded,15545404 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > > 621594) < 0.6. Do not use CompressedRow routines. > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: > 0 > > unneeded,136718 used > > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: > 0 > > unneeded,136718 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. 
> > K and q SetValues took 2.366 s > > [0] PCSetUp(): Setting up PC with same nonzero pattern > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374780 > > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > > -2080374782 > > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > > [0] VecScatterCreate(): General case: Seq to MPI > > [1] VecScatterCreate(): General case: Seq to MPI > > Solution took 82.156 s > > > > -----Original Message----- > > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > > Sent: Wednesday, October 05, 2016 4:42 PM > > To: overholt at capesim.com > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > > > > >> On Oct 5, 2016, at 2:30 PM, Matthew Overholt > wrote: > >> > >> Hi Petsc-Users, > >> > >> I am trying to understand an issue where PetscCommDuplicate() calls are > > taking an increasing percentage of time as I run a fixed-sized problem on > > more processes. > >> > >> I am using the FEM to solve the steady-state heat transfer equation > (K.x = > > q) using a PC direct solver, like MUMPS. > >> > >> I am running on the NERSC Cray X30, which has two Xeon's per node with > 12 > > cores each, and profiling the code using CrayPat sampling. > >> > >> On a typical problem (1E+6 finite elements), running on a single node: > >> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate > (on > > process 1, but on the root it is less), and (for reference) 9% of total > time > > is for MUMPS. > >> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on > > every process except the root, where it is <1%), and 9-10% of total time > is > > for MUMPS. > > > > What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, > you > > are just giving its time for comparison? > > > >> > >> What is the large PetscCommDuplicate time connected to, an increasing > > number of messages (tags)? Would using fewer MatSetValues() and > > VecSetValues() calls (with longer message lengths) alleviate this? > > > > No PetscCommDuplicate won't increate with more messages or calls to > > XXXSetValues(). PetscCommDuplicate() is only called essentially on the > > creation of new PETSc objects. It should also be fast since it basically > > needs to do just a MPI_Attr_get(). With more processes but the same > problem > > size and code there should be pretty much the same number of objects > > created. > > > > PetscSpinlockLock() does nothing if you are not using threads so it > won't > > take any time. > > > > Is there a way to see where it is spending its time inside the > > PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. > > > > Barry > > > > > > > > > > > > > >> > >> For reference, the PETSc calling sequence in the code is as follows. 
> >> // Create the solution and RHS vectors > >> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); > >> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); > >> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of > > equations; distribution to match mesh > >> ierr = VecSetFromOptions(mesh->hpx); // allow run time options > >> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector > >> // Create the stiffnexx matrix > >> ierr = MatCreate(petscData->mpicomm,&K); > >> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); > >> ierr = MatSetType(K,MATAIJ); // default sparse type > >> // Do preallocation > >> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); > >> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); > >> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); > >> ierr = MatSetUp(K); > >> // Create and set up the KSP context as a PreConditioner Only > (Direct) > > Solution > >> ierr = KSPCreate(petscData->mpicomm,&ksp); > >> ierr = KSPSetOperators(ksp,K,K); > >> ierr = KSPSetType(ksp,KSPPREONLY); > >> // Set the temperature vector > >> ierr = VecSet(mesh->hpx,mesh->Tmin); > >> // Set the default PC method as MUMPS > >> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner > >> ierr = PCSetType(pc,PCLU); // set pc options > >> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); > >> ierr = KSPSetFromOptions(ksp); > >> > >> // Set the values for the K matrix and q vector > >> // which involves a lot of these calls > >> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); > // > > 1 call per matrix row (equation) > >> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call > per > > element > >> ierr = VecAssemblyBegin(q); > >> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); > >> ierr = VecAssemblyEnd(q); > >> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); > >> > >> // Solve ////////////////////////////////////// > >> ierr = KSPSolve(ksp,q,mesh->hpx); > >> ... > >> *Note that the code evenly divides the finite elements over the total > > number of processors, and I am using ghosting of the FE vertices vector > to > > handle the vertices that are needed on more than 1 process. > >> > >> Thanks in advance for your help, > >> Matt Overholt > >> CapeSym, Inc. > >> > >> Virus-free. www.avast.com > > > > > > --- > > This email has been checked for viruses by Avast antivirus software. > > https://www.avast.com/antivirus > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Oct 6 11:07:51 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Oct 2016 10:07:51 -0600 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> Message-ID: <8760p5h354.fsf@jedbrown.org> Barry Smith writes: > Matt, > > Thanks for this information. It sure looks like there is something > seriously wrong with the MPI_Attr_get() on the cray for non-root > process. Does any PETSc developer have access to such a machine? Yes, we have a NERSC allocation for Hierarchical Solvers. Satish's account is active. 
You have a dormant account that I added to the allocation. > We need to write a test program that just calls MPI_Attr_get a > bunch of times (no PETSc) to see if we can reproduce the problem > and report it to Cray. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From patrick.sanan at gmail.com Thu Oct 6 12:06:44 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Thu, 6 Oct 2016 19:06:44 +0200 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> Message-ID: Happy to (though Piz Daint goes down for an extended upgrade on Oct 17 so would need to be run before then)! On Thu, Oct 6, 2016 at 6:03 PM, Matthew Knepley wrote: > On Thu, Oct 6, 2016 at 10:55 AM, Barry Smith wrote: >> >> >> Matt, >> >> Thanks for this information. It sure looks like there is something >> seriously wrong with the MPI_Attr_get() on the cray for non-root process. >> Does any PETSc developer have access to such a machine? We need to write a >> test program that just calls MPI_Attr_get a bunch of times (no PETSc) to see >> if we can reproduce the problem and report it to Cray. > > > Barry, if you write it, we can give it to Patrick Sanan to run. > > Thanks, > > Matt > >> >> Barry >> >> >> >> On Oct 6, 2016, at 10:45 AM, Matthew Overholt >> wrote: >> > >> > >> > Matthew and Barry, >> > >> > 1) I did a direct measurement of PetscCommDuplicate() time by tracing >> > just >> > that call (using CrayPat), and confirmed the sampling results. For 8 >> > processes (n=8), tracing counted a total of 101 calls, taking ~0 time on >> > the >> > root process but taking 11.78 seconds (6.3% of 188 total seconds) on >> > each of >> > the other 7 processes. For 16 processes (n=16, still only 1 node), >> > tracing >> > counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 >> > total >> > seconds) on every process except the root. >> > >> > 2) Copied below is a section of the log view for the first two solutions >> > for >> > n=2, which shows the same calls as for n=8. (I can send the entire log >> > files >> > if desired.) In each case I count about 44 PCD calls per process during >> > initialization and meshing, 7 calls during setup, 9 calls for the first >> > solution, then 3 calls for each subsequent solution (fixed-point >> > iteration), >> > and 3 calls to write out the solution, for 75 total. >> > >> > 3) I would expect that the administrators of this machine have >> > configured >> > PETSc appropriately. I am using their current default install, which is >> > 3.7.2. >> > >> > https://www.nersc.gov/users/software/programming-libraries/math-libraries/pe >> > tsc/ >> > >> > 4) Yes, I just gave the MUMPS time as a comparison. >> > >> > 5) As to where it is spending time, perhaps the timing results in the >> > log >> > files will be helpful. The "Solution took ..." printouts give the total >> > solution time for that iteration, the others are incremental times. (As >> > an >> > aside, I have been wondering why the solution times do not scale well >> > with >> > process count, even though that work is entirely done in parallel PETSc >> > routines.) 
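For what it is worth, the standalone test Barry proposes above needs nothing beyond MPI itself. A minimal sketch (not code from this thread; the attribute value, loop count, and printed units are arbitrary, and the keyval handling only loosely mimics what PetscCommDuplicate() does) could look like this:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int    rank, keyval, flag, i, n = 1000000, attr = 42;
    void  *ptr;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Attach one attribute to the communicator, roughly as PetscCommDuplicate() does */
    MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &keyval, NULL);
    MPI_Attr_put(MPI_COMM_WORLD, keyval, &attr);

    /* Time many attribute look-ups on every rank */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < n; i++) MPI_Attr_get(MPI_COMM_WORLD, keyval, &ptr, &flag);
    t1 = MPI_Wtime();

    printf("[%d] %d MPI_Attr_get() calls: %g s (%g us per call)\n",
           rank, n, t1 - t0, 1.0e6 * (t1 - t0) / n);

    MPI_Keyval_free(&keyval);
    MPI_Finalize();
    return 0;
  }

Built with the system compiler wrapper and run at the same per-node core counts as the application, this would show directly whether the cost of MPI_Attr_get() itself grows on the non-root ranks, independent of PETSc.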
>> > >> > Thanks, >> > Matt Overholt >> > >> > >> > ********** -log_view -info results for n=2 : the first solution and >> > subsequent fixed-point iteration *********** >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >> > -2080374779 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >> > -2080374779 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >> > -2080374781 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > Matrix setup took 0.108 s >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >> > -2080374779 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >> > -2080374781 >> > KSP PC setup took 0.079 s >> > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. >> > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. >> > [0] MatStashScatterBegin_Ref(): No of messages: 0 >> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. >> > [1] MatStashScatterBegin_Ref(): No of messages: 1 >> > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes >> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. >> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage >> > space: >> > 1050106 unneeded,15128672 used >> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 0)/(num_localrows >> > 599214) < 0.6. Do not use CompressedRow routines. >> > [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not >> > using >> > Inode routines >> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage >> > space: >> > 1237634 unneeded,15545404 used >> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 0)/(num_localrows >> > 621594) < 0.6. Do not use CompressedRow routines. >> > [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. Not >> > using >> > Inode routines >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >> > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >> > [0] VecScatterCreate(): General case: MPI to Seq >> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: >> > 5257543 unneeded,136718 used >> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 89 >> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 >> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. 
>> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: >> > 5464978 unneeded,136718 used >> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 490 >> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 >> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. >> > K and q SetValues took 26.426 s >> > [0] PCSetUp(): Setting up PC for first time >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [0] VecScatterCreate(): Special case: processor zero gets entire >> > parallel >> > vector, rest get none >> > ** Max-trans not allowed because matrix is distributed >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PCSetUp(): Leaving PC with identical preconditioner since operator >> > is >> > unchanged >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >> > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >> > [0] VecScatterCreate(): General case: Seq to MPI >> > [1] VecScatterCreate(): General case: Seq to MPI >> > Solution took 102.21 s >> > >> > NL iteration 0: delta = 32.0488 67.6279. >> > Error delta calc took 0.045 s >> > Node and Element temps update took 0.017 s >> > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. >> > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. >> > [0] MatStashScatterBegin_Ref(): No of messages: 0 >> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. >> > [1] MatStashScatterBegin_Ref(): No of messages: 1 >> > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes >> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. 
>> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage >> > space: 0 >> > unneeded,15128672 used >> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 0)/(num_localrows >> > 599214) < 0.6. Do not use CompressedRow routines. >> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage >> > space: 0 >> > unneeded,15545404 used >> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 0)/(num_localrows >> > 621594) < 0.6. Do not use CompressedRow routines. >> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: >> > 0 >> > unneeded,136718 used >> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 >> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. >> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: >> > 0 >> > unneeded,136718 used >> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >> > 0 >> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 >> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. >> > K and q SetValues took 2.366 s >> > [0] PCSetUp(): Setting up PC with same nonzero pattern >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374780 >> > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >> > -2080374782 >> > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >> > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >> > [0] VecScatterCreate(): General case: Seq to MPI >> > [1] VecScatterCreate(): General case: Seq to MPI >> > Solution took 82.156 s >> > >> > -----Original Message----- >> > From: Barry Smith [mailto:bsmith at mcs.anl.gov] >> > Sent: Wednesday, October 05, 2016 4:42 PM >> > To: overholt at capesim.com >> > Cc: petsc-users at mcs.anl.gov >> > Subject: Re: [petsc-users] large PetscCommDuplicate overhead >> > >> > >> >> On Oct 5, 2016, at 2:30 PM, Matthew Overholt >> >> wrote: >> >> >> >> Hi Petsc-Users, >> >> >> >> I am trying to understand an issue where PetscCommDuplicate() calls are >> > taking an increasing percentage of time as I run a fixed-sized problem >> > on >> > more processes. >> >> >> >> I am using the FEM to solve the steady-state heat transfer equation >> >> (K.x = >> > q) using a PC direct solver, like MUMPS. >> >> >> >> I am running on the NERSC Cray X30, which has two Xeon's per node with >> >> 12 >> > cores each, and profiling the code using CrayPat sampling. 
>> >> >> >> On a typical problem (1E+6 finite elements), running on a single node: >> >> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate >> >> (on >> > process 1, but on the root it is less), and (for reference) 9% of total >> > time >> > is for MUMPS. >> >> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate >> >> (on >> > every process except the root, where it is <1%), and 9-10% of total time >> > is >> > for MUMPS. >> > >> > What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, >> > you >> > are just giving its time for comparison? >> > >> >> >> >> What is the large PetscCommDuplicate time connected to, an increasing >> > number of messages (tags)? Would using fewer MatSetValues() and >> > VecSetValues() calls (with longer message lengths) alleviate this? >> > >> > No PetscCommDuplicate won't increate with more messages or calls to >> > XXXSetValues(). PetscCommDuplicate() is only called essentially on the >> > creation of new PETSc objects. It should also be fast since it >> > basically >> > needs to do just a MPI_Attr_get(). With more processes but the same >> > problem >> > size and code there should be pretty much the same number of objects >> > created. >> > >> > PetscSpinlockLock() does nothing if you are not using threads so it >> > won't >> > take any time. >> > >> > Is there a way to see where it is spending its time inside the >> > PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. >> > >> > Barry >> > >> > >> > >> > >> > >> > >> >> >> >> For reference, the PETSc calling sequence in the code is as follows. >> >> // Create the solution and RHS vectors >> >> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); >> >> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); >> >> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of >> > equations; distribution to match mesh >> >> ierr = VecSetFromOptions(mesh->hpx); // allow run time options >> >> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector >> >> // Create the stiffnexx matrix >> >> ierr = MatCreate(petscData->mpicomm,&K); >> >> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); >> >> ierr = MatSetType(K,MATAIJ); // default sparse type >> >> // Do preallocation >> >> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); >> >> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); >> >> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> >> ierr = MatSetUp(K); >> >> // Create and set up the KSP context as a PreConditioner Only >> >> (Direct) >> > Solution >> >> ierr = KSPCreate(petscData->mpicomm,&ksp); >> >> ierr = KSPSetOperators(ksp,K,K); >> >> ierr = KSPSetType(ksp,KSPPREONLY); >> >> // Set the temperature vector >> >> ierr = VecSet(mesh->hpx,mesh->Tmin); >> >> // Set the default PC method as MUMPS >> >> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner >> >> ierr = PCSetType(pc,PCLU); // set pc options >> >> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >> >> ierr = KSPSetFromOptions(ksp); >> >> >> >> // Set the values for the K matrix and q vector >> >> // which involves a lot of these calls >> >> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); >> >> // >> > 1 call per matrix row (equation) >> >> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call >> >> per >> > element >> >> ierr = VecAssemblyBegin(q); >> >> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); >> >> ierr = VecAssemblyEnd(q); >> >> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); >> >> >> >> // Solve 
////////////////////////////////////// >> >> ierr = KSPSolve(ksp,q,mesh->hpx); >> >> ... >> >> *Note that the code evenly divides the finite elements over the total >> > number of processors, and I am using ghosting of the FE vertices vector >> > to >> > handle the vertices that are needed on more than 1 process. >> >> >> >> Thanks in advance for your help, >> >> Matt Overholt >> >> CapeSym, Inc. >> >> >> >> Virus-free. www.avast.com >> > >> > >> > --- >> > This email has been checked for viruses by Avast antivirus software. >> > https://www.avast.com/antivirus >> > >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From bsmith at mcs.anl.gov Thu Oct 6 12:58:40 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Oct 2016 12:58:40 -0500 Subject: [petsc-users] Question about using MatSNESMFWPSetComputeNormU In-Reply-To: References: <04A37A4A-66DA-461A-983A-B031EFD0183D@mcs.anl.gov> <88CE2F93-938B-46D8-BEC5-7287E2430353@mcs.anl.gov> <64FDE88F-897A-48C0-89E4-214DB4619DAF@mcs.anl.gov> Message-ID: <7FF0E20A-3988-4A57-96F7-008AF0AB8200@mcs.anl.gov> If you want to provide your own matrix-free application you use MatCreateShell() and MatShellSetOperation(mat, MATOP_MULT,yourfunction()) See for example src/snes/examples/tests/ex69.c (ignore the business about errorinmatmult) Barry > On Oct 6, 2016, at 9:23 AM, Choi Kyungjun wrote: > > Dear Matt. > > Thank you very much for your help. > > I'm currently working on PETSc library into 2-D compressible Euler / NS equation solver, especially for convergence of steady state problem. > > I adjusted my flow code as you told me, using snes_mf command line option, but I have a question about SNESsetFunction, especially function evaluation routine. > > > My command line option goes like this, as you told me. > > -snes_mf -pc_type none -snes_view -snes_monitor -ksp_monitor -snes_converged_reason -ksp_converged_reason > > > I remember that if I use snes_mf option, the matrix-free method is applied with computing Jacobian like below. > > (captured from Petsc Manual p.113) > > But I computed Jacobian with function evaluation routine, with SNESSetFunction(snes, r, FormPetscResidual, userctx, ier). > > I referred to my reference code which computes Jacobian like below. > > F'(u) a = -F(u) - (volume)/dt *a > > This is just reverse calculation of equation, not matrix-free form. This is done at the function evaluation routine (FormPetscResidual). > > > I want to ask how I can use the REAL matrix-free form. > > I'll attach my flow code and computation log below. > > > Thank you so much every time for your sincere help. > > Kyungjun. > > > ================================================================================ > (This is the flow code, and vectors are already created.) > > call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier) > > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) > > call SNESSetFromOptions(Mixt%snes, ier) > > ================================================================================ > (This is function evaluation routine) > > subroutine FormPetscResidual(snes, x, f, Collect, ier) > type(t_Collect), intent(inout) :: Collect > > SNES :: snes > Vec :: x, f > integer :: ier, counter, iCell, iVar, temp > integer :: ndof > real(8), allocatable :: CVar(:,:) > real(8), allocatable :: PVar(:,:) > PetscScalar, pointer :: xx_v(:) > PetscScalar, pointer :: ff_v(:) > > ! 
Set degree of freedom of this system. > ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell > > ! Backup the original values for cv to local array CVar > allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) > allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) > allocate( xx_v(1:ndof) ) > allocate( ff_v(1:ndof) ) > xx_v(:) = 0d0 > ff_v(:) = 0d0 > > ! Backup the original values for cv and pv > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) > PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) > end do > end do > > ! Copy the input argument vector x to array value xx_v > call VecGetArrayReadF90(x, xx_v, ier) > call VecGetArrayF90(f, ff_v, ier) > > ! Compute copy the given vector into Mixt%cv and check for validity > counter = 0 > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > counter = counter + 1 > Collect%pMixt%cv(iVar,iCell) = xx_v(counter) > end do > end do > > ! Update primitive variables with input x vector to compute residual > call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) > > > ! Compute the residual > call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) --> where update residual of cell > > ! Copy the residual array into the PETSc vector > counter = 0 > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > counter = counter + 1 > > ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) ) > end do > end do > > ! Restore conservative variables > do iCell = 1, Collect%pGrid%nCell > do iVar = 0, Collect%pMixt%nCVar-1 > Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) > Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) > end do > end do > > call VecRestoreArrayReadF90(x, xx_v, ier) > call VecRestoreArrayF90(f, ff_v, ier) > > deallocate(CVar) > deallocate(PVar) > > end subroutine > > > ================================================================================ > Computation log > > > > > > > 2016-08-19 21:14 GMT+09:00 Barry Smith : > > It looks like the SNESView() you have below was called before you ever did a solve, hence it prints the message "information may be incomplete". Note also zero function evaluations have been done in the SNESSolve, if the solve had been called it should be great than 0. > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > > This is also why it prints > > The compute h routine has not yet been set > > The information about the h routine won't be printed until after an actual solve is done and the "compute h" function is set. > > Barry > > Note you can call MatMFFDSetType() to control the "compute h" function that is used. > > > > > On Aug 19, 2016, at 12:04 AM, ??? wrote: > > > > Dear Barry and Matt. > > > > Thank you very much for helping me up all night. (in my time) > > > > And sorry for not asking with sufficient source code condition or my circumstances. (also with poor English.) > > > > > > I just want to make sure that the options of my code is well applied. > > > > I'm trying to use GMRES with matrix-free method. I'd like to solve 2-D euler equation without preconditioning matrix, for now. > > > > > > 1) I'm still curious whether my snes context is using MF jacobian. ( just like -snes_mf command line option) > > > > 2) And mind if I ask you that whether I applied petsc functions properly? 
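To make the MatCreateShell()/MatShellSetOperation() route suggested at the top of this message concrete, here is a small self-contained sketch (not code from this thread): a shell matrix whose action is supplied entirely by a user routine. In a SNES code the same shell would be passed to SNESSetJacobian(), with the user routine applying the true Jacobian-vector product instead of the toy y = 2x used here.

  #include <petscmat.h>

  /* Stand-in for a user matrix-free operator: y = 2*x */
  static PetscErrorCode UserMult(Mat A, Vec x, Vec y)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = VecCopy(x, y);CHKERRQ(ierr);
    ierr = VecScale(y, 2.0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            x, y;
    PetscInt       n = 10;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);

    /* The shell matrix: PETSc never sees entries, only the action */
    ierr = MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, NULL, &A);CHKERRQ(ierr);
    ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))UserMult);CHKERRQ(ierr);

    ierr = MatMult(A, x, y);CHKERRQ(ierr);
    ierr = VecView(y, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }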
> > > > I'll check out ex5 for applying command line options. > > > > > > I'll attach my petsc flow code and option log by SNESView() below. > > -------------------------------------------------------------------------------------------------------------------- > > - petsc flow code > > -------------------------------------------------------------------------------------------------------------------- > > > > ndof = Mixt%nCVar * Grid%nCell > > > > call VecCreateMPIWIthArray(PETSC_COMM_WORLD, Mixt%nCVar, ndof, PETSC_DECIDE, Mixt%cv, Mixt%x, ier) > > call VecDuplicate(Mixt%x, Mixt%r, ier) > > call VecSet(Mixt%r, zero, ier) > > > > call SNESCreate(PETSC_COMM_WORLD, Mixt%snes, ier) > > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) > > call MatCreateSNESMF(Mixt%snes, Mixt%A, ier) > > > > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, Collect, ier) > > call SNESSetFromOptions(Mixt%snes, ier) > > > > call SNESGetKSP(Mixt%snes, ksp, ier) > > call KSPSetType(ksp, KSPGMRES, ier) > > call KSPGetPC(ksp, pc, ier) > > call PCSetType(pc, PCNONE, ier) > > call KSPSetInitialGuessNonzero(ksp, PETSC_TRUE, ier) > > call KSPGMRESSetRestart(ksp, 30, ier) > > call KSPGMRESSetPreAllocation(ksp, ier) > > > > > > call SNESSetFunction(Mixt%snes, Mixt%r, FormPetscResidual, Collect, ier) > > call SNESSetJacobian(Mixt%snes, Mixt%A, Mixt%A, MatMFFDComputeJacobian, Collect, ier) > > > > call SNESSolve(Mixt%snes, PETSC_NULL_OBJECT, Mixt%x, ier) > > > > stop ( for temporary ) > > > > > > -------------------------------------------------------------------------------------------------------------------- > > subroutine FormPetscResidual(snes, x, f, Collect, ier) > > type(t_Collect), intent(inout) :: Collect > > > > SNES :: snes > > Vec :: x, f > > integer :: ier, counter, iCell, iVar, temp > > integer :: ndof > > real(8), allocatable :: CVar(:,:) > > real(8), allocatable :: PVar(:,:) > > PetscScalar, pointer :: xx_v(:) > > PetscScalar, pointer :: ff_v(:) > > > > ! Set degree of freedom of this system. > > ndof = Collect%pMixt%nCVar * Collect%pGrid%nCell > > > > ! Backup the original values for cv to local array CVar > > allocate( CVar(0:Collect%pMixt%nCVar-1, Collect%pGrid%nCell) ) > > allocate( PVar(0:Collect%pMixt%nPVar-1, Collect%pGrid%nCell) ) > > allocate( xx_v(1:ndof) ) > > allocate( ff_v(1:ndof) ) > > xx_v(:) = 0d0 > > ff_v(:) = 0d0 > > > > ! Backup the original values for cv and pv > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > CVar(iVar,iCell) = Collect%pMixt%cv(iVar,iCell) > > PVar(iVar,iCell) = Collect%pMixt%pv(iVar,iCell) > > end do > > end do > > > > ! Copy the input argument vector x to array value xx_v > > call VecGetArrayReadF90(x, xx_v, ier) > > call VecGetArrayF90(f, ff_v, ier) > > > > ! Compute copy the given vector into Mixt%cv and check for validity > > counter = 0 > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > counter = counter + 1 > > Collect%pMixt%cv(iVar,iCell) = xx_v(counter) > > end do > > end do > > > > ! Update primitive variables with input x vector to compute residual > > call PostProcessing(Collect%pMixt,Collect%pGrid,Collect%pConf) > > > > > > ! Compute the residual > > call ComputeResidual(Collect%pMixt,Collect%pGrid,Collect%pConf) --> where update residual of cell > > > > ! 
Copy the residual array into the PETSc vector > > counter = 0 > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > counter = counter + 1 > > > > ff_v(counter) = Collect%pMixt%Residual(iVar,iCell) + Collect%pGrid%vol(iCell)/Collect%pMixt%TimeStep(iCell)*( Collect%pMixt%cv(iVar,iCell) - CVar(iVar,iCell) ) > > end do > > end do > > > > ! Restore conservative variables > > do iCell = 1, Collect%pGrid%nCell > > do iVar = 0, Collect%pMixt%nCVar-1 > > Collect%pMixt%cv(iVar,iCell) = CVar(iVar,iCell) > > Collect%pMixt%pv(iVar,iCell) = PVar(iVar,iCell) > > end do > > end do > > > > call VecRestoreArrayReadF90(x, xx_v, ier) > > call VecRestoreArrayF90(f, ff_v, ier) > > > > deallocate(CVar) > > deallocate(PVar) > > -------------------------------------------------------------------------------------------------------------------- > > > > > > -------------------------------------------------------------------------------------------------------------------- > > - option log > > -------------------------------------------------------------------------------------------------------------------- > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=1, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-32, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=10000 > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. > > left preconditioning > > using nonzero initial guess > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: none > > PC has not been set up so information may be incomplete > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: mffd > > rows=11616, cols=11616 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > The compute h routine has not yet been set > > > > > > Sincerely, > > > > Kyungjun > > > > > > 2016-08-19 13:00 GMT+09:00 Barry Smith : > > > > > On Aug 18, 2016, at 10:28 PM, ??? wrote: > > > > > > Dear Matt. > > > > > > I didn't use the command line options because it looked not working. > > > > > > I called SNESSetFromOptions(snes, ier) in my source code, > > > > > > but options like -snes_mf or -snes_monitor doesn't look working. > > > > "doesn't work" is not useful to help us figure out what has gone wrong. You need to show us EXACTLY what you did by sending the code you compiled and the command line options you ran and all the output include full error messages. Without the information we simply do not have enough information to even begin to guess why it "doesn't work". > > > > Barry > > > > > > > > > > > > > Is there anything that I should consider more? > > > > > > > > > 2016-08-19 4:47 GMT+09:00 Matthew Knepley : > > > On Thu, Aug 18, 2016 at 2:44 PM, ??? wrote: > > > Is there a part that you considered this as finite-difference approximation? 
> > > I thought I used matrix-free method with MatCreateSNESMF() function > > > > > > You did not tell the SNES to use a MF Jacobian, you just made a Mat object. This is why > > > we encourage people to use the command line. Everything is setup correctly and in order. > > > Why would you choose not to. This creates long rounds of email. > > > > > > Matt > > > > > > Also I used > > > - call PCSetType(pc, PCNONE, ier) --> so the pc type shows 'none' at the log > > > > > > > > > I didn't use any of command line options. > > > > > > > > > Kyungjun > > > > > > 2016-08-19 4:27 GMT+09:00 Barry Smith : > > > > > > You can't use that Jacobian function SNESComputeJacobianDefault with matrix free, it tries to compute the matrix entries and stick them into the matrix. You can use MatMFFDComputeJacobian > > > > > > > On Aug 18, 2016, at 2:03 PM, ??? wrote: > > > > > > > > I got stuck at FormJacobian stage. > > > > > > > > - call SNESComputeJacobianDefault(snes, v, J, pJ, FormResidual, ier) --> J & pJ are same with A matrix-free matrix (input argument) > > > > > > > > > > > > > > > > with these kind of messages.. > > > > > > > > [0]PETSC ERROR: No support for this operation for this object type > > > > [0]PETSC ERROR: Mat type mffd > > > > > > > > > > > > > > > > Guess it's because I used A matrix-free matrix (which is mffd type) into pJ position. > > > > > > > > Is there any solution for this kind of situation? > > > > > > > > > > > > 2016-08-19 2:05 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 12:04 PM, ??? wrote: > > > > Then in order not to use preconditioner, > > > > > > > > is it ok if I just put A matrix-free matrix (made from MatCreateSNESMF()) into the place where preA should be? > > > > > > > > Yes, but again the solve will likely perform very poorly. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > The flow goes like this > > > > - call SNESCreate > > > > - call SNESSetFunction(snes, r, FormResidual, userctx, ier) > > > > - call MatCreateSNESMF(snes, A, ier) > > > > - call SNESSetJacobian(snes, A, A, FormJacobian, userctx, ier) > > > > - call SNESSetFromOptions() > > > > > > > > - call SNESGetKSP(snes, ksp, ier) > > > > - call KSPSetType(ksp, KSPGMRES, ier) > > > > - call KSPGetPC(ksp, pc, ier) > > > > - call PCSetType(pc, PCNONE, ier) > > > > - call KSPGMRESSetRestart(ksp, 30, ier) > > > > > > > > - call SNESSolve() > > > > . > > > > . > > > > > > > > > > > > and inside the FormJacobian routine > > > > - call SNESComputeJacobian(snes, v, J, pJ, userctx, ier) --> J and pJ must be pointed with A and A. > > > > > > > > > > > > > > > > Thank you again, > > > > > > > > Kyungjun. > > > > > > > > 2016-08-19 1:44 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 11:42 AM, ??? wrote: > > > > Thanks for your helpful answers. > > > > > > > > Here's another question... > > > > > > > > As I read some example PETSc codes, I noticed that there should be a preconditioning matrix (e.g. approx. jacobian matrix) when using MatCreateSNESMF(). > > > > > > > > I mean, > > > > after calling MatCreateSNESMF(snes, A, ier), > > > > there should be another matrix preA(preconditioning matrix) to use SNESSetJacobian(snes, A, preA, FormJacobian, ctx, ier). > > > > > > > > > > > > 1) Is there any way that I can use matrix-free method without making preconditioning matrix? > > > > > > > > Don't use a preconditioner. As you might expect, this does not often work out well. 
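As a reference point for the pattern being discussed in this thread, below is a self-contained toy sketch (not the poster's application): the Jacobian comes purely from MatCreateSNESMF(), Walker-Pernice differencing is selected with MatMFFDSetType(), the same MFFD matrix sits in both Jacobian slots with MatMFFDComputeJacobian() as the evaluation routine, and no preconditioner is used at all. The residual is simply f_i = x_i^2 - 4.

  #include <petscsnes.h>

  static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
  {
    PetscErrorCode     ierr;
    const PetscScalar *xx;
    PetscScalar       *ff;
    PetscInt           i, n;

    PetscFunctionBeginUser;
    ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(x, &xx);CHKERRQ(ierr);
    ierr = VecGetArray(f, &ff);CHKERRQ(ierr);
    for (i = 0; i < n; i++) ff[i] = xx[i]*xx[i] - 4.0;
    ierr = VecRestoreArrayRead(x, &xx);CHKERRQ(ierr);
    ierr = VecRestoreArray(f, &ff);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  int main(int argc, char **argv)
  {
    SNES           snes;
    KSP            ksp;
    PC             pc;
    Mat            J;
    Vec            x, r;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, PETSC_DECIDE, 8);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &r);CHKERRQ(ierr);

    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
    ierr = SNESSetFunction(snes, r, FormFunction, NULL);CHKERRQ(ierr);

    /* Matrix-free Jacobian; no explicit (preconditioning) matrix is ever formed */
    ierr = MatCreateSNESMF(snes, &J);CHKERRQ(ierr);
    ierr = MatMFFDSetType(J, MATMFFD_WP);CHKERRQ(ierr);   /* Walker-Pernice h */
    ierr = SNESSetJacobian(snes, J, J, MatMFFDComputeJacobian, NULL);CHKERRQ(ierr);

    ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCNONE);CHKERRQ(ierr);           /* no preconditioner */

    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);        /* honors -snes_monitor, -mat_mffd_type, ... */
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);

    ierr = MatDestroy(&J);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

The command line options -snes_mf -pc_type none give the same behavior without any of the explicit MatCreateSNESMF()/PCSetType() calls, which is why they keep being recommended in this thread.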
> > > > > > > > 2) I have a reference code, and the code adopts > > > > > > > > MatFDColoringCreate() > > > > and finally uses > > > > SNESComputeJacobianDefaultColor() at FormJacobian stage. > > > > > > > > But I can't see the inside of the fdcolor and I'm curious of this mechanism. Can you explain this very briefly or tell me an example code that I can refer to. ( I think none of PETSc example code is using fdcolor..) > > > > > > > > This is the default, so there is no need for all that code. We use naive graph 2-coloring. I think there might be a review article by Alex Pothen about that. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > > > > Best, > > > > > > > > Kyungjun. > > > > > > > > 2016-08-19 0:54 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 10:39 AM, ??? wrote: > > > > 1) I wanna know the difference between applying option with command line and within source code. > > > > From my experience, command line option helps set other default settings that I didn't applied, I guess. > > > > > > > > The command line arguments are applied to an object when *SetFromOptions() is called, so in this case > > > > you want SNESSetFromOptions() on the solver. There should be no difference from using the API. > > > > > > > > 2) I made a matrix-free matrix with MatCreateSNESMF function, and every time I check my snes context with SNESView, > > > > > > > > Mat Object: 1 MPI processes > > > > type: mffd > > > > rows=11616, cols=11616 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > The compute h routine has not yet been set > > > > > > > > at the end of line shows there's no routine for computing h value. > > > > I used MatMFFDWPSetComputeNormU function, but it didn't work I think. > > > > Is it ok if I leave the h value that way? Or should I have to set h computing routine? > > > > > > > > I am guessing you are calling the function on a different object from the one that is viewed here. > > > > However, there will always be a default function for computing h. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Kyungjun. > > > > > > > > 2016-08-18 23:18 GMT+09:00 Matthew Knepley : > > > > On Thu, Aug 18, 2016 at 8:35 AM, ??? wrote: > > > > Hi, I'm trying to set my SNES matrix-free with Walker & Pernice way of computing h value. > > > > > > > > I found above command (MatSNESMFWPSetComputeNormU) but my fortran compiler couldn't fine any reference of that command. > > > > > > > > I checked Petsc changes log, but there weren't any mentions about that command. > > > > > > > > Should I have to include another specific header file? > > > > > > > > We have this function > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMFFDWPSetComputeNormU.html > > > > > > > > but I would recommend using the command line option > > > > > > > > -mat_mffd_compute_normu > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > Thank you always. > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > -- > > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > -- Norbert Wiener > > > > > > > > > From bsmith at mcs.anl.gov Thu Oct 6 13:05:04 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Oct 2016 13:05:04 -0500 Subject: [petsc-users] printing snes prefix with monitor In-Reply-To: References: <54DDF861-53D7-4561-A6EB-E43B89C9DDBB@lsu.edu> Message-ID: <742F0EBF-3B4E-42A9-955A-F5F11967CFC2@mcs.anl.gov> Ah, this support was added to KSP but not SNES. We will accept a pull request that adds the support for SNES (in your case you should also most definitely do the indenting Matt suggests). PetscErrorCode SNESMonitorDefault(SNES snes,PetscInt its,PetscReal fgnorm,PetscViewerAndFormat *vf) { PetscErrorCode ierr; PetscViewer viewer = vf->viewer; PetscFunctionBegin; PetscValidHeaderSpecific(viewer,PETSC_VIEWER_CLASSID,4); ierr = PetscViewerPushFormat(viewer,vf->format);CHKERRQ(ierr); ierr = PetscViewerASCIIAddTab(viewer,((PetscObject)snes)->tablevel);CHKERRQ(ierr); ierr = PetscViewerASCIIPrintf(viewer,"%3D SNES Function norm %14.12e \n",its,(double)fgnorm);CHKERRQ(ierr); ierr = PetscViewerASCIISubtractTab(viewer,((PetscObject)snes)->tablevel);CHKERRQ(ierr); ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr) ; PetscFunctionReturn(0); } PetscErrorCode KSPMonitorDefault(KSP ksp,PetscInt n,PetscReal rnorm,PetscViewerAndFormat *dummy) { PetscErrorCode ierr; PetscViewer viewer = dummy->viewer; PetscFunctionBegin; PetscValidHeaderSpecific(viewer,PETSC_VIEWER_CLASSID,4); ierr = PetscViewerPushFormat(viewer,dummy->format);CHKERRQ(ierr); ierr = PetscViewerASCIIAddTab(viewer,((PetscObject)ksp)->tablevel);CHKERRQ(ierr); if (n == 0 && ((PetscObject)ksp)->prefix) { ierr = PetscViewerASCIIPrintf(viewer," Residual norms for %s solve.\n",((PetscObject)ksp)->prefix);CHKERRQ(ierr); } ierr = PetscViewerASCIIPrintf(viewer,"%3D KSP Residual norm %14.12e \n",n,(double)rnorm);CHKERRQ(ierr); ierr = PetscViewerASCIISubtractTab(viewer,((PetscObject)ksp)->tablevel);CHKERRQ(ierr); ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr); PetscFunctionReturn(0); } > On Oct 6, 2016, at 9:55 AM, Matthew Knepley wrote: > > On Thu, Oct 6, 2016 at 9:51 AM, Blaise A Bourdin wrote: > Hi, > > I have a problem with 2 nested snes (i.e. SNESComputeFunction for snes1 involves a SNESSolve for snes2). > Each snes has a different prefix. The problem is that the SESMonitor won?t print the SNES prefix, so that making sense of output can be a bit tricky? > Is there a simple way to have each snes monitor display the prefix of the snes it refers to? Alternatively, where in the source code is the residual printed during snessolve? > > It would be nice to have a mode that put the prefix on the monitor line. > > What we currently do is indent the subsolve. 
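Until such a pull request lands, a user-side workaround is a small custom monitor that looks up the prefix itself. Below is a compilable sketch (the function name is made up; it mirrors the default output and is attached to each nested SNES with one SNESMonitorSet() call):

  #include <petscsnes.h>

  /* Print the SNES options prefix (if any) in front of the default-style line */
  static PetscErrorCode SNESMonitorWithPrefix(SNES snes, PetscInt its, PetscReal fnorm, void *ctx)
  {
    PetscErrorCode ierr;
    const char     *prefix;

    PetscFunctionBeginUser;
    ierr = PetscObjectGetOptionsPrefix((PetscObject)snes, &prefix);CHKERRQ(ierr);
    ierr = PetscPrintf(PetscObjectComm((PetscObject)snes), "%s%3D SNES Function norm %14.12e\n",
                       prefix ? prefix : "", its, (double)fnorm);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* In the application, after each nested SNES is created:
     ierr = SNESMonitorSet(snes, SNESMonitorWithPrefix, NULL, NULL);CHKERRQ(ierr); */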
I normally make the tab level 1 greater than the enclosing solve > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscObjectGetTabLevel.html#PetscObjectGetTabLevel > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscObjectSetTabLevel.html > > Thanks, > > Matt > > > Blaise > -- > Department of Mathematics and Center for Computation & Technology > Louisiana State University, Baton Rouge, LA 70803, USA > Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 http://www.math.lsu.edu/~bourdin > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From hengjiew at uci.edu Thu Oct 6 19:33:16 2016 From: hengjiew at uci.edu (frank) Date: Thu, 6 Oct 2016 17:33:16 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: Dear Dave, Follow your advice, I solve the identical equation twice and time two steps separately. The result is below: Test: 1024^3 grid points Cores# reduction factor MG levels# time of 1st solve 2nd time 4096 64 6 + 3 3.85 1.75 8192 128 5 + 3 5.52 0.91 16384 256 5 + 3 5.37 0.52 32768 512 5 + 4 3.03 0.36 32768 64 | 8 4 | 3 | 3 2.80 0.43 65536 1024 5 + 4 3.38 0.59 65536 32 | 32 4 | 4 | 3 2.14 0.22 I also attached the log_view info from all the run. The file is names by the cores# + reduction factor. The ksp_view and petsc_options for the 1st run are also included. Others are similar. The only differences are the reduction factor and mg levels. ** The time for the 1st solve is generally much larger. Is this because the ksp solver on the sub-communicator is set up during the 1st solve? ** The time for 1st solve does not scale. In practice, I am solving a variable coefficient Poisson equation. I need to build the matrix every time step. Therefore, each step is similar to the 1st solve which does not scale. Is there a way I can improve the performance? ** The 2nd solve scales but not quite well for more than 16384 cores. It seems to me that the performance depends on the tuning of MG levels on the sub-communicator(s). Is there some general strategies regarding how to distribute the levels? or when to use multiple sub-communicators ? Thank you. Regards, Frank On 10/04/2016 12:56 PM, Dave May wrote: > > > On Tuesday, 4 October 2016, frank > wrote: > > Hi, > > This question is follow-up of the thread "Question about memory > usage in Multigrid preconditioner". > I used to have the "Out of Memory(OOM)" problem when using the > CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; > -matptap_scalable" option did solve that problem. > > Then I test the scalability by solving a 3d poisson eqn for 1 > step. I used one sub-communicator in all the tests. The difference > between the petsc options in those tests are: 1 the > pc_telescope_reduction_factor; 2 the number of multigrid levels in > the up/down solver. The function "ksp_solve" is timed. It is kind > of slow and doesn't scale at all. 
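The staging Dave describes above amounts to a few extra lines around the two solves. In a fragment such as the sketch below (all object names are placeholders from the user's code), -log_view then reports the second, setup-free solve as its own stage:

  PetscLogStage stage;

  ierr = PetscLogStageRegister("2nd identical solve", &stage);CHKERRQ(ierr);

  ierr = KSPSolve(ksp, rhs, x);CHKERRQ(ierr);   /* 1st solve: pays all nested setup costs */

  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, rhs, x);CHKERRQ(ierr);   /* identical 2nd solve: setup already done */
  ierr = PetscLogStagePop();CHKERRQ(ierr);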
> > Test1: 512^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 512 8 4 / 3 6.2466 > 4096 64 5 / 3 0.9361 > 32768 64 4 / 3 4.8914 > > Test2: 1024^3 grid points > Core# telescope_reduction_factor MG levels# for up/down > solver Time for KSPSolve (s) > 4096 64 5 / 4 3.4139 > 8192 128 5 / 4 2.4196 > 16384 32 5 / 3 5.4150 > 32768 64 5 / 3 5.6067 > 65536 128 5 / 3 6.5219 > > > You have to be very careful how you interpret these numbers. Your > solver contains nested calls to KSPSolve, and unfortunately as a > result the numbers you report include setup time. This will remain > true even if you call KSPSetUp on the outermost KSP. > > Your email concerns scalability of the silver application, so let's > focus on that issue. > > The only way to clearly separate setup from solve time is to perform > two identical solves. The second solve will not require any setup. You > should monitor the second solve via a new PetscStage. > > This was what I did in the telescope paper. It was the only way to > understand the setup cost (and scaling) cf the solve time (and scaling). > > Thanks > Dave > > I guess I didn't set the MG levels properly. What would be the > efficient way to arrange the MG levels? > Also which preconditionr at the coarse mesh of the 2nd > communicator should I use to improve the performance? > > I attached the test code and the petsc options file for the 1024^3 > cube with 32768 cores. > > Thank you. > > Regards, > Frank > > > > > > > On 09/15/2016 03:35 AM, Dave May wrote: >> HI all, >> >> I the only unexpected memory usage I can see is associated with >> the call to MatPtAP(). >> Here is something you can try immediately. >> Run your code with the additional options >> -matrap 0 -matptap_scalable >> >> I didn't realize this before, but the default behaviour of >> MatPtAP in parallel is actually to to explicitly form the >> transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >> You don't want to do this. The option -matrap 0 resolves this issue. >> >> The implementation of P^T.A.P has two variants. >> The scalable implementation (with respect to memory usage) is >> selected via the second option -matptap_scalable. >> >> Try it out - I see a significant memory reduction using these >> options for particular mesh sizes / partitions. >> >> I've attached a cleaned up version of the code you sent me. >> There were a number of memory leaks and other issues. >> The main points being >> * You should call DMDAVecGetArrayF90() before >> VecAssembly{Begin,End} >> * You should call PetscFinalize(), otherwise the option >> -log_summary (-log_view) will not display anything once the >> program has completed. >> >> >> Thanks, >> Dave >> >> >> On 15 September 2016 at 08:03, Hengjie Wang > > wrote: >> >> Hi Dave, >> >> Sorry, I should have put more comment to explain the code. >> The number of process in each dimension is the same: Px = >> Py=Pz=P. So is the domain size. >> So if the you want to run the code for a 512^3 grid points >> on 16^3 cores, you need to set "-N 512 -P 16" in the command >> line. >> I add more comments and also fix an error in the attached >> code. ( The error only effects the accuracy of solution but >> not the memory usage. ) >> >> Thank you. >> Frank >> >> >> On 9/14/2016 9:05 PM, Dave May wrote: >>> >>> >>> On Thursday, 15 September 2016, Dave May >>> >> > >>> wrote: >>> >>> >>> >>> On Thursday, 15 September 2016, frank >>> wrote: >>> >>> Hi, >>> >>> I write a simple code to re-produce the error. 
I >>> hope this can help to diagnose the problem. >>> The code just solves a 3d poisson equation. >>> >>> >>> Why is the stencil width a runtime parameter?? And why >>> is the default value 2? For 7-pnt FD Laplace, you only >>> need a stencil width of 1. >>> >>> Was this choice made to mimic something in the >>> real application code? >>> >>> >>> Please ignore - I misunderstood your usage of the param set >>> by -P >>> >>> >>> I run the code on a 1024^3 mesh. The process >>> partition is 32 * 32 * 32. That's when I re-produce >>> the OOM error. Each core has about 2G memory. >>> I also run the code on a 512^3 mesh with 16 * 16 * >>> 16 processes. The ksp solver works fine. >>> I attached the code, ksp_view_pre's output and my >>> petsc option file. >>> >>> Thank you. >>> Frank >>> >>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>> Hi Barry, >>>> >>>> I checked. On the supercomputer, I had the option >>>> "-ksp_view_pre" but it is not in file I sent you. I >>>> am sorry for the confusion. >>>> >>>> Regards, >>>> Frank >>>> >>>> On Friday, September 9, 2016, Barry Smith >>>> wrote: >>>> >>>> >>>> > On Sep 9, 2016, at 3:11 PM, frank >>>> wrote: >>>> > >>>> > Hi Barry, >>>> > >>>> > I think the first KSP view output is from >>>> -ksp_view_pre. Before I submitted the test, I >>>> was not sure whether there would be OOM error >>>> or not. So I added both -ksp_view_pre and >>>> -ksp_view. >>>> >>>> But the options file you sent specifically >>>> does NOT list the -ksp_view_pre so how could it >>>> be from that? >>>> >>>> Sorry to be pedantic but I've spent too much >>>> time in the past trying to debug from incorrect >>>> information and want to make sure that the >>>> information I have is correct before thinking. >>>> Please recheck exactly what happened. Rerun >>>> with the exact input file you emailed if that >>>> is needed. >>>> >>>> Barry >>>> >>>> > >>>> > Frank >>>> > >>>> > >>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>> >> Why does ksp_view2.txt have two KSP views >>>> in it while ksp_view1.txt has only one KSPView >>>> in it? Did you run two different solves in the >>>> 2 case but not the one? >>>> >> >>>> >> Barry >>>> >> >>>> >> >>>> >> >>>> >>> On Sep 9, 2016, at 10:56 AM, frank >>>> wrote: >>>> >>> >>>> >>> Hi, >>>> >>> >>>> >>> I want to continue digging into the memory >>>> problem here. >>>> >>> I did find a work around in the past, which >>>> is to use less cores per node so that each core >>>> has 8G memory. However this is deficient and >>>> expensive. I hope to locate the place that uses >>>> the most memory. >>>> >>> >>>> >>> Here is a brief summary of the tests I did >>>> in past: >>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh >>>> 48*4*12 >>>> >>> Maximum (over computational time) process >>>> memory: total 7.0727e+08 >>>> >>> Current process memory: total 7.0727e+08 >>>> >>> Maximum (over computational time) space >>>> PetscMalloc()ed: total 6.3908e+11 >>>> >>> Current space PetscMalloc()ed: >>>> total 1.8275e+09 >>>> >>> >>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh >>>> 96*8*24 >>>> >>> Maximum (over computational time) process >>>> memory: total 5.9431e+09 >>>> >>> Current process memory: total 5.9431e+09 >>>> >>> Maximum (over computational time) space >>>> PetscMalloc()ed: total 5.3202e+12 >>>> >>> Current space PetscMalloc()ed: >>>> total 5.4844e+09 >>>> >>> >>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh >>>> 96*8*24 >>>> >>> OOM( Out Of Memory ) killer of the >>>> supercomputer terminated the job during "KSPSolve". 
>>>> >>> >>>> >>> I attached the output of ksp_view( the >>>> third test's output is from ksp_view_pre ), >>>> memory_view and also the petsc options. >>>> >>> >>>> >>> In all the tests, each core can access >>>> about 2G memory. In test3, there are 4223139840 >>>> non-zeros in the matrix. This will consume >>>> about 1.74M, using double precision. >>>> Considering some extra memory used to store >>>> integer index, 2G memory should still be way >>>> enough. >>>> >>> >>>> >>> Is there a way to find out which part of >>>> KSPSolve uses the most memory? >>>> >>> Thank you so much. >>>> >>> >>>> >>> BTW, there are 4 options remains unused and >>>> I don't understand why they are omitted: >>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type >>>> value: preonly >>>> >>> -mg_coarse_telescope_mg_coarse_pc_type >>>> value: bjacobi >>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it >>>> value: 1 >>>> >>> -mg_coarse_telescope_mg_levels_ksp_type >>>> value: richardson >>>> >>> >>>> >>> >>>> >>> Regards, >>>> >>> Frank >>>> >>> >>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>> >>>> >>>> >>>> On 14 July 2016 at 01:07, frank >>>> wrote: >>>> >>>> Hi Dave, >>>> >>>> >>>> >>>> Sorry for the late reply. >>>> >>>> Thank you so much for your detailed reply. >>>> >>>> >>>> >>>> I have a question about the estimation of >>>> the memory usage. There are 4223139840 >>>> allocated non-zeros and 18432 MPI processes. >>>> Double precision is used. So the memory per >>>> process is: >>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 >>>> = 1.74M ? >>>> >>>> Did I do sth wrong here? Because this >>>> seems too small. >>>> >>>> >>>> >>>> No - I totally f***ed it up. You are >>>> correct. That'll teach me for fumbling around >>>> with my iphone calculator and not using my >>>> brain. (Note that to convert to MB just divide >>>> by 1e6, not 1024^2 - although I apparently >>>> cannot convert between units correctly....) >>>> >>>> >>>> >>>> From the PETSc objects associated with the >>>> solver, It looks like it _should_ run with 2GB >>>> per MPI rank. Sorry for my mistake. >>>> Possibilities are: somewhere in your usage of >>>> PETSc you've introduced a memory leak; PETSc is >>>> doing a huge over allocation (e.g. as per our >>>> discussion of MatPtAP); or in your application >>>> code there are other objects you have forgotten >>>> to log the memory for. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> I am running this job on Bluewater >>>> >>>> I am using the 7 points FD stencil in 3D. >>>> >>>> >>>> >>>> I thought so on both counts. >>>> >>>> >>>> >>>> I apologize that I made a stupid mistake >>>> in computing the memory per core. My settings >>>> render each core can access only 2G memory on >>>> average instead of 8G which I mentioned in >>>> previous email. I re-run the job with 8G memory >>>> per core on average and there is no "Out Of >>>> Memory" error. I would do more test to see if >>>> there is still some memory issue. >>>> >>>> >>>> >>>> Ok. I'd still like to know where the >>>> memory was being used since my estimates were off. >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Dave >>>> >>>> >>>> >>>> Regards, >>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>> >>>>> Hi Frank, >>>> >>>>> >>>> >>>>> >>>> >>>>> On 11 July 2016 at 19:14, frank >>>> wrote: >>>> >>>>> Hi Dave, >>>> >>>>> >>>> >>>>> I re-run the test using bjacobi as the >>>> preconditioner on the coarse mesh of telescope. >>>> The Grid is 3072*256*768 and process mesh is >>>> 96*8*24. 
The petsc option file is attached. >>>> >>>>> I still got the "Out Of Memory" error. >>>> The error occurred before the linear solver >>>> finished one step. So I don't have the full >>>> info from ksp_view. The info from ksp_view_pre >>>> is attached. >>>> >>>>> >>>> >>>>> Okay - that is essentially useless (sorry) >>>> >>>>> >>>> >>>>> It seems to me that the error occurred >>>> when the decomposition was going to be changed. >>>> >>>>> >>>> >>>>> Based on what information? >>>> >>>>> Running with -info would give us more >>>> clues, but will create a ton of output. >>>> >>>>> Please try running the case which failed >>>> with -info >>>> >>>>> I had another test with a grid of >>>> 1536*128*384 and the same process mesh as >>>> above. There was no error. The ksp_view info is >>>> attached for comparison. >>>> >>>>> Thank you. >>>> >>>>> >>>> >>>>> >>>> >>>>> [3] Here is my crude estimate of your >>>> memory usage. >>>> >>>>> I'll target the biggest memory hogs only >>>> to get an order of magnitude estimate >>>> >>>>> >>>> >>>>> * The Fine grid operator contains >>>> 4223139840 non-zeros --> 1.8 GB per MPI rank >>>> assuming double precision. >>>> >>>>> The indices for the AIJ could amount to >>>> another 0.3 GB (assuming 32 bit integers) >>>> >>>>> >>>> >>>>> * You use 5 levels of coarsening, so the >>>> other operators should represent (collectively) >>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ >>>> 300 MB per MPI rank on the communicator with >>>> 18432 ranks. >>>> >>>>> The coarse grid should consume ~ 0.5 MB >>>> per MPI rank on the communicator with 18432 ranks. >>>> >>>>> >>>> >>>>> * You use a reduction factor of 64, >>>> making the new communicator with 288 MPI ranks. >>>> >>>>> PCTelescope will first gather a temporary >>>> matrix associated with your coarse level >>>> operator assuming a comm size of 288 living on >>>> the comm with size 18432. >>>> >>>>> This matrix will require approximately >>>> 0.5 * 64 = 32 MB per core on the 288 ranks. >>>> >>>>> This matrix is then used to form a new >>>> MPIAIJ matrix on the subcomm, thus require >>>> another 32 MB per rank. >>>> >>>>> The temporary matrix is now destroyed. >>>> >>>>> >>>> >>>>> * Because a DMDA is detected, a >>>> permutation matrix is assembled. >>>> >>>>> This requires 2 doubles per point in the >>>> DMDA. >>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 >>>> points. >>>> >>>>> Thus the permutation matrix will require >>>> < 1 MB per MPI rank on the sub-comm. >>>> >>>>> >>>> >>>>> * Lastly, the matrix is permuted. This >>>> uses MatPtAP(), but the resulting operator will >>>> have the same memory footprint as the >>>> unpermuted matrix (32 MB). At any stage in >>>> PCTelescope, only 2 operators of size 32 MB are >>>> held in memory when the DMDA is provided. >>>> >>>>> >>>> >>>>> From my rough estimates, the worst case >>>> memory foot print for any given core, given >>>> your options is approximately >>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB >>>> = 2465 MB >>>> >>>>> This is way below 8 GB. >>>> >>>>> >>>> >>>>> Note this estimate completely ignores: >>>> >>>>> (1) the memory required for the >>>> restriction operator, >>>> >>>>> (2) the potential growth in the number of >>>> non-zeros per row due to Galerkin coarsening (I >>>> wished -ksp_view_pre reported the output from >>>> MatView so we could see the number of non-zeros >>>> required by the coarse level operators) >>>> >>>>> (3) all temporary vectors required by the >>>> CG solver, and those required by the smoothers. 
>>>> >>>>> (4) internal memory allocated by MatPtAP >>>> >>>>> (5) memory associated with IS's used >>>> within PCTelescope >>>> >>>>> >>>> >>>>> So either I am completely off in my >>>> estimates, or you have not carefully estimated >>>> the memory usage of your application code. >>>> Hopefully others might examine/correct my rough >>>> estimates >>>> >>>>> >>>> >>>>> Since I don't have your code I cannot >>>> access the latter. >>>> >>>>> Since I don't have access to the same >>>> machine you are running on, I think we need to >>>> take a step back. >>>> >>>>> >>>> >>>>> [1] What machine are you running on? Send >>>> me a URL if its available >>>> >>>>> >>>> >>>>> [2] What discretization are you using? (I >>>> am guessing a scalar 7 point FD stencil) >>>> >>>>> If it's a 7 point FD stencil, we should >>>> be able to examine the memory usage of your >>>> solver configuration using a standard, light >>>> weight existing PETSc example, run on your >>>> machine at the same scale. >>>> >>>>> This would hopefully enable us to >>>> correctly evaluate the actual memory usage >>>> required by the solver configuration you are using. >>>> >>>>> >>>> >>>>> Thanks, >>>> >>>>> Dave >>>> >>>>> >>>> >>>>> >>>> >>>>> Frank >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>> >>>>>> >>>> >>>>>> On Saturday, 9 July 2016, frank >>>> wrote: >>>> >>>>>> Hi Barry and Dave, >>>> >>>>>> >>>> >>>>>> Thank both of you for the advice. >>>> >>>>>> >>>> >>>>>> @Barry >>>> >>>>>> I made a mistake in the file names in >>>> last email. I attached the correct files this time. >>>> >>>>>> For all the three tests, 'Telescope' is >>>> used as the coarse preconditioner. >>>> >>>>>> >>>> >>>>>> == Test1: Grid: 1536*128*384, >>>> Process Mesh: 48*4*12 >>>> >>>>>> Part of the memory usage: Vector 125 >>>> 124 3971904 0. >>>> >>>>>> Matrix 101 101 >>>> 9462372 0 >>>> >>>>>> >>>> >>>>>> == Test2: Grid: 1536*128*384, Process >>>> Mesh: 96*8*24 >>>> >>>>>> Part of the memory usage: Vector 125 >>>> 124 681672 0. >>>> >>>>>> Matrix 101 101 >>>> 1462180 0. >>>> >>>>>> >>>> >>>>>> In theory, the memory usage in Test1 >>>> should be 8 times of Test2. In my case, it is >>>> about 6 times. >>>> >>>>>> >>>> >>>>>> == Test3: Grid: 3072*256*768, Process >>>> Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>> >>>>>> Here I get the out of memory error. >>>> >>>>>> >>>> >>>>>> I tried to use -mg_coarse jacobi. In >>>> this way, I don't need to set >>>> -mg_coarse_ksp_type and -mg_coarse_pc_type >>>> explicitly, right? >>>> >>>>>> The linear solver didn't work in this >>>> case. Petsc output some errors. >>>> >>>>>> >>>> >>>>>> @Dave >>>> >>>>>> In test3, I use only one instance of >>>> 'Telescope'. On the coarse mesh of 'Telescope', >>>> I used LU as the preconditioner instead of SVD. >>>> >>>>>> If my set the levels correctly, then on >>>> the last coarse mesh of MG where it calls >>>> 'Telescope', the sub-domain per process is 2*2*2. >>>> >>>>>> On the last coarse mesh of 'Telescope', >>>> there is only one grid point per process. >>>> >>>>>> I still got the OOM error. The detailed >>>> petsc option file is attached. >>>> >>>>>> >>>> >>>>>> Do you understand the expected memory >>>> usage for the particular parallel LU >>>> implementation you are using? I don't >>>> (seriously). Replace LU with bjacobi and re-run >>>> this test. My point about solver debugging is >>>> still valid. 
>>>> >>>>>> >>>> >>>>>> And please send the result of KSPView so >>>> we can see what is actually used in the >>>> computations >>>> >>>>>> >>>> >>>>>> Thanks >>>> >>>>>> Dave >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank >>>> wrote: >>>> >>>>>> >>>> >>>>>> Hi Barry, >>>> >>>>>> >>>> >>>>>> Thank you for you advice. >>>> >>>>>> I tried three test. In the 1st test, the >>>> grid is 3072*256*768 and the process mesh is >>>> 96*8*24. >>>> >>>>>> The linear solver is 'cg' the >>>> preconditioner is 'mg' and 'telescope' is used >>>> as the preconditioner at the coarse mesh. >>>> >>>>>> The system gives me the "Out of Memory" >>>> error before the linear system is completely >>>> solved. >>>> >>>>>> The info from '-ksp_view_pre' is >>>> attached. I seems to me that the error occurs >>>> when it reaches the coarse mesh. >>>> >>>>>> >>>> >>>>>> The 2nd test uses a grid of 1536*128*384 >>>> and process mesh is 96*8*24. The 3rd >>>> test uses the same grid but a different >>>> process mesh 48*4*12. >>>> >>>>>> Are you sure this is right? The total >>>> matrix and vector memory usage goes from 2nd test >>>> >>>>>> Vector 384 383 >>>> 8,193,712 0. >>>> >>>>>> Matrix 103 103 >>>> 11,508,688 0. >>>> >>>>>> to 3rd test >>>> >>>>>> Vector 384 383 >>>> 1,590,520 0. >>>> >>>>>> Matrix 103 103 >>>> 3,508,664 0. >>>> >>>>>> that is the memory usage got smaller but >>>> if you have only 1/8th the processes and the >>>> same grid it should have gotten about 8 times >>>> bigger. Did you maybe cut the grid by a factor >>>> of 8 also? If so that still doesn't explain it >>>> because the memory usage changed by a factor of >>>> 5 something for the vectors and 3 something for >>>> the matrices. >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> The linear solver and petsc options in >>>> 2nd and 3rd tests are the same in 1st test. The >>>> linear solver works fine in both test. >>>> >>>>>> I attached the memory usage of the 2nd >>>> and 3rd tests. The memory info is from the >>>> option '-log_summary'. I tried to use >>>> '-momery_info' as you suggested, but in my case >>>> petsc treated it as an unused option. It output >>>> nothing about the memory. Do I need to add sth >>>> to my code so I can use '-memory_info'? >>>> >>>>>> Sorry, my mistake the option is >>>> -memory_view >>>> >>>>>> >>>> >>>>>> Can you run the one case with >>>> -memory_view and -mg_coarse jacobi -ksp_max_it >>>> 1 (just so it doesn't iterate forever) to see >>>> how much memory is used without the telescope? >>>> Also run case 2 the same way. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> In both tests the memory usage is not large. >>>> >>>>>> >>>> >>>>>> It seems to me that it might be the >>>> 'telescope' preconditioner that allocated a lot >>>> of memory and caused the error in the 1st test. >>>> >>>>>> Is there is a way to show how much >>>> memory it allocated? >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>> >>>>>> Frank, >>>> >>>>>> >>>> >>>>>> You can run with -ksp_view_pre to have >>>> it "view" the KSP before the solve so hopefully >>>> it gets that far. >>>> >>>>>> >>>> >>>>>> Please run the problem that does fit >>>> with -memory_info when the problem completes it >>>> will show the "high water mark" for PETSc >>>> allocated memory and total memory used. 
We >>>> first want to look at these numbers to see if >>>> it is using more memory than you expect. You >>>> could also run with say half the grid spacing >>>> to see how the memory usage scaled with the >>>> increase in grid points. Make the runs also >>>> with -log_view and send all the output from >>>> these options. >>>> >>>>>> >>>> >>>>>> Barry >>>> >>>>>> >>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank >>>> wrote: >>>> >>>>>> >>>> >>>>>> Hi, >>>> >>>>>> >>>> >>>>>> I am using the CG ksp solver and >>>> Multigrid preconditioner to solve a linear >>>> system in parallel. >>>> >>>>>> I chose to use the 'Telescope' as the >>>> preconditioner on the coarse mesh for its good >>>> performance. >>>> >>>>>> The petsc options file is attached. >>>> >>>>>> >>>> >>>>>> The domain is a 3d box. >>>> >>>>>> It works well when the grid is >>>> 1536*128*384 and the process mesh is 96*8*24. >>>> When I double the size of grid and >>>> keep the same process mesh and petsc >>>> options, I get an "out of memory" error from >>>> the super-cluster I am using. >>>> >>>>>> Each process has access to at least 8G >>>> memory, which should be more than enough for my >>>> application. I am sure that all the other parts >>>> of my code( except the linear solver ) do not >>>> use much memory. So I doubt if there is >>>> something wrong with the linear solver. >>>> >>>>>> The error occurs before the linear >>>> system is completely solved so I don't have the >>>> info from ksp view. I am not able to re-produce >>>> the error with a smaller problem either. >>>> >>>>>> In addition, I tried to use the block >>>> jacobi as the preconditioner with the same grid >>>> and same decomposition. The linear solver runs >>>> extremely slow but there is no memory error. >>>> >>>>>> >>>> >>>>>> How can I diagnose what exactly cause >>>> the error? >>>> >>>>>> Thank you so much. >>>> >>>>>> >>>> >>>>>> Frank >>>> >>>>>> >>>> >>>>>> >>>> >>>> >>>>>> >>>> >>>>> >>>> >>>> >>>> >>> >>>> >>>> > >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Linear solve converged due to CONVERGED_RTOL iterations 7 KSP Object: 4096 MPI processes type: cg maximum iterations=10000 tolerances: relative=1e-07, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using UNPRECONDITIONED norm type for convergence test PC Object: 4096 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=6 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 4096 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 4096 MPI processes type: telescope Telescope: parent comm size reduction factor = 64 Telescope: comm_size = 4096 , subcomm_size = 64 Telescope: subcomm type: interlaced Telescope: DMDA detected DMDA Object: (mg_coarse_telescope_repart_) 64 MPI processes M 32 N 32 P 32 m 4 n 4 p 4 dof 1 overlap 1 KSP Object: (mg_coarse_telescope_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_) 64 MPI processes type: mg MG: type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_) 64 MPI processes type: redundant Redundant preconditioner: First (color=0) of 64 PCs follows linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_1_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=4096, cols=4096 total: nonzeros=110592, allocated nonzeros=110592 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_levels_2_) 64 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 64 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines KSP Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_telescope_mg_coarse_redundant_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 8.69575 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=120210, allocated nonzeros=120210 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=13824, allocated nonzeros=13824 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=32768, cols=32768 total: nonzeros=884736, allocated nonzeros=884736 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=262144, cols=262144 total: nonzeros=7077888, allocated nonzeros=7077888 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_2_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=2097152, cols=2097152 total: nonzeros=56623104, allocated nonzeros=56623104 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_3_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=16777216, cols=16777216 total: nonzeros=452984832, allocated nonzeros=452984832 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_4_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=134217728, cols=134217728 total: nonzeros=3623878656, allocated nonzeros=3623878656 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 5 ------------------------------- KSP Object: (mg_levels_5_) 4096 MPI processes type: richardson Richardson: damping factor=1. maximum iterations=1 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_5_) 4096 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 4096 MPI processes type: mpiaij rows=1073741824, cols=1073741824 total: nonzeros=7516192768, allocated nonzeros=7516192768 total number of mallocs used during MatSetValues calls =0 has attached null space -------------- next part -------------- A non-text attachment was scrubbed... 
Name: log_view.tar.gz Type: application/gzip Size: 25187 bytes Desc: not available URL: -------------- next part -------------- -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1e-7 -options_left -ksp_initial_guess_nonzero yes -ksp_converged_reason -ppe_max_iter 20 -pc_type mg -pc_mg_galerkin -pc_mg_levels 6 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_coarse_ksp_type preonly -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_reduction_factor 64 -matrap 0 -matptap_scalable -memory_view -log_view -options_left 1 # Setting dmdarepart on subcomm -mg_coarse_telescope_ksp_type preonly -mg_coarse_telescope_pc_type mg -mg_coarse_telescope_pc_mg_galerkin -mg_coarse_telescope_pc_mg_levels 3 -mg_coarse_telescope_mg_levels_ksp_max_it 1 -mg_coarse_telescope_mg_levels_ksp_type richardson -mg_coarse_telescope_mg_coarse_ksp_type preonly -mg_coarse_telescope_mg_coarse_pc_type redundant From knepley at gmail.com Thu Oct 6 20:05:54 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Oct 2016 20:05:54 -0500 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Thu, Oct 6, 2016 at 7:33 PM, frank wrote: > Dear Dave, > Follow your advice, I solve the identical equation twice and time two > steps separately. The result is below: > > Test: 1024^3 grid points > Cores# reduction factor MG levels# time of 1st solve 2nd time > 4096 64 6 + 3 > 3.85 1.75 > 8192 128 5 + 3 > 5.52 0.91 > 16384 256 5 + 3 5.37 > 0.52 > 32768 512 5 + 4 3.03 > 0.36 > 32768 64 | 8 4 | 3 | 3 2.80 > 0.43 > 65536 1024 5 + 4 3.38 > 0.59 > 65536 32 | 32 4 | 4 | 3 2.14 > 0.22 > > I also attached the log_view info from all the run. The file is names by > the cores# + reduction factor. > The ksp_view and petsc_options for the 1st run are also included. Others > are similar. The only differences are the reduction factor and mg levels. > > ** The time for the 1st solve is generally much larger. Is this because > the ksp solver on the sub-communicator is set up during the 1st solve? > All setup is done in the first solve. > ** The time for 1st solve does not scale. > In practice, I am solving a variable coefficient Poisson equation. I > need to build the matrix every time step. Therefore, each step is similar > to the 1st solve which does not scale. Is there a way I can improve the > performance? > You could use rediscretization instead of Galerkin to produce the coarse operators. > ** The 2nd solve scales but not quite well for more than 16384 cores. > How well were you looking for? This is strong scaling, which is has an Amdahl's Law limit. > It seems to me that the performance depends on the tuning of MG levels > on the sub-communicator(s). > Is there some general strategies regarding how to distribute the > levels? or when to use multiple sub-communicators ? > Also, you use CG/MG when FMG by itself would probably be faster. Your smoother is likely not strong enough, and you should use something like V(2,2). There is a lot of tuning that is possible, but difficult to automate. Thanks, Matt > Thank you. 
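The two suggestions above can be made concrete, with the caveat that neither sketch below is taken from the application being discussed; both are illustrative starting points only.

The FMG / stronger-smoother idea amounts to a small variation of the options file attached above (PETSc 3.7 option names; the level count is carried over unchanged, and the (2,2) smoother strength is an assumption to be tuned, not a setting verified on this problem):

-ksp_type richardson
-ksp_rtol 1e-7
-pc_type mg
-pc_mg_type full
-pc_mg_galerkin
-pc_mg_levels 6
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 2
-mg_levels_pc_type sor
# keep the -mg_coarse_pc_type telescope block from the attached file unchanged, and
# check with -ksp_monitor_true_residual that the F-cycle with a (2,2) smoother really
# does reduce iterations/time for this problem

"Rediscretization instead of Galerkin" means letting PCMG call back into the application to assemble the operator on every level, rather than forming coarse operators with PtAP. A minimal sketch of that pattern for a variable-coefficient 7-point Poisson operator follows, modelled on the DMDA/KSP tutorials (e.g. ksp ex34/ex45) and written against the PETSc 3.7 calling sequence. The grid size, the Coeff() function, the right-hand side and the boundary handling are placeholders, not the discretization used in the runs above.

#include <petscdm.h>
#include <petscdmda.h>
#include <petscksp.h>

/* Placeholder variable coefficient: stands in for "something the application
   can evaluate at any grid point on any level". */
static PetscReal Coeff(PetscReal x, PetscReal y, PetscReal z)
{
  return 1.0 + x + y + z;
}

static PetscErrorCode ComputeRHS(KSP ksp, Vec b, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecSet(b, 1.0);CHKERRQ(ierr); /* placeholder forcing term */
  PetscFunctionReturn(0);
}

/* PCMG calls this on every level: KSPGetDM() returns that level's coarsened
   DMDA, so each operator is re-discretized instead of being formed by PtAP. */
static PetscErrorCode ComputeMatrix(KSP ksp, Mat J, Mat B, void *ctx)
{
  DM             da;
  DMDALocalInfo  info;
  PetscInt       i, j, k, s, ii, jj, kk, ncols;
  PetscReal      hx, hy, hz, h2[6], c;
  MatStencil     row, col[7];
  PetscScalar    v[7], diag;
  const PetscInt oi[6] = {-1, 1, 0, 0, 0, 0};
  const PetscInt oj[6] = { 0, 0,-1, 1, 0, 0};
  const PetscInt ok[6] = { 0, 0, 0, 0,-1, 1};
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp, &da);CHKERRQ(ierr);
  ierr = DMDAGetLocalInfo(da, &info);CHKERRQ(ierr);
  hx = 1.0/(PetscReal)(info.mx - 1);
  hy = 1.0/(PetscReal)(info.my - 1);
  hz = 1.0/(PetscReal)(info.mz - 1);
  h2[0] = h2[1] = hx*hx;  h2[2] = h2[3] = hy*hy;  h2[4] = h2[5] = hz*hz;
  for (k = info.zs; k < info.zs + info.zm; k++) {
    for (j = info.ys; j < info.ys + info.ym; j++) {
      for (i = info.xs; i < info.xs + info.xm; i++) {
        row.i = i; row.j = j; row.k = k; row.c = 0;
        c     = Coeff(i*hx, j*hy, k*hz);
        diag  = 0.0;
        ncols = 0;
        /* 7-point stencil; off-grid neighbours are dropped, which amounts to
           homogeneous Dirichlet conditions -- a stand-in for the application's
           real boundary treatment. */
        for (s = 0; s < 6; s++) {
          ii = i + oi[s]; jj = j + oj[s]; kk = k + ok[s];
          diag += c/h2[s];
          if (ii >= 0 && ii < info.mx && jj >= 0 && jj < info.my && kk >= 0 && kk < info.mz) {
            col[ncols].i = ii; col[ncols].j = jj; col[ncols].k = kk; col[ncols].c = 0;
            v[ncols] = -c/h2[s];
            ncols++;
          }
        }
        col[ncols].i = i; col[ncols].j = j; col[ncols].k = k; col[ncols].c = 0;
        v[ncols] = diag;
        ncols++;
        ierr = MatSetValuesStencil(B, 1, &row, ncols, col, v, INSERT_VALUES);CHKERRQ(ierr);
      }
    }
  }
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  KSP            ksp;
  DM             da;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* 129^3 is only illustrative; the real grid and process mesh come from the
     application and must be coarsenable (-pc_mg_levels - 1) times. */
  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR, 129, 129, 129, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                      1, 1, NULL, NULL, NULL, &da);CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp, ComputeMatrix, NULL);CHKERRQ(ierr);
  ierr = KSPSetComputeRHS(ksp, ComputeRHS, NULL);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  /* Run with e.g. -ksp_type cg -pc_type mg -pc_mg_levels 5 (plus the telescope
     options on mg_coarse) but WITHOUT -pc_mg_galerkin: every level's operator
     then comes from ComputeMatrix on that level's DMDA. */
  ierr = KSPSolve(ksp, NULL, NULL);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

With this structure, running without -pc_mg_galerkin makes PCMG call ComputeMatrix on every coarsened DMDA, so only per-level assembly recurs each time step; whether that is actually cheaper, and whether the re-discretized coarse operators remain as effective as the Galerkin ones, is exactly the trade-off discussed in the follow-up below.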
> > Regards, > Frank > > > > > > On 10/04/2016 12:56 PM, Dave May wrote: > > > > On Tuesday, 4 October 2016, frank wrote: > >> Hi, >> This question is follow-up of the thread "Question about memory usage in >> Multigrid preconditioner". >> I used to have the "Out of Memory(OOM)" problem when using the >> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; >> -matptap_scalable" option did solve that problem. >> >> Then I test the scalability by solving a 3d poisson eqn for 1 step. I >> used one sub-communicator in all the tests. The difference between the >> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 >> the number of multigrid levels in the up/down solver. The function >> "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >> >> Test1: 512^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 512 8 4 / >> 3 6.2466 >> 4096 64 5 / >> 3 0.9361 >> 32768 64 4 / >> 3 4.8914 >> >> Test2: 1024^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 4096 64 5 / 4 >> 3.4139 >> 8192 128 5 / >> 4 2.4196 >> 16384 32 5 / 3 >> 5.4150 >> 32768 64 5 / >> 3 5.6067 >> 65536 128 5 / >> 3 6.5219 >> > > You have to be very careful how you interpret these numbers. Your solver > contains nested calls to KSPSolve, and unfortunately as a result the > numbers you report include setup time. This will remain true even if you > call KSPSetUp on the outermost KSP. > > Your email concerns scalability of the silver application, so let's focus > on that issue. > > The only way to clearly separate setup from solve time is to perform two > identical solves. The second solve will not require any setup. You should > monitor the second solve via a new PetscStage. > > This was what I did in the telescope paper. It was the only way to > understand the setup cost (and scaling) cf the solve time (and scaling). > > Thanks > Dave > > > >> I guess I didn't set the MG levels properly. What would be the efficient >> way to arrange the MG levels? >> Also which preconditionr at the coarse mesh of the 2nd communicator >> should I use to improve the performance? >> >> I attached the test code and the petsc options file for the 1024^3 cube >> with 32768 cores. >> >> Thank you. >> >> Regards, >> Frank >> >> >> >> >> >> >> On 09/15/2016 03:35 AM, Dave May wrote: >> >> HI all, >> >> I the only unexpected memory usage I can see is associated with the call >> to MatPtAP(). >> Here is something you can try immediately. >> Run your code with the additional options >> -matrap 0 -matptap_scalable >> >> I didn't realize this before, but the default behaviour of MatPtAP in >> parallel is actually to to explicitly form the transpose of P (e.g. >> assemble R = P^T) and then compute R.A.P. >> You don't want to do this. The option -matrap 0 resolves this issue. >> >> The implementation of P^T.A.P has two variants. >> The scalable implementation (with respect to memory usage) is selected >> via the second option -matptap_scalable. >> >> Try it out - I see a significant memory reduction using these options for >> particular mesh sizes / partitions. >> >> I've attached a cleaned up version of the code you sent me. >> There were a number of memory leaks and other issues. 
>> The main points being >> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >> * You should call PetscFinalize(), otherwise the option -log_summary >> (-log_view) will not display anything once the program has completed. >> >> >> Thanks, >> Dave >> >> >> On 15 September 2016 at 08:03, Hengjie Wang wrote: >> >>> Hi Dave, >>> >>> Sorry, I should have put more comment to explain the code. >>> The number of process in each dimension is the same: Px = Py=Pz=P. So is >>> the domain size. >>> So if the you want to run the code for a 512^3 grid points on 16^3 >>> cores, you need to set "-N 512 -P 16" in the command line. >>> I add more comments and also fix an error in the attached code. ( The >>> error only effects the accuracy of solution but not the memory usage. ) >>> >>> Thank you. >>> Frank >>> >>> >>> On 9/14/2016 9:05 PM, Dave May wrote: >>> >>> >>> >>> On Thursday, 15 September 2016, Dave May >>> wrote: >>> >>>> >>>> >>>> On Thursday, 15 September 2016, frank wrote: >>>> >>>>> Hi, >>>>> >>>>> I write a simple code to re-produce the error. I hope this can help to >>>>> diagnose the problem. >>>>> The code just solves a 3d poisson equation. >>>>> >>>> >>>> Why is the stencil width a runtime parameter?? And why is the default >>>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>> >>>> Was this choice made to mimic something in the real application code? >>>> >>> >>> Please ignore - I misunderstood your usage of the param set by -P >>> >>> >>>> >>>> >>>>> >>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * >>>>> 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>>> ksp solver works fine. >>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>> >>>>> Thank you. >>>>> Frank >>>>> >>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>>> it is not in file I sent you. I am sorry for the confusion. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>> >>>>>> >>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>> > >>>>>> > Hi Barry, >>>>>> > >>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>>> So I added both -ksp_view_pre and -ksp_view. >>>>>> >>>>>> But the options file you sent specifically does NOT list the >>>>>> -ksp_view_pre so how could it be from that? >>>>>> >>>>>> Sorry to be pedantic but I've spent too much time in the past >>>>>> trying to debug from incorrect information and want to make sure that the >>>>>> information I have is correct before thinking. Please recheck exactly what >>>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>>> >>>>>> Barry >>>>>> >>>>>> > >>>>>> > Frank >>>>>> > >>>>>> > >>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>>> in the 2 case but not the one? >>>>>> >> >>>>>> >> Barry >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>> >>> >>>>>> >>> Hi, >>>>>> >>> >>>>>> >>> I want to continue digging into the memory problem here. 
>>>>>> >>> I did find a work around in the past, which is to use less cores >>>>>> per node so that each core has 8G memory. However this is deficient and >>>>>> expensive. I hope to locate the place that uses the most memory. >>>>>> >>> >>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>> >>> Maximum (over computational time) process memory: total >>>>>> 7.0727e+08 >>>>>> >>> Current process memory: >>>>>> total 7.0727e+08 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>> 6.3908e+11 >>>>>> >>> Current space PetscMalloc()ed: >>>>>> total 1.8275e+09 >>>>>> >>> >>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>> >>> Maximum (over computational time) process memory: total >>>>>> 5.9431e+09 >>>>>> >>> Current process memory: >>>>>> total 5.9431e+09 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>> 5.3202e+12 >>>>>> >>> Current space PetscMalloc()ed: >>>>>> total 5.4844e+09 >>>>>> >>> >>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>>> the job during "KSPSolve". >>>>>> >>> >>>>>> >>> I attached the output of ksp_view( the third test's output is >>>>>> from ksp_view_pre ), memory_view and also the petsc options. >>>>>> >>> >>>>>> >>> In all the tests, each core can access about 2G memory. In test3, >>>>>> there are 4223139840 non-zeros in the matrix. This will consume about >>>>>> 1.74M, using double precision. Considering some extra memory used to store >>>>>> integer index, 2G memory should still be way enough. >>>>>> >>> >>>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>>> memory? >>>>>> >>> Thank you so much. >>>>>> >>> >>>>>> >>> BTW, there are 4 options remains unused and I don't understand >>>>>> why they are omitted: >>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>> >>> >>>>>> >>> >>>>>> >>> Regards, >>>>>> >>> Frank >>>>>> >>> >>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>> >>>> >>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>> >>>> Hi Dave, >>>>>> >>>> >>>>>> >>>> Sorry for the late reply. >>>>>> >>>> Thank you so much for your detailed reply. >>>>>> >>>> >>>>>> >>>> I have a question about the estimation of the memory usage. >>>>>> There are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>>> precision is used. So the memory per process is: >>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>>> >>>> >>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>>> apparently cannot convert between units correctly....) >>>>>> >>>> >>>>>> >>>> From the PETSc objects associated with the solver, It looks like >>>>>> it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities >>>>>> are: somewhere in your usage of PETSc you've introduced a memory leak; >>>>>> PETSc is doing a huge over allocation (e.g. 
as per our discussion of >>>>>> MatPtAP); or in your application code there are other objects you have >>>>>> forgotten to log the memory for. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> I am running this job on Bluewater >>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>> >>>> >>>>>> >>>> I thought so on both counts. >>>>>> >>>> >>>>>> >>>> I apologize that I made a stupid mistake in computing the memory >>>>>> per core. My settings render each core can access only 2G memory on average >>>>>> instead of 8G which I mentioned in previous email. I re-run the job with 8G >>>>>> memory per core on average and there is no "Out Of Memory" error. I would >>>>>> do more test to see if there is still some memory issue. >>>>>> >>>> >>>>>> >>>> Ok. I'd still like to know where the memory was being used since >>>>>> my estimates were off. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> Thanks, >>>>>> >>>> Dave >>>>>> >>>> >>>>>> >>>> Regards, >>>>>> >>>> Frank >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>> >>>>> Hi Frank, >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>> >>>>> Hi Dave, >>>>>> >>>>> >>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>>> 96*8*24. The petsc option file is attached. >>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred >>>>>> before the linear solver finished one step. So I don't have the full info >>>>>> from ksp_view. The info from ksp_view_pre is attached. >>>>>> >>>>> >>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>> >>>>> >>>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>>> was going to be changed. >>>>>> >>>>> >>>>>> >>>>> Based on what information? >>>>>> >>>>> Running with -info would give us more clues, but will create a >>>>>> ton of output. >>>>>> >>>>> Please try running the case which failed with -info >>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>>> for comparison. >>>>>> >>>>> Thank you. >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>>> magnitude estimate >>>>>> >>>>> >>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>>> GB per MPI rank assuming double precision. >>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB >>>>>> (assuming 32 bit integers) >>>>>> >>>>> >>>>>> >>>>> * You use 5 levels of coarsening, so the other operators should >>>>>> represent (collectively) >>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on >>>>>> the communicator with 18432 ranks. >>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>>> communicator with 18432 ranks. >>>>>> >>>>> >>>>>> >>>>> * You use a reduction factor of 64, making the new communicator >>>>>> with 288 MPI ranks. >>>>>> >>>>> PCTelescope will first gather a temporary matrix associated >>>>>> with your coarse level operator assuming a comm size of 288 living on the >>>>>> comm with size 18432. >>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per >>>>>> core on the 288 ranks. >>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>>> subcomm, thus require another 32 MB per rank. 
>>>>>> >>>>> The temporary matrix is now destroyed. >>>>>> >>>>> >>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on >>>>>> the sub-comm. >>>>>> >>>>> >>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>>> resulting operator will have the same memory footprint as the unpermuted >>>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>>> are held in memory when the DMDA is provided. >>>>>> >>>>> >>>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>>> any given core, given your options is approximately >>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>> >>>>> This is way below 8 GB. >>>>>> >>>>> >>>>>> >>>>> Note this estimate completely ignores: >>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>> >>>>> (2) the potential growth in the number of non-zeros per row due >>>>>> to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>>> MatView so we could see the number of non-zeros required by the coarse >>>>>> level operators) >>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>>> required by the smoothers. >>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>> >>>>> >>>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>>> carefully estimated the memory usage of your application code. Hopefully >>>>>> others might examine/correct my rough estimates >>>>>> >>>>> >>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>> >>>>> Since I don't have access to the same machine you are running >>>>>> on, I think we need to take a step back. >>>>>> >>>>> >>>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>>> available >>>>>> >>>>> >>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar >>>>>> 7 point FD stencil) >>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>>> memory usage of your solver configuration using a standard, light weight >>>>>> existing PETSc example, run on your machine at the same scale. >>>>>> >>>>> This would hopefully enable us to correctly evaluate the actual >>>>>> memory usage required by the solver configuration you are using. >>>>>> >>>>> >>>>>> >>>>> Thanks, >>>>>> >>>>> Dave >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> Frank >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>> >>>>>> Hi Barry and Dave, >>>>>> >>>>>> >>>>>> >>>>>> Thank both of you for the advice. >>>>>> >>>>>> >>>>>> >>>>>> @Barry >>>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>>> the correct files this time. >>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>>> preconditioner. >>>>>> >>>>>> >>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>> 3971904 0. >>>>>> >>>>>> Matrix 101 101 >>>>>> 9462372 0 >>>>>> >>>>>> >>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 681672 >>>>>> 0. 
>>>>>> >>>>>> Matrix 101 101 >>>>>> 1462180 0. >>>>>> >>>>>> >>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>>> Test2. In my case, it is about 6 times. >>>>>> >>>>>> >>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>>> Sub-domain per process: 32*32*32 >>>>>> >>>>>> Here I get the out of memory error. >>>>>> >>>>>> >>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to >>>>>> set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>>> errors. >>>>>> >>>>>> >>>>>> >>>>>> @Dave >>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the >>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh >>>>>> of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid >>>>>> point per process. >>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>>> attached. >>>>>> >>>>>> >>>>>> >>>>>> Do you understand the expected memory usage for the particular >>>>>> parallel LU implementation you are using? I don't (seriously). Replace LU >>>>>> with bjacobi and re-run this test. My point about solver debugging is still >>>>>> valid. >>>>>> >>>>>> >>>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>>> actually used in the computations >>>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thank you so much. >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> >>>>>> >>>>>> Thank you for you advice. >>>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>>> and the process mesh is 96*8*24. >>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>>> >>>>>> The system gives me the "Out of Memory" error before the >>>>>> linear system is completely solved. >>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>>> the error occurs when it reaches the coarse mesh. >>>>>> >>>>>> >>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>>> 96*8*24. The 3rd test uses the >>>>>> same grid but a different process mesh 48*4*12. >>>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>>> memory usage goes from 2nd test >>>>>> >>>>>> Vector 384 383 8,193,712 >>>>>> 0. >>>>>> >>>>>> Matrix 103 103 11,508,688 >>>>>> 0. >>>>>> >>>>>> to 3rd test >>>>>> >>>>>> Vector 384 383 1,590,520 0. >>>>>> >>>>>> Matrix 103 103 3,508,664 >>>>>> 0. >>>>>> >>>>>> that is the memory usage got smaller but if you have only >>>>>> 1/8th the processes and the same grid it should have gotten about 8 times >>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so that still >>>>>> doesn't explain it because the memory usage changed by a factor of 5 >>>>>> something for the vectors and 3 something for the matrices. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>>> the same in 1st test. The linear solver works fine in both test. 
>>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>>> memory info is from the option '-log_summary'. I tried to use >>>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>>> my code so I can use '-memory_info'? >>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>> >>>>>> >>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>>> memory is used without the telescope? Also run case 2 the same way. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> In both tests the memory usage is not large. >>>>>> >>>>>> >>>>>> >>>>>> It seems to me that it might be the 'telescope' >>>>>> preconditioner that allocated a lot of memory and caused the error in the >>>>>> 1st test. >>>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>> >>>>>> Frank, >>>>>> >>>>>> >>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>>> before the solve so hopefully it gets that far. >>>>>> >>>>>> >>>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>>> when the problem completes it will show the "high water mark" for PETSc >>>>>> allocated memory and total memory used. We first want to look at these >>>>>> numbers to see if it is using more memory than you expect. You could also >>>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>>> the output from these options. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> >>>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>>> solve a linear system in parallel. >>>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>>> coarse mesh for its good performance. >>>>>> >>>>>> The petsc options file is attached. >>>>>> >>>>>> >>>>>> >>>>>> The domain is a 3d box. >>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>>> mesh is 96*8*24. When I double the size of grid and >>>>>> keep the same process mesh and petsc options, I >>>>>> get an "out of memory" error from the super-cluster I am using. >>>>>> >>>>>> Each process has access to at least 8G memory, which should be >>>>>> more than enough for my application. I am sure that all the other parts of >>>>>> my code( except the linear solver ) do not use much memory. So I doubt if >>>>>> there is something wrong with the linear solver. >>>>>> >>>>>> The error occurs before the linear system is completely solved >>>>>> so I don't have the info from ksp view. I am not able to re-produce the >>>>>> error with a smaller problem either. >>>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>>> runs extremely slow but there is no memory error. >>>>>> >>>>>> >>>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>>> >>>>>> Thank you so much. 
>>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>> _options.txt> >>>>>> >>>>>> >>>>> >>>> >>> >>> >> >> > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Oct 7 02:22:27 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 7 Oct 2016 08:22:27 +0100 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5783D3E4.4020004@uci.edu> <5786C9C7.1080309@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On 7 October 2016 at 02:05, Matthew Knepley wrote: > On Thu, Oct 6, 2016 at 7:33 PM, frank wrote: > >> Dear Dave, >> Follow your advice, I solve the identical equation twice and time two >> steps separately. The result is below: >> >> Test: 1024^3 grid points >> Cores# reduction factor MG levels# time of 1st solve 2nd time >> 4096 64 6 + 3 >> 3.85 1.75 >> 8192 128 5 + 3 >> 5.52 0.91 >> 16384 256 5 + 3 5.37 >> 0.52 >> 32768 512 5 + 4 3.03 >> 0.36 >> 32768 64 | 8 4 | 3 | 3 2.80 >> 0.43 >> 65536 1024 5 + 4 3.38 >> 0.59 >> 65536 32 | 32 4 | 4 | 3 2.14 >> 0.22 >> >> I also attached the log_view info from all the run. The file is names by >> the cores# + reduction factor. >> The ksp_view and petsc_options for the 1st run are also included. Others >> are similar. The only differences are the reduction factor and mg levels. >> >> ** The time for the 1st solve is generally much larger. Is this because >> the ksp solver on the sub-communicator is set up during the 1st solve? >> > Yes, but it's not just the setup for the KSP on the sub-comm. There is additional setup required: [1] creating the sub-comm, [2] creating the DM on the sub-comm, [3] creating the scatter objects and nullspaces, [4] repartitioning the matrix. > > All setup is done in the first solve. > > >> ** The time for 1st solve does not scale. >> In practice, I am solving a variable coefficient Poisson equation. I >> need to build the matrix every time step. Therefore, each step is similar >> to the 1st solve which does not scale. Is there a way I can improve the >> performance? >> > > You could use rediscretization instead of Galerkin to produce the coarse > operators. > Yes, I can think of one option for improved performance, but I cannot tell whether it will be beneficial because the logging isn't sufficiently fine grained (and there is no easy way to get the info out of petsc). I use PtAP to repartition the matrix, and this could be consuming most of the setup time in Telescope with your run. Such a repartitioning could be avoided if you provided a method to create the operator on the coarse levels (what Matt is suggesting). However, this requires you to be able to define your coefficients on the coarse grid. This will most likely reduce setup time, but your coarse grid operators (now re-discretized) are likely to be less effective than those generated via Galerkin coarsening. > > >> ** The 2nd solve scales but not quite well for more than 16384 cores. >> > > How well were you looking for? This is strong scaling, which is has an > Amdahl's Law limit. 
> Is 1024^3 points your target (production run) resolution? If it is not, then start doing the tests with your target resolution. Setup time, compared with the solve time, will always be smaller and will impact the scaling less when you consider higher resolution problems. > > >> It seems to me that the performance depends on the tuning of MG >> levels on the sub-communicator(s). >> > Yes - absolutely. > Is there some general strategies regarding how to distribute the >> levels? or when to use multiple sub-communicators ? >> > Yes, but there is nothing definite. We don't have a performance model to guide these choices. The optimal choice is dependent on the characteristics of your compute nodes, the network, the form of the discrete operator, and the mesh refinement factor used when creating the MG hierarchy. It's a bit complicated. I have found that when using meshes with a refinement factor of 2, a reduction factor of 64 within telescope is effective. I would suggest experimenting with the refinement factor. If your coefficients are smooth, you can probably refine your mesh for MG by a factor of 4 (rather than the default of 2). Galerkin will still provide meaningful coarse grid operators. Always coarsen the problem until you have ~1 DOF per core before repartitioning the operator via Telescope. Don't use a reduction factor which will only allow one additional MG level to be defined on the sub-comm; e.g. if you use meshes refined by 2x, on the coarse level use a reduction factor of 64. Without a performance model, the optimal level at which to invoke repartitioning, and how aggressively the communicator size is reduced, cannot be determined a priori. Experimentation is the only way. > > > Also, you use CG/MG when FMG by itself would probably be faster. Your > smoother is likely not strong enough, and you > should use something like V(2,2). There is a lot of tuning that is > possible, but difficult to automate. > Matt's completely correct. If we could automate this in a meaningful manner, we would have done so. Thanks, Dave > > Thanks, > > Matt > > >> Thank you. >> >> Regards, >> Frank >> >> >> >> >> >> On 10/04/2016 12:56 PM, Dave May wrote: >> >> >> >> On Tuesday, 4 October 2016, frank wrote: >> >>> Hi, >>> This question is follow-up of the thread "Question about memory usage in >>> Multigrid preconditioner". >>> I used to have the "Out of Memory(OOM)" problem when using the >>> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; >>> -matptap_scalable" option did solve that problem. >>> >>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I >>> used one sub-communicator in all the tests. The difference between the >>> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 >>> the number of multigrid levels in the up/down solver. The function >>> "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >>> >>> Test1: 512^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down >>> solver Time for KSPSolve (s) >>> 512 8 4 / >>> 3 6.2466 >>> 4096 64 5 / >>> 3 0.9361 >>> 32768 64 4 / >>> 3 4.8914 >>> >>> Test2: 1024^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down >>> solver Time for KSPSolve (s) >>> 4096 64 5 / 4 >>> 3.4139 >>> 8192 128 5 / >>> 4 2.4196 >>> 16384 32 5 / 3 >>> 5.4150 >>> 32768 64 5 / >>> 3 5.6067 >>> 65536 128 5 / >>> 3 6.5219 >>> >> >> You have to be very careful how you interpret these numbers. 
Your solver >> contains nested calls to KSPSolve, and unfortunately as a result the >> numbers you report include setup time. This will remain true even if you >> call KSPSetUp on the outermost KSP. >> >> Your email concerns scalability of the silver application, so let's focus >> on that issue. >> >> The only way to clearly separate setup from solve time is to perform two >> identical solves. The second solve will not require any setup. You should >> monitor the second solve via a new PetscStage. >> >> This was what I did in the telescope paper. It was the only way to >> understand the setup cost (and scaling) cf the solve time (and scaling). >> >> Thanks >> Dave >> >> >> >>> I guess I didn't set the MG levels properly. What would be the efficient >>> way to arrange the MG levels? >>> Also which preconditionr at the coarse mesh of the 2nd communicator >>> should I use to improve the performance? >>> >>> I attached the test code and the petsc options file for the 1024^3 cube >>> with 32768 cores. >>> >>> Thank you. >>> >>> Regards, >>> Frank >>> >>> >>> >>> >>> >>> >>> On 09/15/2016 03:35 AM, Dave May wrote: >>> >>> HI all, >>> >>> I the only unexpected memory usage I can see is associated with the call >>> to MatPtAP(). >>> Here is something you can try immediately. >>> Run your code with the additional options >>> -matrap 0 -matptap_scalable >>> >>> I didn't realize this before, but the default behaviour of MatPtAP in >>> parallel is actually to to explicitly form the transpose of P (e.g. >>> assemble R = P^T) and then compute R.A.P. >>> You don't want to do this. The option -matrap 0 resolves this issue. >>> >>> The implementation of P^T.A.P has two variants. >>> The scalable implementation (with respect to memory usage) is selected >>> via the second option -matptap_scalable. >>> >>> Try it out - I see a significant memory reduction using these options >>> for particular mesh sizes / partitions. >>> >>> I've attached a cleaned up version of the code you sent me. >>> There were a number of memory leaks and other issues. >>> The main points being >>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>> * You should call PetscFinalize(), otherwise the option -log_summary >>> (-log_view) will not display anything once the program has completed. >>> >>> >>> Thanks, >>> Dave >>> >>> >>> On 15 September 2016 at 08:03, Hengjie Wang wrote: >>> >>>> Hi Dave, >>>> >>>> Sorry, I should have put more comment to explain the code. >>>> The number of process in each dimension is the same: Px = Py=Pz=P. So >>>> is the domain size. >>>> So if the you want to run the code for a 512^3 grid points on 16^3 >>>> cores, you need to set "-N 512 -P 16" in the command line. >>>> I add more comments and also fix an error in the attached code. ( The >>>> error only effects the accuracy of solution but not the memory usage. ) >>>> >>>> Thank you. >>>> Frank >>>> >>>> >>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>> >>>> >>>> >>>> On Thursday, 15 September 2016, Dave May >>>> wrote: >>>> >>>>> >>>>> >>>>> On Thursday, 15 September 2016, frank wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I write a simple code to re-produce the error. I hope this can help >>>>>> to diagnose the problem. >>>>>> The code just solves a 3d poisson equation. >>>>>> >>>>> >>>>> Why is the stencil width a runtime parameter?? And why is the default >>>>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>> >>>>> Was this choice made to mimic something in the real application code? 
>>>>> >>>> >>>> Please ignore - I misunderstood your usage of the param set by -P >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * >>>>>> 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>>>> ksp solver works fine. >>>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>>> >>>>>> Thank you. >>>>>> Frank >>>>>> >>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but >>>>>> it is not in file I sent you. I am sorry for the confusion. >>>>>> >>>>>> Regards, >>>>>> Frank >>>>>> >>>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>>> > >>>>>>> > Hi Barry, >>>>>>> > >>>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>>>> So I added both -ksp_view_pre and -ksp_view. >>>>>>> >>>>>>> But the options file you sent specifically does NOT list the >>>>>>> -ksp_view_pre so how could it be from that? >>>>>>> >>>>>>> Sorry to be pedantic but I've spent too much time in the past >>>>>>> trying to debug from incorrect information and want to make sure that the >>>>>>> information I have is correct before thinking. Please recheck exactly what >>>>>>> happened. Rerun with the exact input file you emailed if that is needed. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> > >>>>>>> > Frank >>>>>>> > >>>>>>> > >>>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>>>> in the 2 case but not the one? >>>>>>> >> >>>>>>> >> Barry >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>>> >>> >>>>>>> >>> Hi, >>>>>>> >>> >>>>>>> >>> I want to continue digging into the memory problem here. >>>>>>> >>> I did find a work around in the past, which is to use less cores >>>>>>> per node so that each core has 8G memory. However this is deficient and >>>>>>> expensive. I hope to locate the place that uses the most memory. >>>>>>> >>> >>>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>> >>> Maximum (over computational time) process memory: >>>>>>> total 7.0727e+08 >>>>>>> >>> Current process memory: >>>>>>> total 7.0727e+08 >>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>> 6.3908e+11 >>>>>>> >>> Current space PetscMalloc()ed: >>>>>>> total 1.8275e+09 >>>>>>> >>> >>>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>> >>> Maximum (over computational time) process memory: >>>>>>> total 5.9431e+09 >>>>>>> >>> Current process memory: >>>>>>> total 5.9431e+09 >>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>> 5.3202e+12 >>>>>>> >>> Current space PetscMalloc()ed: >>>>>>> total 5.4844e+09 >>>>>>> >>> >>>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>>>> the job during "KSPSolve". >>>>>>> >>> >>>>>>> >>> I attached the output of ksp_view( the third test's output is >>>>>>> from ksp_view_pre ), memory_view and also the petsc options. 
>>>>>>> >>> >>>>>>> >>> In all the tests, each core can access about 2G memory. In >>>>>>> test3, there are 4223139840 non-zeros in the matrix. This will consume >>>>>>> about 1.74M, using double precision. Considering some extra memory used to >>>>>>> store integer index, 2G memory should still be way enough. >>>>>>> >>> >>>>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>>>> memory? >>>>>>> >>> Thank you so much. >>>>>>> >>> >>>>>>> >>> BTW, there are 4 options remains unused and I don't understand >>>>>>> why they are omitted: >>>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> Regards, >>>>>>> >>> Frank >>>>>>> >>> >>>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>> >>>> >>>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>>> >>>> Hi Dave, >>>>>>> >>>> >>>>>>> >>>> Sorry for the late reply. >>>>>>> >>>> Thank you so much for your detailed reply. >>>>>>> >>>> >>>>>>> >>>> I have a question about the estimation of the memory usage. >>>>>>> There are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>>>> precision is used. So the memory per process is: >>>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>>>> >>>> >>>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>>>> apparently cannot convert between units correctly....) >>>>>>> >>>> >>>>>>> >>>> From the PETSc objects associated with the solver, It looks >>>>>>> like it _should_ run with 2GB per MPI rank. Sorry for my mistake. >>>>>>> Possibilities are: somewhere in your usage of PETSc you've introduced a >>>>>>> memory leak; PETSc is doing a huge over allocation (e.g. as per our >>>>>>> discussion of MatPtAP); or in your application code there are other objects >>>>>>> you have forgotten to log the memory for. >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> I am running this job on Bluewater >>>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>>> >>>> >>>>>>> >>>> I thought so on both counts. >>>>>>> >>>> >>>>>>> >>>> I apologize that I made a stupid mistake in computing the >>>>>>> memory per core. My settings render each core can access only 2G memory on >>>>>>> average instead of 8G which I mentioned in previous email. I re-run the job >>>>>>> with 8G memory per core on average and there is no "Out Of Memory" error. I >>>>>>> would do more test to see if there is still some memory issue. >>>>>>> >>>> >>>>>>> >>>> Ok. I'd still like to know where the memory was being used >>>>>>> since my estimates were off. >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> Thanks, >>>>>>> >>>> Dave >>>>>>> >>>> >>>>>>> >>>> Regards, >>>>>>> >>>> Frank >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>> >>>>> Hi Frank, >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>>> >>>>> Hi Dave, >>>>>>> >>>>> >>>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>>>> 96*8*24. The petsc option file is attached. 
>>>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred >>>>>>> before the linear solver finished one step. So I don't have the full info >>>>>>> from ksp_view. The info from ksp_view_pre is attached. >>>>>>> >>>>> >>>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>>> >>>>> >>>>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>>>> was going to be changed. >>>>>>> >>>>> >>>>>>> >>>>> Based on what information? >>>>>>> >>>>> Running with -info would give us more clues, but will create a >>>>>>> ton of output. >>>>>>> >>>>> Please try running the case which failed with -info >>>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>>>> for comparison. >>>>>>> >>>>> Thank you. >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>>>> magnitude estimate >>>>>>> >>>>> >>>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 >>>>>>> GB per MPI rank assuming double precision. >>>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB >>>>>>> (assuming 32 bit integers) >>>>>>> >>>>> >>>>>>> >>>>> * You use 5 levels of coarsening, so the other operators >>>>>>> should represent (collectively) >>>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank >>>>>>> on the communicator with 18432 ranks. >>>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>>>> communicator with 18432 ranks. >>>>>>> >>>>> >>>>>>> >>>>> * You use a reduction factor of 64, making the new >>>>>>> communicator with 288 MPI ranks. >>>>>>> >>>>> PCTelescope will first gather a temporary matrix associated >>>>>>> with your coarse level operator assuming a comm size of 288 living on the >>>>>>> comm with size 18432. >>>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per >>>>>>> core on the 288 ranks. >>>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>>>> subcomm, thus require another 32 MB per rank. >>>>>>> >>>>> The temporary matrix is now destroyed. >>>>>>> >>>>> >>>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is >>>>>>> assembled. >>>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank >>>>>>> on the sub-comm. >>>>>>> >>>>> >>>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the >>>>>>> resulting operator will have the same memory footprint as the unpermuted >>>>>>> matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB >>>>>>> are held in memory when the DMDA is provided. >>>>>>> >>>>> >>>>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>>>> any given core, given your options is approximately >>>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>> >>>>> This is way below 8 GB. 
>>>>>>> >>>>> >>>>>>> >>>>> Note this estimate completely ignores: >>>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>>> >>>>> (2) the potential growth in the number of non-zeros per row >>>>>>> due to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>>>> MatView so we could see the number of non-zeros required by the coarse >>>>>>> level operators) >>>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those >>>>>>> required by the smoothers. >>>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>>> >>>>> >>>>>>> >>>>> So either I am completely off in my estimates, or you have not >>>>>>> carefully estimated the memory usage of your application code. Hopefully >>>>>>> others might examine/correct my rough estimates >>>>>>> >>>>> >>>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>>> >>>>> Since I don't have access to the same machine you are running >>>>>>> on, I think we need to take a step back. >>>>>>> >>>>> >>>>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>>>> available >>>>>>> >>>>> >>>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar >>>>>>> 7 point FD stencil) >>>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the >>>>>>> memory usage of your solver configuration using a standard, light weight >>>>>>> existing PETSc example, run on your machine at the same scale. >>>>>>> >>>>> This would hopefully enable us to correctly evaluate the >>>>>>> actual memory usage required by the solver configuration you are using. >>>>>>> >>>>> >>>>>>> >>>>> Thanks, >>>>>>> >>>>> Dave >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> Frank >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>>> >>>>>> Hi Barry and Dave, >>>>>>> >>>>>> >>>>>>> >>>>>> Thank both of you for the advice. >>>>>>> >>>>>> >>>>>>> >>>>>> @Barry >>>>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>>>> the correct files this time. >>>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>>>> preconditioner. >>>>>>> >>>>>> >>>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>> 3971904 0. >>>>>>> >>>>>> Matrix 101 >>>>>>> 101 9462372 0 >>>>>>> >>>>>> >>>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>> 681672 0. >>>>>>> >>>>>> Matrix 101 >>>>>>> 101 1462180 0. >>>>>>> >>>>>> >>>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>>>> Test2. In my case, it is about 6 times. >>>>>>> >>>>>> >>>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>>>> Sub-domain per process: 32*32*32 >>>>>>> >>>>>> Here I get the out of memory error. >>>>>>> >>>>>> >>>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need >>>>>>> to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>> >>>>>> The linear solver didn't work in this case. Petsc output some >>>>>>> errors. >>>>>>> >>>>>> >>>>>>> >>>>>> @Dave >>>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the >>>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. 
>>>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh >>>>>>> of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one >>>>>>> grid point per process. >>>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>>>> attached. >>>>>>> >>>>>> >>>>>>> >>>>>> Do you understand the expected memory usage for the >>>>>>> particular parallel LU implementation you are using? I don't (seriously). >>>>>>> Replace LU with bjacobi and re-run this test. My point about solver >>>>>>> debugging is still valid. >>>>>>> >>>>>> >>>>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>>>> actually used in the computations >>>>>>> >>>>>> >>>>>>> >>>>>> Thanks >>>>>>> >>>>>> Dave >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> Thank you so much. >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> Hi Barry, >>>>>>> >>>>>> >>>>>>> >>>>>> Thank you for you advice. >>>>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 >>>>>>> and the process mesh is 96*8*24. >>>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>>>> 'telescope' is used as the preconditioner at the coarse mesh. >>>>>>> >>>>>> The system gives me the "Out of Memory" error before the >>>>>>> linear system is completely solved. >>>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that >>>>>>> the error occurs when it reaches the coarse mesh. >>>>>>> >>>>>> >>>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>>>> 96*8*24. The 3rd test uses the >>>>>>> same grid but a different process mesh 48*4*12. >>>>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>>>> memory usage goes from 2nd test >>>>>>> >>>>>> Vector 384 383 8,193,712 >>>>>>> 0. >>>>>>> >>>>>> Matrix 103 103 11,508,688 >>>>>>> 0. >>>>>>> >>>>>> to 3rd test >>>>>>> >>>>>> Vector 384 383 1,590,520 >>>>>>> 0. >>>>>>> >>>>>> Matrix 103 103 3,508,664 >>>>>>> 0. >>>>>>> >>>>>> that is the memory usage got smaller but if you have only >>>>>>> 1/8th the processes and the same grid it should have gotten about 8 times >>>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so that still >>>>>>> doesn't explain it because the memory usage changed by a factor of 5 >>>>>>> something for the vectors and 3 something for the matrices. >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>>>> the same in 1st test. The linear solver works fine in both test. >>>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>>>> memory info is from the option '-log_summary'. I tried to use >>>>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>>>> my code so I can use '-memory_info'? >>>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>>> >>>>>> >>>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much >>>>>>> memory is used without the telescope? Also run case 2 the same way. 
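To make the run asked for above concrete, one possible command line is sketched below; the executable name and rank count are placeholders, the MPI launcher on the actual machine may differ, and "-mg_coarse jacobi" is written out as the corresponding coarse-level options:

  mpiexec -n 18432 ./poisson_solver \
      -memory_view -log_view -ksp_max_it 1 \
      -mg_coarse_ksp_type preonly -mg_coarse_pc_type jacobi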
>>>>>>> >>>>>> >>>>>>> >>>>>> Barry >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> In both tests the memory usage is not large. >>>>>>> >>>>>> >>>>>>> >>>>>> It seems to me that it might be the 'telescope' >>>>>>> preconditioner that allocated a lot of memory and caused the error in the >>>>>>> 1st test. >>>>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>>> >>>>>> Frank, >>>>>>> >>>>>> >>>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP >>>>>>> before the solve so hopefully it gets that far. >>>>>>> >>>>>> >>>>>>> >>>>>> Please run the problem that does fit with -memory_info >>>>>>> when the problem completes it will show the "high water mark" for PETSc >>>>>>> allocated memory and total memory used. We first want to look at these >>>>>>> numbers to see if it is using more memory than you expect. You could also >>>>>>> run with say half the grid spacing to see how the memory usage scaled with >>>>>>> the increase in grid points. Make the runs also with -log_view and send all >>>>>>> the output from these options. >>>>>>> >>>>>> >>>>>>> >>>>>> Barry >>>>>>> >>>>>> >>>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>>> >>>>>> >>>>>>> >>>>>> Hi, >>>>>>> >>>>>> >>>>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to >>>>>>> solve a linear system in parallel. >>>>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the >>>>>>> coarse mesh for its good performance. >>>>>>> >>>>>> The petsc options file is attached. >>>>>>> >>>>>> >>>>>>> >>>>>> The domain is a 3d box. >>>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process >>>>>>> mesh is 96*8*24. When I double the size of grid and >>>>>>> keep the same process mesh and petsc options, I >>>>>>> get an "out of memory" error from the super-cluster I am using. >>>>>>> >>>>>> Each process has access to at least 8G memory, which should >>>>>>> be more than enough for my application. I am sure that all the other parts >>>>>>> of my code( except the linear solver ) do not use much memory. So I doubt >>>>>>> if there is something wrong with the linear solver. >>>>>>> >>>>>> The error occurs before the linear system is completely >>>>>>> solved so I don't have the info from ksp view. I am not able to re-produce >>>>>>> the error with a smaller problem either. >>>>>>> >>>>>> In addition, I tried to use the block jacobi as the >>>>>>> preconditioner with the same grid and same decomposition. The linear solver >>>>>>> runs extremely slow but there is no memory error. >>>>>>> >>>>>> >>>>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>>>> >>>>>> Thank you so much. >>>>>>> >>>>>> >>>>>>> >>>>>> Frank >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>> _options.txt> >>>>>>> >>>>>> >>>>>>> >>>>> >>>>>>> >>>> >>>>>>> >>> >>>>>> emory2.txt>>>>>>> tions3.txt> >>>>>>> > >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >>> >> >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From popov at uni-mainz.de Fri Oct 7 10:00:35 2016 From: popov at uni-mainz.de (Anton Popov) Date: Fri, 7 Oct 2016 17:00:35 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 Message-ID: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Hi guys, are there any news about fixing buggy behavior of SuperLU_DIST, exactly what is described here: http://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026802.html ? I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works fine with 3.5.4. Do I still have to stick to maint branch, and what are the chances for these fixes to be included in 3.7.5? Thanks, Anton From balay at mcs.anl.gov Fri Oct 7 10:04:50 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 7 Oct 2016 10:04:50 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: On Fri, 7 Oct 2016, Anton Popov wrote: > Hi guys, > > are there any news about fixing buggy behavior of SuperLU_DIST, exactly what > is described here: > > http://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026802.html ? > > I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works fine > with 3.5.4. > > Do I still have to stick to maint branch, and what are the chances for these > fixes to be included in 3.7.5? 3.7.4. is off maint branch [as of a week ago]. So if you are seeing issues with it - its best to debug and figure out the cause. Satish From fande.kong at inl.gov Fri Oct 7 10:16:17 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 7 Oct 2016 09:16:17 -0600 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: > On Fri, 7 Oct 2016, Anton Popov wrote: > > > Hi guys, > > > > are there any news about fixing buggy behavior of SuperLU_DIST, exactly > what > > is described here: > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. > mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c= > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? > > > > I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works > fine > > with 3.5.4. > > > > Do I still have to stick to maint branch, and what are the chances for > these > > fixes to be included in 3.7.5? > > 3.7.4. is off maint branch [as of a week ago]. So if you are seeing > issues with it - its best to debug and figure out the cause. > This bug is indeed inside of superlu_dist, and we started having this issue from PETSc-3.6.x. I think superlu_dist developers should have fixed this bug. We forgot to update superlu_dist?? This is not a thing users could debug and fix. I have many people in INL suffering from this issue, and they have to stay with PETSc-3.5.4 to use superlu_dist. Fande > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Oct 7 10:18:30 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Oct 2016 10:18:30 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: On Fri, Oct 7, 2016 at 10:16 AM, Kong, Fande wrote: > On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: > >> On Fri, 7 Oct 2016, Anton Popov wrote: >> >> > Hi guys, >> > >> > are there any news about fixing buggy behavior of SuperLU_DIST, exactly >> what >> > is described here: >> > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.mc >> s.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html& >> d=CwIBAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r= >> DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H8 >> 9Z6LXKBfJBOAM2vG1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0e >> UwibAKsRRWKafos&e= ? >> > >> > I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works >> fine >> > with 3.5.4. >> > >> > Do I still have to stick to maint branch, and what are the chances for >> these >> > fixes to be included in 3.7.5? >> >> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >> issues with it - its best to debug and figure out the cause. >> > > This bug is indeed inside of superlu_dist, and we started having this > issue from PETSc-3.6.x. I think superlu_dist developers should have fixed > this bug. We forgot to update superlu_dist?? This is not a thing users > could debug and fix. > > I have many people in INL suffering from this issue, and they have to stay > with PETSc-3.5.4 to use superlu_dist. > Do you have this bug with the latest maint? Matt > Fande > > > >> >> Satish >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Oct 7 10:23:34 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 7 Oct 2016 10:23:34 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: On Fri, 7 Oct 2016, Kong, Fande wrote: > On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: > > > On Fri, 7 Oct 2016, Anton Popov wrote: > > > > > Hi guys, > > > > > > are there any news about fixing buggy behavior of SuperLU_DIST, exactly > > what > > > is described here: > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. > > mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c= > > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > > 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? > > > > > > I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works > > fine > > > with 3.5.4. > > > > > > Do I still have to stick to maint branch, and what are the chances for > > these > > > fixes to be included in 3.7.5? > > > > 3.7.4. is off maint branch [as of a week ago]. So if you are seeing > > issues with it - its best to debug and figure out the cause. > > > > This bug is indeed inside of superlu_dist, and we started having this issue > from PETSc-3.6.x. I think superlu_dist developers should have fixed this > bug. We forgot to update superlu_dist?? This is not a thing users could > debug and fix. 
> > I have many people in INL suffering from this issue, and they have to stay > with PETSc-3.5.4 to use superlu_dist. To verify if the bug is fixed in latest superlu_dist - you can try [assuming you have git - either from petsc-3.7/maint/master]: --download-superlu_dist --download-superlu_dist-commit=origin/maint Satish From bsmith at mcs.anl.gov Fri Oct 7 13:01:55 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Oct 2016 13:01:55 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: <216FF93B-A932-4A60-B1C3-0272AB282E1E@mcs.anl.gov> Fande, If you can reproduce the problem with PETSc 3.7.4 please send us sample code that produces it so we can work with Sherry to get it fixed ASAP. Barry > On Oct 7, 2016, at 10:23 AM, Satish Balay wrote: > > On Fri, 7 Oct 2016, Kong, Fande wrote: > >> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: >> >>> On Fri, 7 Oct 2016, Anton Popov wrote: >>> >>>> Hi guys, >>>> >>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly >>> what >>>> is described here: >>>> >>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c= >>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>> >>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works >>> fine >>>> with 3.5.4. >>>> >>>> Do I still have to stick to maint branch, and what are the chances for >>> these >>>> fixes to be included in 3.7.5? >>> >>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>> issues with it - its best to debug and figure out the cause. >>> >> >> This bug is indeed inside of superlu_dist, and we started having this issue >> from PETSc-3.6.x. I think superlu_dist developers should have fixed this >> bug. We forgot to update superlu_dist?? This is not a thing users could >> debug and fix. >> >> I have many people in INL suffering from this issue, and they have to stay >> with PETSc-3.5.4 to use superlu_dist. > > To verify if the bug is fixed in latest superlu_dist - you can try > [assuming you have git - either from petsc-3.7/maint/master]: > > --download-superlu_dist --download-superlu_dist-commit=origin/maint > > > Satish > From hengjiew at uci.edu Fri Oct 7 16:49:45 2016 From: hengjiew at uci.edu (frank) Date: Fri, 7 Oct 2016 14:49:45 -0700 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: Dear all, Thank you so much for the advice. > All setup is done in the first solve. > > ** The time for 1st solve does not scale. > In practice, I am solving a variable coefficient Poisson > equation. I need to build the matrix every time step. Therefore, > each step is similar to the 1st solve which does not scale. Is > there a way I can improve the performance? > > > You could use rediscretization instead of Galerkin to produce the > coarse operators. 
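To illustrate the rediscretization route suggested above: if the KSP is given the DMDA together with a callback that assembles the operator for whatever DMDA it is handed, PCMG coarsens that DMDA and calls the same callback on every level, so no Galerkin PtAP products need to be formed. The sketch below is a hedged C outline in the style of the PETSc DMDA/KSP tutorials; the unscaled 7-point Laplacian and the Dirichlet rows are placeholders standing in for the actual variable-coefficient discretization, which would have to be evaluated with coarse-grid coefficients inside the callback.

  #include <petscksp.h>
  #include <petscdmda.h>

  /* Assemble the operator on whatever DMDA this KSP level carries.          */
  /* Placeholder stencil: unscaled 7-point Laplacian, Dirichlet boundary rows.*/
  /* dof = 1, so the .c field of MatStencil is not used.                      */
  static PetscErrorCode ComputeMatrix(KSP ksp, Mat J, Mat jac, void *ctx)
  {
    DM             da;
    PetscInt       i, j, k, xs, ys, zs, xm, ym, zm, mx, my, mz, n;
    PetscScalar    v[7];
    MatStencil     row, col[7];
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = KSPGetDM(ksp, &da);CHKERRQ(ierr);
    ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0, 0, 0, 0, 0, 0, 0, 0, 0);CHKERRQ(ierr);
    ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
    for (k = zs; k < zs + zm; k++) {
      for (j = ys; j < ys + ym; j++) {
        for (i = xs; i < xs + xm; i++) {
          row.i = i; row.j = j; row.k = k;
          if (i == 0 || j == 0 || k == 0 || i == mx-1 || j == my-1 || k == mz-1) {
            v[0] = 1.0;   /* boundary row */
            ierr = MatSetValuesStencil(jac, 1, &row, 1, &row, v, INSERT_VALUES);CHKERRQ(ierr);
          } else {
            n = 0;
            v[n] = -1.0; col[n].i = i-1; col[n].j = j;   col[n].k = k;   n++;
            v[n] = -1.0; col[n].i = i+1; col[n].j = j;   col[n].k = k;   n++;
            v[n] = -1.0; col[n].i = i;   col[n].j = j-1; col[n].k = k;   n++;
            v[n] = -1.0; col[n].i = i;   col[n].j = j+1; col[n].k = k;   n++;
            v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k-1; n++;
            v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k+1; n++;
            v[n] =  6.0; col[n].i = i;   col[n].j = j;   col[n].k = k;   n++;
            ierr = MatSetValuesStencil(jac, 1, &row, n, col, v, INSERT_VALUES);CHKERRQ(ierr);
          }
        }
      }
    }
    ierr = MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

In main() this is wired up with DMDACreate3d(), KSPSetDM(ksp,da), KSPSetComputeOperators(ksp,ComputeMatrix,NULL) and KSPSetFromOptions(ksp) before KSPSolve(); run-time options such as -pc_type mg -pc_mg_levels 5 then build the hierarchy by coarsening the DMDA and rediscretizing rather than via PtAP. User-assembled transfer operators can also be attached per level with PCMGSetInterpolation()/PCMGSetRestriction(), but with a DMDA attached the interpolation is generated automatically. Whether rediscretized coarse operators converge as well as Galerkin ones is problem dependent, as noted above.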
> > > Yes I can think of one option for improved performance, but I cannot > tell whether it will be beneficial because the logging isn't > sufficiently fine grained (and there is no easy way to get the info > out of petsc). > > I use PtAP to repartition the matrix, this could be consuming most of > the setup time in Telescope with your run. Such a repartitioning could > be avoid if you provided a method to create the operator on the coarse > levels (what Matt is suggesting). However, this requires you to be > able to define your coefficients on the coarse grid. This will most > likely reduce setup time, but your coarse grid operators (now > re-discretized) are likely to be less effective than those generated > via Galerkin coarsening. Please correct me if I understand this incorrectly: I can define my own restriction function and pass it to petsc instead of using PtAP. If so,what's the interface to do that? > Also, you use CG/MG when FMG by itself would probably be faster. > Your smoother is likely not strong enough, and you > should use something like V(2,2). There is a lot of tuning that is > possible, but difficult to automate. > > > Matt's completely correct. > If we could automate this in a meaningful manner, we would have done so. I am not as familiar with multigrid as you guys. It would be very kind if you could be more specific. What does V(2,2) stand for? Is there some strong smoother build in petsc that I can try? Another thing, the vector assemble and scatter take more time as I increased the cores#: cores# 4096 8192 16384 32768 65536 VecAssemblyBegin 298 2.91E+00 2.87E+00 8.59E+00 2.75E+01 2.21E+03 VecAssemblyEnd 298 3.37E-03 1.78E-03 1.78E-03 5.13E-03 1.99E-03 VecScatterBegin 76303 3.82E+00 3.01E+00 2.54E+00 4.40E+00 1.32E+00 VecScatterEnd 76303 3.09E+01 1.47E+01 2.23E+01 2.96E+01 2.10E+01 The above data is produced by solving a constant coefficients Possoin equation with different rhs for 100 steps. As you can see, the time of VecAssemblyBegin increase dramatically from 32K cores to 65K. With 65K cores, it took more time to assemble the rhs than solving the equation. Is there a way to improve this? Thank you. Regards, Frank > > > > > > On 10/04/2016 12:56 PM, Dave May wrote: >> >> >> On Tuesday, 4 October 2016, frank > > wrote: >> >> Hi, >> >> This question is follow-up of the thread "Question about >> memory usage in Multigrid preconditioner". >> I used to have the "Out of Memory(OOM)" problem when >> using the CG+Telescope MG solver with 32768 cores. Adding >> the "-matrap 0; -matptap_scalable" option did solve that >> problem. >> >> Then I test the scalability by solving a 3d poisson eqn >> for 1 step. I used one sub-communicator in all the tests. >> The difference between the petsc options in those tests >> are: 1 the pc_telescope_reduction_factor; 2 the number of >> multigrid levels in the up/down solver. The function >> "ksp_solve" is timed. It is kind of slow and doesn't >> scale at all. >> >> Test1: 512^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 512 8 4 / 3 6.2466 >> 4096 64 5 / 3 0.9361 >> 32768 64 4 / 3 4.8914 >> >> Test2: 1024^3 grid points >> Core# telescope_reduction_factor MG levels# for up/down >> solver Time for KSPSolve (s) >> 4096 64 5 / 4 3.4139 >> 8192 128 5 / 4 2.4196 >> 16384 32 5 / 3 5.4150 >> 32768 64 5 / 3 5.6067 >> 65536 128 5 / 3 6.5219 >> >> >> You have to be very careful how you interpret these numbers. 
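On the V(2,2) question a few paragraphs above: V(2,2) denotes a multigrid V-cycle with two pre- and two post-smoothing sweeps on each level, which in PETSc is controlled by the iteration count of the level solvers. A hedged options-file sketch follows; the prefixes shown are the plain -pc_type mg ones (inside Telescope the level options pick up the -mg_coarse_telescope_ prefix), and the best smoother is problem dependent, so treat this as a starting point rather than a recommendation made in this thread:

  # full multigrid (FMG) cycles, as suggested above, instead of plain V-cycles
  -pc_type mg
  -pc_mg_type full
  # V(2,2)-style smoothing: 2 sweeps of Chebyshev preconditioned by SOR per level
  -mg_levels_ksp_type chebyshev
  -mg_levels_ksp_max_it 2
  -mg_levels_pc_type sor

Chebyshev with SOR is the usual "stronger" smoother readily available in PETSc; richardson with SOR is another common choice.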
>> Your solver contains nested calls to KSPSolve, and >> unfortunately as a result the numbers you report include >> setup time. This will remain true even if you call KSPSetUp >> on the outermost KSP. >> >> Your email concerns scalability of the silver application, so >> let's focus on that issue. >> >> The only way to clearly separate setup from solve time is >> to perform two identical solves. The second solve will not >> require any setup. You should monitor the second solve via a >> new PetscStage. >> >> This was what I did in the telescope paper. It was the only >> way to understand the setup cost (and scaling) cf the solve >> time (and scaling). >> >> Thanks >> Dave >> >> I guess I didn't set the MG levels properly. What would >> be the efficient way to arrange the MG levels? >> Also which preconditionr at the coarse mesh of the 2nd >> communicator should I use to improve the performance? >> >> I attached the test code and the petsc options file for >> the 1024^3 cube with 32768 cores. >> >> Thank you. >> >> Regards, >> Frank >> >> >> >> >> >> >> On 09/15/2016 03:35 AM, Dave May wrote: >>> HI all, >>> >>> I the only unexpected memory usage I can see is >>> associated with the call to MatPtAP(). >>> Here is something you can try immediately. >>> Run your code with the additional options >>> -matrap 0 -matptap_scalable >>> >>> I didn't realize this before, but the default behaviour >>> of MatPtAP in parallel is actually to to explicitly form >>> the transpose of P (e.g. assemble R = P^T) and then >>> compute R.A.P. >>> You don't want to do this. The option -matrap 0 resolves >>> this issue. >>> >>> The implementation of P^T.A.P has two variants. >>> The scalable implementation (with respect to memory >>> usage) is selected via the second option -matptap_scalable. >>> >>> Try it out - I see a significant memory reduction using >>> these options for particular mesh sizes / partitions. >>> >>> I've attached a cleaned up version of the code you sent me. >>> There were a number of memory leaks and other issues. >>> The main points being >>> * You should call DMDAVecGetArrayF90() before >>> VecAssembly{Begin,End} >>> * You should call PetscFinalize(), otherwise the >>> option -log_summary (-log_view) will not display >>> anything once the program has completed. >>> >>> >>> Thanks, >>> Dave >>> >>> >>> On 15 September 2016 at 08:03, Hengjie Wang >>> wrote: >>> >>> Hi Dave, >>> >>> Sorry, I should have put more comment to explain the >>> code. >>> The number of process in each dimension is the same: >>> Px = Py=Pz=P. So is the domain size. >>> So if the you want to run the code for a 512^3 grid >>> points on 16^3 cores, you need to set "-N 512 -P 16" >>> in the command line. >>> I add more comments and also fix an error in the >>> attached code. ( The error only effects the accuracy >>> of solution but not the memory usage. ) >>> >>> Thank you. >>> Frank >>> >>> >>> On 9/14/2016 9:05 PM, Dave May wrote: >>>> >>>> >>>> On Thursday, 15 September 2016, Dave May >>>> wrote: >>>> >>>> >>>> >>>> On Thursday, 15 September 2016, frank >>>> wrote: >>>> >>>> Hi, >>>> >>>> I write a simple code to re-produce the >>>> error. I hope this can help to diagnose the >>>> problem. >>>> The code just solves a 3d poisson equation. >>>> >>>> >>>> Why is the stencil width a runtime parameter?? >>>> And why is the default value 2? For 7-pnt FD >>>> Laplace, you only need a stencil width of 1. >>>> >>>> Was this choice made to mimic something in the >>>> real application code? 
>>>> >>>> >>>> Please ignore - I misunderstood your usage of the >>>> param set by -P >>>> >>>> >>>> I run the code on a 1024^3 mesh. The >>>> process partition is 32 * 32 * 32. That's >>>> when I re-produce the OOM error. Each core >>>> has about 2G memory. >>>> I also run the code on a 512^3 mesh with 16 >>>> * 16 * 16 processes. The ksp solver works >>>> fine. >>>> I attached the code, ksp_view_pre's output >>>> and my petsc option file. >>>> >>>> Thank you. >>>> Frank >>>> >>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>> Hi Barry, >>>>> >>>>> I checked. On the supercomputer, I had the >>>>> option "-ksp_view_pre" but it is not in >>>>> file I sent you. I am sorry for the confusion. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> On Friday, September 9, 2016, Barry Smith >>>>> wrote: >>>>> >>>>> >>>>> > On Sep 9, 2016, at 3:11 PM, frank >>>>> wrote: >>>>> > >>>>> > Hi Barry, >>>>> > >>>>> > I think the first KSP view output is >>>>> from -ksp_view_pre. Before I submitted >>>>> the test, I was not sure whether there >>>>> would be OOM error or not. So I added >>>>> both -ksp_view_pre and -ksp_view. >>>>> >>>>> But the options file you sent >>>>> specifically does NOT list the >>>>> -ksp_view_pre so how could it be from >>>>> that? >>>>> >>>>> Sorry to be pedantic but I've spent >>>>> too much time in the past trying to >>>>> debug from incorrect information and >>>>> want to make sure that the information >>>>> I have is correct before thinking. >>>>> Please recheck exactly what happened. >>>>> Rerun with the exact input file you >>>>> emailed if that is needed. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Frank >>>>> > >>>>> > >>>>> > On 09/09/2016 12:38 PM, Barry Smith >>>>> wrote: >>>>> >> Why does ksp_view2.txt have two >>>>> KSP views in it while ksp_view1.txt >>>>> has only one KSPView in it? Did you >>>>> run two different solves in the 2 case >>>>> but not the one? >>>>> >> >>>>> >> Barry >>>>> >> >>>>> >> >>>>> >> >>>>> >>> On Sep 9, 2016, at 10:56 AM, frank >>>>> wrote: >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> I want to continue digging into >>>>> the memory problem here. >>>>> >>> I did find a work around in the >>>>> past, which is to use less cores per >>>>> node so that each core has 8G memory. >>>>> However this is deficient and >>>>> expensive. I hope to locate the place >>>>> that uses the most memory. >>>>> >>> >>>>> >>> Here is a brief summary of the >>>>> tests I did in past: >>>>> >>>> Test1: Mesh 1536*128*384 | >>>>> Process Mesh 48*4*12 >>>>> >>> Maximum (over computational time) >>>>> process memory: total 7.0727e+08 >>>>> >>> Current process memory: >>>>> total 7.0727e+08 >>>>> >>> Maximum (over computational time) >>>>> space PetscMalloc()ed: total 6.3908e+11 >>>>> >>> Current space PetscMalloc()ed: >>>>> >>>>> total 1.8275e+09 >>>>> >>> >>>>> >>>> Test2: Mesh 1536*128*384 | >>>>> Process Mesh 96*8*24 >>>>> >>> Maximum (over computational time) >>>>> process memory: total 5.9431e+09 >>>>> >>> Current process memory: >>>>> total 5.9431e+09 >>>>> >>> Maximum (over computational time) >>>>> space PetscMalloc()ed: total 5.3202e+12 >>>>> >>> Current space PetscMalloc()ed: >>>>> >>>>> total 5.4844e+09 >>>>> >>> >>>>> >>>> Test3: Mesh 3072*256*768 | >>>>> Process Mesh 96*8*24 >>>>> >>> OOM( Out Of Memory ) killer of >>>>> the supercomputer terminated the job >>>>> during "KSPSolve". >>>>> >>> >>>>> >>> I attached the output of ksp_view( >>>>> the third test's output is from >>>>> ksp_view_pre ), memory_view and also >>>>> the petsc options. 
>>>>> >>> >>>>> >>> In all the tests, each core can >>>>> access about 2G memory. In test3, >>>>> there are 4223139840 non-zeros in the >>>>> matrix. This will consume about 1.74M, >>>>> using double precision. Considering >>>>> some extra memory used to store >>>>> integer index, 2G memory should still >>>>> be way enough. >>>>> >>> >>>>> >>> Is there a way to find out which >>>>> part of KSPSolve uses the most memory? >>>>> >>> Thank you so much. >>>>> >>> >>>>> >>> BTW, there are 4 options remains >>>>> unused and I don't understand why they >>>>> are omitted: >>>>> >>> >>>>> -mg_coarse_telescope_mg_coarse_ksp_type >>>>> value: preonly >>>>> >>> >>>>> -mg_coarse_telescope_mg_coarse_pc_type >>>>> value: bjacobi >>>>> >>> >>>>> -mg_coarse_telescope_mg_levels_ksp_max_it >>>>> value: 1 >>>>> >>> >>>>> -mg_coarse_telescope_mg_levels_ksp_type >>>>> value: richardson >>>>> >>> >>>>> >>> >>>>> >>> Regards, >>>>> >>> Frank >>>>> >>> >>>>> >>> On 07/13/2016 05:47 PM, Dave May >>>>> wrote: >>>>> >>>> >>>>> >>>> On 14 July 2016 at 01:07, frank >>>>> wrote: >>>>> >>>> Hi Dave, >>>>> >>>> >>>>> >>>> Sorry for the late reply. >>>>> >>>> Thank you so much for your >>>>> detailed reply. >>>>> >>>> >>>>> >>>> I have a question about the >>>>> estimation of the memory usage. There >>>>> are 4223139840 allocated non-zeros and >>>>> 18432 MPI processes. Double precision >>>>> is used. So the memory per process is: >>>>> >>>> 4223139840 * 8bytes / 18432 / >>>>> 1024 / 1024 = 1.74M ? >>>>> >>>> Did I do sth wrong here? Because >>>>> this seems too small. >>>>> >>>> >>>>> >>>> No - I totally f***ed it up. You >>>>> are correct. That'll teach me for >>>>> fumbling around with my iphone >>>>> calculator and not using my brain. >>>>> (Note that to convert to MB just >>>>> divide by 1e6, not 1024^2 - although I >>>>> apparently cannot convert between >>>>> units correctly....) >>>>> >>>> >>>>> >>>> From the PETSc objects associated >>>>> with the solver, It looks like it >>>>> _should_ run with 2GB per MPI rank. >>>>> Sorry for my mistake. Possibilities >>>>> are: somewhere in your usage of PETSc >>>>> you've introduced a memory leak; PETSc >>>>> is doing a huge over allocation (e.g. >>>>> as per our discussion of MatPtAP); or >>>>> in your application code there are >>>>> other objects you have forgotten to >>>>> log the memory for. >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> I am running this job on Bluewater >>>>> >>>> I am using the 7 points FD >>>>> stencil in 3D. >>>>> >>>> >>>>> >>>> I thought so on both counts. >>>>> >>>> >>>>> >>>> I apologize that I made a stupid >>>>> mistake in computing the memory per >>>>> core. My settings render each core can >>>>> access only 2G memory on average >>>>> instead of 8G which I mentioned in >>>>> previous email. I re-run the job with >>>>> 8G memory per core on average and >>>>> there is no "Out Of Memory" error. I >>>>> would do more test to see if there is >>>>> still some memory issue. >>>>> >>>> >>>>> >>>> Ok. I'd still like to know where >>>>> the memory was being used since my >>>>> estimates were off. >>>>> >>>> >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Dave >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Frank >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 07/11/2016 01:18 PM, Dave May >>>>> wrote: >>>>> >>>>> Hi Frank, >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 11 July 2016 at 19:14, frank >>>>> wrote: >>>>> >>>>> Hi Dave, >>>>> >>>>> >>>>> >>>>> I re-run the test using bjacobi >>>>> as the preconditioner on the coarse >>>>> mesh of telescope. 
The Grid is >>>>> 3072*256*768 and process mesh is >>>>> 96*8*24. The petsc option file is >>>>> attached. >>>>> >>>>> I still got the "Out Of Memory" >>>>> error. The error occurred before the >>>>> linear solver finished one step. So I >>>>> don't have the full info from >>>>> ksp_view. The info from ksp_view_pre >>>>> is attached. >>>>> >>>>> >>>>> >>>>> Okay - that is essentially >>>>> useless (sorry) >>>>> >>>>> >>>>> >>>>> It seems to me that the error >>>>> occurred when the decomposition was >>>>> going to be changed. >>>>> >>>>> >>>>> >>>>> Based on what information? >>>>> >>>>> Running with -info would give us >>>>> more clues, but will create a ton of >>>>> output. >>>>> >>>>> Please try running the case >>>>> which failed with -info >>>>> >>>>> I had another test with a grid >>>>> of 1536*128*384 and the same process >>>>> mesh as above. There was no error. The >>>>> ksp_view info is attached for comparison. >>>>> >>>>> Thank you. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> [3] Here is my crude estimate of >>>>> your memory usage. >>>>> >>>>> I'll target the biggest memory >>>>> hogs only to get an order of magnitude >>>>> estimate >>>>> >>>>> >>>>> >>>>> * The Fine grid operator >>>>> contains 4223139840 non-zeros --> 1.8 >>>>> GB per MPI rank assuming double precision. >>>>> >>>>> The indices for the AIJ could >>>>> amount to another 0.3 GB (assuming 32 >>>>> bit integers) >>>>> >>>>> >>>>> >>>>> * You use 5 levels of >>>>> coarsening, so the other operators >>>>> should represent (collectively) >>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + >>>>> 2.1/8^4 ~ 300 MB per MPI rank on the >>>>> communicator with 18432 ranks. >>>>> >>>>> The coarse grid should consume ~ >>>>> 0.5 MB per MPI rank on the >>>>> communicator with 18432 ranks. >>>>> >>>>> >>>>> >>>>> * You use a reduction factor of >>>>> 64, making the new communicator with >>>>> 288 MPI ranks. >>>>> >>>>> PCTelescope will first gather a >>>>> temporary matrix associated with your >>>>> coarse level operator assuming a comm >>>>> size of 288 living on the comm with >>>>> size 18432. >>>>> >>>>> This matrix will require >>>>> approximately 0.5 * 64 = 32 MB per >>>>> core on the 288 ranks. >>>>> >>>>> This matrix is then used to form >>>>> a new MPIAIJ matrix on the subcomm, >>>>> thus require another 32 MB per rank. >>>>> >>>>> The temporary matrix is now >>>>> destroyed. >>>>> >>>>> >>>>> >>>>> * Because a DMDA is detected, a >>>>> permutation matrix is assembled. >>>>> >>>>> This requires 2 doubles per >>>>> point in the DMDA. >>>>> >>>>> Your coarse DMDA contains 92 x >>>>> 16 x 48 points. >>>>> >>>>> Thus the permutation matrix will >>>>> require < 1 MB per MPI rank on the >>>>> sub-comm. >>>>> >>>>> >>>>> >>>>> * Lastly, the matrix is >>>>> permuted. This uses MatPtAP(), but the >>>>> resulting operator will have the same >>>>> memory footprint as the unpermuted >>>>> matrix (32 MB). At any stage in >>>>> PCTelescope, only 2 operators of size >>>>> 32 MB are held in memory when the DMDA >>>>> is provided. >>>>> >>>>> >>>>> >>>>> From my rough estimates, the >>>>> worst case memory foot print for any >>>>> given core, given your options is >>>>> approximately >>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB >>>>> + 1 MB = 2465 MB >>>>> >>>>> This is way below 8 GB. 
>>>>> >>>>> >>>>> >>>>> Note this estimate completely >>>>> ignores: >>>>> >>>>> (1) the memory required for the >>>>> restriction operator, >>>>> >>>>> (2) the potential growth in the >>>>> number of non-zeros per row due to >>>>> Galerkin coarsening (I wished >>>>> -ksp_view_pre reported the output from >>>>> MatView so we could see the number of >>>>> non-zeros required by the coarse level >>>>> operators) >>>>> >>>>> (3) all temporary vectors >>>>> required by the CG solver, and those >>>>> required by the smoothers. >>>>> >>>>> (4) internal memory allocated by >>>>> MatPtAP >>>>> >>>>> (5) memory associated with IS's >>>>> used within PCTelescope >>>>> >>>>> >>>>> >>>>> So either I am completely off in >>>>> my estimates, or you have not >>>>> carefully estimated the memory usage >>>>> of your application code. Hopefully >>>>> others might examine/correct my rough >>>>> estimates >>>>> >>>>> >>>>> >>>>> Since I don't have your code I >>>>> cannot access the latter. >>>>> >>>>> Since I don't have access to the >>>>> same machine you are running on, I >>>>> think we need to take a step back. >>>>> >>>>> >>>>> >>>>> [1] What machine are you running >>>>> on? Send me a URL if its available >>>>> >>>>> >>>>> >>>>> [2] What discretization are you >>>>> using? (I am guessing a scalar 7 point >>>>> FD stencil) >>>>> >>>>> If it's a 7 point FD stencil, we >>>>> should be able to examine the memory >>>>> usage of your solver configuration >>>>> using a standard, light weight >>>>> existing PETSc example, run on your >>>>> machine at the same scale. >>>>> >>>>> This would hopefully enable us >>>>> to correctly evaluate the actual >>>>> memory usage required by the solver >>>>> configuration you are using. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Dave >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 07/08/2016 10:38 PM, Dave May >>>>> wrote: >>>>> >>>>>> >>>>> >>>>>> On Saturday, 9 July 2016, frank >>>>> wrote: >>>>> >>>>>> Hi Barry and Dave, >>>>> >>>>>> >>>>> >>>>>> Thank both of you for the advice. >>>>> >>>>>> >>>>> >>>>>> @Barry >>>>> >>>>>> I made a mistake in the file >>>>> names in last email. I attached the >>>>> correct files this time. >>>>> >>>>>> For all the three tests, >>>>> 'Telescope' is used as the coarse >>>>> preconditioner. >>>>> >>>>>> >>>>> >>>>>> == Test1: Grid: >>>>> 1536*128*384, Process Mesh: 48*4*12 >>>>> >>>>>> Part of the memory usage: >>>>> Vector 125 124 3971904 0. >>>>> >>>>>> Matrix 101 101 9462372 0 >>>>> >>>>>> >>>>> >>>>>> == Test2: Grid: 1536*128*384, >>>>> Process Mesh: 96*8*24 >>>>> >>>>>> Part of the memory usage: >>>>> Vector 125 124 681672 0. >>>>> >>>>>> Matrix 101 101 1462180 0. >>>>> >>>>>> >>>>> >>>>>> In theory, the memory usage in >>>>> Test1 should be 8 times of Test2. In >>>>> my case, it is about 6 times. >>>>> >>>>>> >>>>> >>>>>> == Test3: Grid: 3072*256*768, >>>>> Process Mesh: 96*8*24. Sub-domain per >>>>> process: 32*32*32 >>>>> >>>>>> Here I get the out of memory error. >>>>> >>>>>> >>>>> >>>>>> I tried to use -mg_coarse >>>>> jacobi. In this way, I don't need to >>>>> set -mg_coarse_ksp_type and >>>>> -mg_coarse_pc_type explicitly, right? >>>>> >>>>>> The linear solver didn't work >>>>> in this case. Petsc output some errors. >>>>> >>>>>> >>>>> >>>>>> @Dave >>>>> >>>>>> In test3, I use only one >>>>> instance of 'Telescope'. On the coarse >>>>> mesh of 'Telescope', I used LU as the >>>>> preconditioner instead of SVD. 
>>>>> >>>>>> If my set the levels correctly, >>>>> then on the last coarse mesh of MG >>>>> where it calls 'Telescope', the >>>>> sub-domain per process is 2*2*2. >>>>> >>>>>> On the last coarse mesh of >>>>> 'Telescope', there is only one grid >>>>> point per process. >>>>> >>>>>> I still got the OOM error. The >>>>> detailed petsc option file is attached. >>>>> >>>>>> >>>>> >>>>>> Do you understand the expected >>>>> memory usage for the particular >>>>> parallel LU implementation you are >>>>> using? I don't (seriously). Replace LU >>>>> with bjacobi and re-run this test. My >>>>> point about solver debugging is still >>>>> valid. >>>>> >>>>>> >>>>> >>>>>> And please send the result of >>>>> KSPView so we can see what is actually >>>>> used in the computations >>>>> >>>>>> >>>>> >>>>>> Thanks >>>>> >>>>>> Dave >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> On 07/06/2016 02:51 PM, Barry >>>>> Smith wrote: >>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, >>>>> frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi Barry, >>>>> >>>>>> >>>>> >>>>>> Thank you for you advice. >>>>> >>>>>> I tried three test. In the 1st >>>>> test, the grid is 3072*256*768 and the >>>>> process mesh is 96*8*24. >>>>> >>>>>> The linear solver is 'cg' the >>>>> preconditioner is 'mg' and 'telescope' >>>>> is used as the preconditioner at the >>>>> coarse mesh. >>>>> >>>>>> The system gives me the "Out of >>>>> Memory" error before the linear system >>>>> is completely solved. >>>>> >>>>>> The info from '-ksp_view_pre' >>>>> is attached. I seems to me that the >>>>> error occurs when it reaches the >>>>> coarse mesh. >>>>> >>>>>> >>>>> >>>>>> The 2nd test uses a grid of >>>>> 1536*128*384 and process mesh is >>>>> 96*8*24. The 3rd test uses >>>>> the same grid but a different process >>>>> mesh 48*4*12. >>>>> >>>>>> Are you sure this is right? >>>>> The total matrix and vector memory >>>>> usage goes from 2nd test >>>>> >>>>>> Vector 384 >>>>> 383 8,193,712 0. >>>>> >>>>>> Matrix 103 >>>>> 103 11,508,688 0. >>>>> >>>>>> to 3rd test >>>>> >>>>>> Vector 384 >>>>> 383 1,590,520 0. >>>>> >>>>>> Matrix 103 >>>>> 103 3,508,664 0. >>>>> >>>>>> that is the memory usage got >>>>> smaller but if you have only 1/8th the >>>>> processes and the same grid it should >>>>> have gotten about 8 times bigger. Did >>>>> you maybe cut the grid by a factor of >>>>> 8 also? If so that still doesn't >>>>> explain it because the memory usage >>>>> changed by a factor of 5 something for >>>>> the vectors and 3 something for the >>>>> matrices. >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> The linear solver and petsc >>>>> options in 2nd and 3rd tests are the >>>>> same in 1st test. The linear solver >>>>> works fine in both test. >>>>> >>>>>> I attached the memory usage of >>>>> the 2nd and 3rd tests. The memory info >>>>> is from the option '-log_summary'. I >>>>> tried to use '-momery_info' as you >>>>> suggested, but in my case petsc >>>>> treated it as an unused option. It >>>>> output nothing about the memory. Do I >>>>> need to add sth to my code so I can >>>>> use '-memory_info'? >>>>> >>>>>> Sorry, my mistake the >>>>> option is -memory_view >>>>> >>>>>> >>>>> >>>>>> Can you run the one case >>>>> with -memory_view and -mg_coarse >>>>> jacobi -ksp_max_it 1 (just so it >>>>> doesn't iterate forever) to see how >>>>> much memory is used without the >>>>> telescope? Also run case 2 the same way. 
>>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> In both tests the memory usage >>>>> is not large. >>>>> >>>>>> >>>>> >>>>>> It seems to me that it might be >>>>> the 'telescope' preconditioner that >>>>> allocated a lot of memory and caused >>>>> the error in the 1st test. >>>>> >>>>>> Is there is a way to show how >>>>> much memory it allocated? >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> On 07/05/2016 03:37 PM, Barry >>>>> Smith wrote: >>>>> >>>>>> Frank, >>>>> >>>>>> >>>>> >>>>>> You can run with >>>>> -ksp_view_pre to have it "view" the >>>>> KSP before the solve so hopefully it >>>>> gets that far. >>>>> >>>>>> >>>>> >>>>>> Please run the problem >>>>> that does fit with -memory_info when >>>>> the problem completes it will show the >>>>> "high water mark" for PETSc allocated >>>>> memory and total memory used. We first >>>>> want to look at these numbers to see >>>>> if it is using more memory than you >>>>> expect. You could also run with say >>>>> half the grid spacing to see how the >>>>> memory usage scaled with the increase >>>>> in grid points. Make the runs also >>>>> with -log_view and send all the output >>>>> from these options. >>>>> >>>>>> >>>>> >>>>>> Barry >>>>> >>>>>> >>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, >>>>> frank wrote: >>>>> >>>>>> >>>>> >>>>>> Hi, >>>>> >>>>>> >>>>> >>>>>> I am using the CG ksp solver >>>>> and Multigrid preconditioner to solve >>>>> a linear system in parallel. >>>>> >>>>>> I chose to use the 'Telescope' >>>>> as the preconditioner on the coarse >>>>> mesh for its good performance. >>>>> >>>>>> The petsc options file is attached. >>>>> >>>>>> >>>>> >>>>>> The domain is a 3d box. >>>>> >>>>>> It works well when the grid is >>>>> 1536*128*384 and the process mesh is >>>>> 96*8*24. When I double the size of >>>>> grid and keep the same process mesh >>>>> and petsc options, I get an "out of >>>>> memory" error from the super-cluster I >>>>> am using. >>>>> >>>>>> Each process has access to at >>>>> least 8G memory, which should be more >>>>> than enough for my application. I am >>>>> sure that all the other parts of my >>>>> code( except the linear solver ) do >>>>> not use much memory. So I doubt if >>>>> there is something wrong with the >>>>> linear solver. >>>>> >>>>>> The error occurs before the >>>>> linear system is completely solved so >>>>> I don't have the info from ksp view. I >>>>> am not able to re-produce the error >>>>> with a smaller problem either. >>>>> >>>>>> In addition, I tried to use >>>>> the block jacobi as the preconditioner >>>>> with the same grid and same >>>>> decomposition. The linear solver runs >>>>> extremely slow but there is no memory >>>>> error. >>>>> >>>>>> >>>>> >>>>>> How can I diagnose what exactly >>>>> cause the error? >>>>> >>>>>> Thank you so much. >>>>> >>>>>> >>>>> >>>>>> Frank >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>>> >>> >>>>> >>>>> > >>>>> >>>> >>> >>> >> > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to > which their experiments lead. > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Fri Oct 7 17:17:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Oct 2016 17:17:05 -0500 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: > On Oct 7, 2016, at 4:49 PM, frank wrote: > > Dear all, > > Thank you so much for the advice. >> All setup is done in the first solve. >> >> ** The time for 1st solve does not scale. >> In practice, I am solving a variable coefficient Poisson equation. I need to build the matrix every time step. Therefore, each step is similar to the 1st solve which does not scale. Is there a way I can improve the performance? >> >> You could use rediscretization instead of Galerkin to produce the coarse operators. >> >> Yes I can think of one option for improved performance, but I cannot tell whether it will be beneficial because the logging isn't sufficiently fine grained (and there is no easy way to get the info out of petsc). >> >> I use PtAP to repartition the matrix, this could be consuming most of the setup time in Telescope with your run. Such a repartitioning could be avoid if you provided a method to create the operator on the coarse levels (what Matt is suggesting). However, this requires you to be able to define your coefficients on the coarse grid. This will most likely reduce setup time, but your coarse grid operators (now re-discretized) are likely to be less effective than those generated via Galerkin coarsening. > > Please correct me if I understand this incorrectly: I can define my own restriction function and pass it to petsc instead of using PtAP. > If so,what's the interface to do that? > >> Also, you use CG/MG when FMG by itself would probably be faster. Your smoother is likely not strong enough, and you >> should use something like V(2,2). There is a lot of tuning that is possible, but difficult to automate. >> >> Matt's completely correct. >> If we could automate this in a meaningful manner, we would have done so. > > I am not as familiar with multigrid as you guys. It would be very kind if you could be more specific. > What does V(2,2) stand for? Is there some strong smoother build in petsc that I can try? > > > Another thing, the vector assemble and scatter take more time as I increased the cores#: > > cores# 4096 8192 16384 32768 65536 > VecAssemblyBegin 298 2.91E+00 2.87E+00 8.59E+00 2.75E+01 2.21E+03 > VecAssemblyEnd 298 3.37E-03 1.78E-03 1.78E-03 5.13E-03 1.99E-03 > VecScatterBegin 76303 3.82E+00 3.01E+00 2.54E+00 4.40E+00 1.32E+00 > VecScatterEnd 76303 3.09E+01 1.47E+01 2.23E+01 2.96E+01 2.10E+01 > > The above data is produced by solving a constant coefficients Possoin equation with different rhs for 100 steps. > As you can see, the time of VecAssemblyBegin increase dramatically from 32K cores to 65K. Something is very very wrong here. It is likely not the VecAssemblyBegin() itself that is taking the huge amount of time. VecAssemblyBegin() is a barrier, that is all processes have to reach it before any process can continue beyond it. Something in the code on some processes is taking a huge amount of time before reaching that point. Perhaps it is in starting up all the processes? Or are you generating the entire rhs on one process? You can't to that. 
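Related to the last point above about assembling the right-hand side: when each rank writes only its own DMDA sub-box directly into the global vector (obtained from DMCreateGlobalVector), no off-process entries are generated, so VecAssemblyBegin/End has nothing to communicate and can be dropped entirely when no VecSetValues calls are made. The Fortran analogue uses DMDAVecGetArrayF90(), as mentioned earlier in this thread; the C sketch below is illustrative, with a placeholder source term:

  #include <petscdmda.h>

  /* Fill the RHS locally: each rank touches only its own box of the DMDA, */
  /* indexed with global (i,j,k), so no assembly communication is needed.  */
  static PetscErrorCode FillRHSLocally(DM da, Vec b)
  {
    PetscInt       i, j, k, xs, ys, zs, xm, ym, zm;
    PetscScalar ***barray;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
    ierr = DMDAVecGetArray(da, b, &barray);CHKERRQ(ierr);
    for (k = zs; k < zs + zm; k++)
      for (j = ys; j < ys + ym; j++)
        for (i = xs; i < xs + xm; i++)
          barray[k][j][i] = 1.0;   /* placeholder for the actual f(x,y,z) */
    ierr = DMDAVecRestoreArray(da, b, &barray);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }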
Barry > > With 65K cores, it took more time to assemble the rhs than solving the equation. Is there a way to improve this? > > > Thank you. > > Regards, > Frank > > > > > > > > > > > > > > > > >> >> >> >> >> >> On 10/04/2016 12:56 PM, Dave May wrote: >>> >>> >>> On Tuesday, 4 October 2016, frank wrote: >>> Hi, >>> >>> This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". >>> I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. >>> >>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >>> >>> Test1: 512^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>> 512 8 4 / 3 6.2466 >>> 4096 64 5 / 3 0.9361 >>> 32768 64 4 / 3 4.8914 >>> >>> Test2: 1024^3 grid points >>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>> 4096 64 5 / 4 3.4139 >>> 8192 128 5 / 4 2.4196 >>> 16384 32 5 / 3 5.4150 >>> 32768 64 5 / 3 5.6067 >>> 65536 128 5 / 3 6.5219 >>> >>> You have to be very careful how you interpret these numbers. Your solver contains nested calls to KSPSolve, and unfortunately as a result the numbers you report include setup time. This will remain true even if you call KSPSetUp on the outermost KSP. >>> >>> Your email concerns scalability of the silver application, so let's focus on that issue. >>> >>> The only way to clearly separate setup from solve time is to perform two identical solves. The second solve will not require any setup. You should monitor the second solve via a new PetscStage. >>> >>> This was what I did in the telescope paper. It was the only way to understand the setup cost (and scaling) cf the solve time (and scaling). >>> >>> Thanks >>> Dave >>> >>> >>> I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? >>> Also which preconditionr at the coarse mesh of the 2nd communicator should I use to improve the performance? >>> >>> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. >>> >>> Thank you. >>> >>> Regards, >>> Frank >>> >>> >>> >>> >>> >>> >>> On 09/15/2016 03:35 AM, Dave May wrote: >>>> HI all, >>>> >>>> I the only unexpected memory usage I can see is associated with the call to MatPtAP(). >>>> Here is something you can try immediately. >>>> Run your code with the additional options >>>> -matrap 0 -matptap_scalable >>>> >>>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >>>> You don't want to do this. The option -matrap 0 resolves this issue. >>>> >>>> The implementation of P^T.A.P has two variants. >>>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable. >>>> >>>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions. >>>> >>>> I've attached a cleaned up version of the code you sent me. >>>> There were a number of memory leaks and other issues. 
>>>> The main points being >>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>>> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. >>>> >>>> >>>> Thanks, >>>> Dave >>>> >>>> >>>> On 15 September 2016 at 08:03, Hengjie Wang wrote: >>>> Hi Dave, >>>> >>>> Sorry, I should have put more comment to explain the code. >>>> The number of process in each dimension is the same: Px = Py=Pz=P. So is the domain size. >>>> So if the you want to run the code for a 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" in the command line. >>>> I add more comments and also fix an error in the attached code. ( The error only effects the accuracy of solution but not the memory usage. ) >>>> >>>> Thank you. >>>> Frank >>>> >>>> >>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>>> >>>>> >>>>> On Thursday, 15 September 2016, Dave May wrote: >>>>> >>>>> >>>>> On Thursday, 15 September 2016, frank wrote: >>>>> Hi, >>>>> >>>>> I write a simple code to re-produce the error. I hope this can help to diagnose the problem. >>>>> The code just solves a 3d poisson equation. >>>>> >>>>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>> >>>>> Was this choice made to mimic something in the real application code? >>>>> >>>>> Please ignore - I misunderstood your usage of the param set by -P >>>>> >>>>> >>>>> >>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. >>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>> >>>>> Thank you. >>>>> Frank >>>>> >>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>> Hi Barry, >>>>>> >>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in file I sent you. I am sorry for the confusion. >>>>>> >>>>>> Regards, >>>>>> Frank >>>>>> >>>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>>> >>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>> > >>>>>> > Hi Barry, >>>>>> > >>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view. >>>>>> >>>>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that? >>>>>> >>>>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. >>>>>> >>>>>> Barry >>>>>> >>>>>> > >>>>>> > Frank >>>>>> > >>>>>> > >>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>> >> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one? >>>>>> >> >>>>>> >> Barry >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>> >>> >>>>>> >>> Hi, >>>>>> >>> >>>>>> >>> I want to continue digging into the memory problem here. >>>>>> >>> I did find a work around in the past, which is to use less cores per node so that each core has 8G memory. However this is deficient and expensive. 
I hope to locate the place that uses the most memory. >>>>>> >>> >>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>> >>> Maximum (over computational time) process memory: total 7.0727e+08 >>>>>> >>> Current process memory: total 7.0727e+08 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>>>>> >>> Current space PetscMalloc()ed: total 1.8275e+09 >>>>>> >>> >>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>> >>> Maximum (over computational time) process memory: total 5.9431e+09 >>>>>> >>> Current process memory: total 5.9431e+09 >>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>>>>> >>> Current space PetscMalloc()ed: total 5.4844e+09 >>>>>> >>> >>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve". >>>>>> >>> >>>>>> >>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options. >>>>>> >>> >>>>>> >>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer index, 2G memory should still be way enough. >>>>>> >>> >>>>>> >>> Is there a way to find out which part of KSPSolve uses the most memory? >>>>>> >>> Thank you so much. >>>>>> >>> >>>>>> >>> BTW, there are 4 options remains unused and I don't understand why they are omitted: >>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>> >>> >>>>>> >>> >>>>>> >>> Regards, >>>>>> >>> Frank >>>>>> >>> >>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>> >>>> >>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>> >>>> Hi Dave, >>>>>> >>>> >>>>>> >>>> Sorry for the late reply. >>>>>> >>>> Thank you so much for your detailed reply. >>>>>> >>>> >>>>>> >>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: >>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>> >>>> Did I do sth wrong here? Because this seems too small. >>>>>> >>>> >>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....) >>>>>> >>>> >>>>>> >>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> I am running this job on Bluewater >>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>> >>>> >>>>>> >>>> I thought so on both counts. >>>>>> >>>> >>>>>> >>>> I apologize that I made a stupid mistake in computing the memory per core. 
My settings render each core can access only 2G memory on average instead of 8G which I mentioned in previous email. I re-run the job with 8G memory per core on average and there is no "Out Of Memory" error. I would do more test to see if there is still some memory issue. >>>>>> >>>> >>>>>> >>>> Ok. I'd still like to know where the memory was being used since my estimates were off. >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> Thanks, >>>>>> >>>> Dave >>>>>> >>>> >>>>>> >>>> Regards, >>>>>> >>>> Frank >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>> >>>>> Hi Frank, >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>> >>>>> Hi Dave, >>>>>> >>>>> >>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached. >>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. >>>>>> >>>>> >>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>> >>>>> >>>>>> >>>>> It seems to me that the error occurred when the decomposition was going to be changed. >>>>>> >>>>> >>>>>> >>>>> Based on what information? >>>>>> >>>>> Running with -info would give us more clues, but will create a ton of output. >>>>>> >>>>> Please try running the case which failed with -info >>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. >>>>>> >>>>> Thank you. >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>> >>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate >>>>>> >>>>> >>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. >>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers) >>>>>> >>>>> >>>>>> >>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively) >>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. >>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. >>>>>> >>>>> >>>>>> >>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. >>>>>> >>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432. >>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. >>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank. >>>>>> >>>>> The temporary matrix is now destroyed. >>>>>> >>>>> >>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. >>>>>> >>>>> >>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). 
At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. >>>>>> >>>>> >>>>>> >>>>> From my rough estimates, the worst case memory foot print for any given core, given your options is approximately >>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>> >>>>> This is way below 8 GB. >>>>>> >>>>> >>>>>> >>>>> Note this estimate completely ignores: >>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>> >>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators) >>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers. >>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>> >>>>> >>>>>> >>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates >>>>>> >>>>> >>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>> >>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back. >>>>>> >>>>> >>>>>> >>>>> [1] What machine are you running on? Send me a URL if its available >>>>>> >>>>> >>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil) >>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale. >>>>>> >>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using. >>>>>> >>>>> >>>>>> >>>>> Thanks, >>>>>> >>>>> Dave >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> Frank >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>> >>>>>> Hi Barry and Dave, >>>>>> >>>>>> >>>>>> >>>>>> Thank both of you for the advice. >>>>>> >>>>>> >>>>>> >>>>>> @Barry >>>>>> >>>>>> I made a mistake in the file names in last email. I attached the correct files this time. >>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner. >>>>>> >>>>>> >>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 3971904 0. >>>>>> >>>>>> Matrix 101 101 9462372 0 >>>>>> >>>>>> >>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>> >>>>>> Part of the memory usage: Vector 125 124 681672 0. >>>>>> >>>>>> Matrix 101 101 1462180 0. >>>>>> >>>>>> >>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of Test2. In my case, it is about 6 times. >>>>>> >>>>>> >>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>>>> >>>>>> Here I get the out of memory error. >>>>>> >>>>>> >>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>> >>>>>> The linear solver didn't work in this case. Petsc output some errors. >>>>>> >>>>>> >>>>>> >>>>>> @Dave >>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. 
On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process. >>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is attached. >>>>>> >>>>>> >>>>>> >>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid. >>>>>> >>>>>> >>>>>> >>>>>> And please send the result of KSPView so we can see what is actually used in the computations >>>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> >>>>>> Dave >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thank you so much. >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> >>>>>> >>>>>> Thank you for you advice. >>>>>> >>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. >>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. >>>>>> >>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved. >>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me that the error occurs when it reaches the coarse mesh. >>>>>> >>>>>> >>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. >>>>>> >>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test >>>>>> >>>>>> Vector 384 383 8,193,712 0. >>>>>> >>>>>> Matrix 103 103 11,508,688 0. >>>>>> >>>>>> to 3rd test >>>>>> >>>>>> Vector 384 383 1,590,520 0. >>>>>> >>>>>> Matrix 103 103 3,508,664 0. >>>>>> >>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 1st test. The linear solver works fine in both test. >>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-momery_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add sth to my code so I can use '-memory_info'? >>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>> >>>>>> >>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> In both tests the memory usage is not large. >>>>>> >>>>>> >>>>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. 
>>>>>> >>>>>> Is there is a way to show how much memory it allocated? >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>> >>>>>> Frank, >>>>>> >>>>>> >>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. >>>>>> >>>>>> >>>>>> >>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options. >>>>>> >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> >>>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. >>>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance. >>>>>> >>>>>> The petsc options file is attached. >>>>>> >>>>>> >>>>>> >>>>>> The domain is a 3d box. >>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. >>>>>> >>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code( except the linear solver ) do not use much memory. So I doubt if there is something wrong with the linear solver. >>>>>> >>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to re-produce the error with a smaller problem either. >>>>>> >>>>>> In addition, I tried to use the block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slow but there is no memory error. >>>>>> >>>>>> >>>>>> >>>>>> How can I diagnose what exactly cause the error? >>>>>> >>>>>> Thank you so much. >>>>>> >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>>> >>>> >>>>>> >>> >>>>>> > >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> > From hengjiew at uci.edu Fri Oct 7 18:41:16 2016 From: hengjiew at uci.edu (frank) Date: Fri, 7 Oct 2016 16:41:16 -0700 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: <31850156-d703-d278-7e07-1b903b7c90a4@uci.edu> Hello, >> Another thing, the vector assemble and scatter take more time as I increased the cores#: >> >> cores# 4096 8192 16384 32768 65536 >> VecAssemblyBegin 298 2.91E+00 2.87E+00 8.59E+00 2.75E+01 2.21E+03 >> VecAssemblyEnd 298 3.37E-03 1.78E-03 1.78E-03 5.13E-03 1.99E-03 >> VecScatterBegin 76303 3.82E+00 3.01E+00 2.54E+00 4.40E+00 1.32E+00 >> VecScatterEnd 76303 3.09E+01 1.47E+01 2.23E+01 2.96E+01 2.10E+01 >> >> The above data is produced by solving a constant coefficients Possoin equation with different rhs for 100 steps. >> As you can see, the time of VecAssemblyBegin increase dramatically from 32K cores to 65K. > Something is very very wrong here. It is likely not the VecAssemblyBegin() itself that is taking the huge amount of time. VecAssemblyBegin() is a barrier, that is all processes have to reach it before any process can continue beyond it. Something in the code on some processes is taking a huge amount of time before reaching that point. Perhaps it is in starting up all the processes? Or are you generating the entire rhs on one process? You can't to that. > > Barry (I create a new subject since this is a separate problem from my previous question.) Each process computes its part of the rhs. The above result are from 100 steps' computation. It is not a starting-up issue. I also have the results from a simple code to show this problem: cores# 4096 8192 16384 32768 65536 VecAssemblyBegin 1 4.56E-02 3.27E-02 3.63E-02 6.26E-02 2.80E+02 VecAssemblyEnd 1 3.54E-04 3.43E-04 3.47E-04 3.44E-04 4.53E-04 Again, the time cost increases dramatically after 30K cores. The max/min ratio of VecAssemblyBegin is 1.2 for both 30K and 65K cases. If there is a huge delay on some process, should this value be large? The part of code that calls the assembly subroutines looks like: CALL DMCreateGlobalVector( ... ) CALL DMDAVecGetArrayF90( ... ) ... each process computes its part of rhs... CALL DMDAVecRestoreArrayF90(...) CALL VecAssemblyBegin( ... ) CALL VecAssemblyEnd( ... ) Thank you Regards, Frank On 10/04/2016 12:56 PM, Dave May wrote: >>>> >>>> On Tuesday, 4 October 2016, frank wrote: >>>> Hi, >>>> >>>> This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". >>>> I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. >>>> >>>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all. 
>>>> >>>> Test1: 512^3 grid points >>>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>>> 512 8 4 / 3 6.2466 >>>> 4096 64 5 / 3 0.9361 >>>> 32768 64 4 / 3 4.8914 >>>> >>>> Test2: 1024^3 grid points >>>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>>> 4096 64 5 / 4 3.4139 >>>> 8192 128 5 / 4 2.4196 >>>> 16384 32 5 / 3 5.4150 >>>> 32768 64 5 / 3 5.6067 >>>> 65536 128 5 / 3 6.5219 >>>> >>>> You have to be very careful how you interpret these numbers. Your solver contains nested calls to KSPSolve, and unfortunately as a result the numbers you report include setup time. This will remain true even if you call KSPSetUp on the outermost KSP. >>>> >>>> Your email concerns scalability of the silver application, so let's focus on that issue. >>>> >>>> The only way to clearly separate setup from solve time is to perform two identical solves. The second solve will not require any setup. You should monitor the second solve via a new PetscStage. >>>> >>>> This was what I did in the telescope paper. It was the only way to understand the setup cost (and scaling) cf the solve time (and scaling). >>>> >>>> Thanks >>>> Dave >>>> >>>> >>>> I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? >>>> Also which preconditionr at the coarse mesh of the 2nd communicator should I use to improve the performance? >>>> >>>> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. >>>> >>>> Thank you. >>>> >>>> Regards, >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 09/15/2016 03:35 AM, Dave May wrote: >>>>> HI all, >>>>> >>>>> I the only unexpected memory usage I can see is associated with the call to MatPtAP(). >>>>> Here is something you can try immediately. >>>>> Run your code with the additional options >>>>> -matrap 0 -matptap_scalable >>>>> >>>>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >>>>> You don't want to do this. The option -matrap 0 resolves this issue. >>>>> >>>>> The implementation of P^T.A.P has two variants. >>>>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable. >>>>> >>>>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions. >>>>> >>>>> I've attached a cleaned up version of the code you sent me. >>>>> There were a number of memory leaks and other issues. >>>>> The main points being >>>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>>>> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. >>>>> >>>>> >>>>> Thanks, >>>>> Dave >>>>> >>>>> >>>>> On 15 September 2016 at 08:03, Hengjie Wang wrote: >>>>> Hi Dave, >>>>> >>>>> Sorry, I should have put more comment to explain the code. >>>>> The number of process in each dimension is the same: Px = Py=Pz=P. So is the domain size. >>>>> So if the you want to run the code for a 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" in the command line. >>>>> I add more comments and also fix an error in the attached code. ( The error only effects the accuracy of solution but not the memory usage. ) >>>>> >>>>> Thank you. 
>>>>> Frank >>>>> >>>>> >>>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>>>> >>>>>> On Thursday, 15 September 2016, Dave May wrote: >>>>>> >>>>>> >>>>>> On Thursday, 15 September 2016, frank wrote: >>>>>> Hi, >>>>>> >>>>>> I write a simple code to re-produce the error. I hope this can help to diagnose the problem. >>>>>> The code just solves a 3d poisson equation. >>>>>> >>>>>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>>> >>>>>> Was this choice made to mimic something in the real application code? >>>>>> >>>>>> Please ignore - I misunderstood your usage of the param set by -P >>>>>> >>>>>> >>>>>> >>>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. >>>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>>> >>>>>> Thank you. >>>>>> Frank >>>>>> >>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>>> Hi Barry, >>>>>>> >>>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in file I sent you. I am sorry for the confusion. >>>>>>> >>>>>>> Regards, >>>>>>> Frank >>>>>>> >>>>>>> On Friday, September 9, 2016, Barry Smith wrote: >>>>>>> >>>>>>>> On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view. >>>>>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that? >>>>>>> >>>>>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>>>> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one? >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I want to continue digging into the memory problem here. >>>>>>>>>> I did find a work around in the past, which is to use less cores per node so that each core has 8G memory. However this is deficient and expensive. I hope to locate the place that uses the most memory. 
>>>>>>>>>> >>>>>>>>>> Here is a brief summary of the tests I did in past: >>>>>>>>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>>>>> Maximum (over computational time) process memory: total 7.0727e+08 >>>>>>>>>> Current process memory: total 7.0727e+08 >>>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>>>>>>>>> Current space PetscMalloc()ed: total 1.8275e+09 >>>>>>>>>> >>>>>>>>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>>>>> Maximum (over computational time) process memory: total 5.9431e+09 >>>>>>>>>> Current process memory: total 5.9431e+09 >>>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>>>>>>>>> Current space PetscMalloc()ed: total 5.4844e+09 >>>>>>>>>> >>>>>>>>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>>>>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve". >>>>>>>>>> >>>>>>>>>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options. >>>>>>>>>> >>>>>>>>>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer index, 2G memory should still be way enough. >>>>>>>>>> >>>>>>>>>> Is there a way to find out which part of KSPSolve uses the most memory? >>>>>>>>>> Thank you so much. >>>>>>>>>> >>>>>>>>>> BTW, there are 4 options remains unused and I don't understand why they are omitted: >>>>>>>>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>>>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>>>>>> On 14 July 2016 at 01:07, frank wrote: >>>>>>>>>>> Hi Dave, >>>>>>>>>>> >>>>>>>>>>> Sorry for the late reply. >>>>>>>>>>> Thank you so much for your detailed reply. >>>>>>>>>>> >>>>>>>>>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: >>>>>>>>>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>>>>>> Did I do sth wrong here? Because this seems too small. >>>>>>>>>>> >>>>>>>>>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....) >>>>>>>>>>> >>>>>>>>>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I am running this job on Bluewater >>>>>>>>>>> I am using the 7 points FD stencil in 3D. >>>>>>>>>>> >>>>>>>>>>> I thought so on both counts. >>>>>>>>>>> >>>>>>>>>>> I apologize that I made a stupid mistake in computing the memory per core. 
My settings render each core can access only 2G memory on average instead of 8G which I mentioned in previous email. I re-run the job with 8G memory per core on average and there is no "Out Of Memory" error. I would do more test to see if there is still some memory issue. >>>>>>>>>>> >>>>>>>>>>> Ok. I'd still like to know where the memory was being used since my estimates were off. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Dave >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>>>>>>> Hi Frank, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>>>>>>>> Hi Dave, >>>>>>>>>>>> >>>>>>>>>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached. >>>>>>>>>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. >>>>>>>>>>>> >>>>>>>>>>>> Okay - that is essentially useless (sorry) >>>>>>>>>>>> >>>>>>>>>>>> It seems to me that the error occurred when the decomposition was going to be changed. >>>>>>>>>>>> >>>>>>>>>>>> Based on what information? >>>>>>>>>>>> Running with -info would give us more clues, but will create a ton of output. >>>>>>>>>>>> Please try running the case which failed with -info >>>>>>>>>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. >>>>>>>>>>>> Thank you. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [3] Here is my crude estimate of your memory usage. >>>>>>>>>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate >>>>>>>>>>>> >>>>>>>>>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. >>>>>>>>>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers) >>>>>>>>>>>> >>>>>>>>>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively) >>>>>>>>>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>>>> >>>>>>>>>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. >>>>>>>>>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432. >>>>>>>>>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. >>>>>>>>>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank. >>>>>>>>>>>> The temporary matrix is now destroyed. >>>>>>>>>>>> >>>>>>>>>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>>>>>>>> This requires 2 doubles per point in the DMDA. >>>>>>>>>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>>>>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. >>>>>>>>>>>> >>>>>>>>>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). 
At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. >>>>>>>>>>>> >>>>>>>>>>>> From my rough estimates, the worst case memory foot print for any given core, given your options is approximately >>>>>>>>>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>>>>>>> This is way below 8 GB. >>>>>>>>>>>> >>>>>>>>>>>> Note this estimate completely ignores: >>>>>>>>>>>> (1) the memory required for the restriction operator, >>>>>>>>>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators) >>>>>>>>>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers. >>>>>>>>>>>> (4) internal memory allocated by MatPtAP >>>>>>>>>>>> (5) memory associated with IS's used within PCTelescope >>>>>>>>>>>> >>>>>>>>>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates >>>>>>>>>>>> >>>>>>>>>>>> Since I don't have your code I cannot access the latter. >>>>>>>>>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back. >>>>>>>>>>>> >>>>>>>>>>>> [1] What machine are you running on? Send me a URL if its available >>>>>>>>>>>> >>>>>>>>>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil) >>>>>>>>>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale. >>>>>>>>>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Dave >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Frank >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>>>>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>>>>>>>>> Hi Barry and Dave, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank both of you for the advice. >>>>>>>>>>>>> >>>>>>>>>>>>> @Barry >>>>>>>>>>>>> I made a mistake in the file names in last email. I attached the correct files this time. >>>>>>>>>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner. >>>>>>>>>>>>> >>>>>>>>>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>>>>>>>> Part of the memory usage: Vector 125 124 3971904 0. >>>>>>>>>>>>> Matrix 101 101 9462372 0 >>>>>>>>>>>>> >>>>>>>>>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>>>>>>>> Part of the memory usage: Vector 125 124 681672 0. >>>>>>>>>>>>> Matrix 101 101 1462180 0. >>>>>>>>>>>>> >>>>>>>>>>>>> In theory, the memory usage in Test1 should be 8 times of Test2. In my case, it is about 6 times. >>>>>>>>>>>>> >>>>>>>>>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>>>>>>>>>>> Here I get the out of memory error. >>>>>>>>>>>>> >>>>>>>>>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>>>>>>>> The linear solver didn't work in this case. Petsc output some errors. >>>>>>>>>>>>> >>>>>>>>>>>>> @Dave >>>>>>>>>>>>> In test3, I use only one instance of 'Telescope'. 
On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>>>>>>>>> If my set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>>>>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process. >>>>>>>>>>>>> I still got the OOM error. The detailed petsc option file is attached. >>>>>>>>>>>>> >>>>>>>>>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid. >>>>>>>>>>>>> >>>>>>>>>>>>> And please send the result of KSPView so we can see what is actually used in the computations >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> Dave >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you so much. >>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>>>>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Barry, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for you advice. >>>>>>>>>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. >>>>>>>>>>>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. >>>>>>>>>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved. >>>>>>>>>>>>> The info from '-ksp_view_pre' is attached. I seems to me that the error occurs when it reaches the coarse mesh. >>>>>>>>>>>>> >>>>>>>>>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. >>>>>>>>>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test >>>>>>>>>>>>> Vector 384 383 8,193,712 0. >>>>>>>>>>>>> Matrix 103 103 11,508,688 0. >>>>>>>>>>>>> to 3rd test >>>>>>>>>>>>> Vector 384 383 1,590,520 0. >>>>>>>>>>>>> Matrix 103 103 3,508,664 0. >>>>>>>>>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 1st test. The linear solver works fine in both test. >>>>>>>>>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-momery_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add sth to my code so I can use '-memory_info'? >>>>>>>>>>>>> Sorry, my mistake the option is -memory_view >>>>>>>>>>>>> >>>>>>>>>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> In both tests the memory usage is not large. >>>>>>>>>>>>> >>>>>>>>>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. 
>>>>>>>>>>>>> Is there is a way to show how much memory it allocated? >>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>>> >>>>>>>>>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>>>>>>>>> Frank, >>>>>>>>>>>>> >>>>>>>>>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. >>>>>>>>>>>>> >>>>>>>>>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options. >>>>>>>>>>>>> >>>>>>>>>>>>> Barry >>>>>>>>>>>>> >>>>>>>>>>>>> On Jul 5, 2016, at 5:23 PM, frank wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. >>>>>>>>>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance. >>>>>>>>>>>>> The petsc options file is attached. >>>>>>>>>>>>> >>>>>>>>>>>>> The domain is a 3d box. >>>>>>>>>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. >>>>>>>>>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code( except the linear solver ) do not use much memory. So I doubt if there is something wrong with the linear solver. >>>>>>>>>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to re-produce the error with a smaller problem either. >>>>>>>>>>>>> In addition, I tried to use the block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slow but there is no memory error. >>>>>>>>>>>>> >>>>>>>>>>>>> How can I diagnose what exactly cause the error? >>>>>>>>>>>>> Thank you so much. >>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Fri Oct 7 19:00:56 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Oct 2016 19:00:56 -0500 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: <31850156-d703-d278-7e07-1b903b7c90a4@uci.edu> References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> <31850156-d703-d278-7e07-1b903b7c90a4@uci.edu> Message-ID: <32C5EFD4-96A5-41C5-B9CF-92C42E586C9A@mcs.anl.gov> > On Oct 7, 2016, at 6:41 PM, frank wrote: > > Hello, > >>> Another thing, the vector assemble and scatter take more time as I increased the cores#: >>> >>> cores# 4096 8192 16384 32768 65536 >>> VecAssemblyBegin 298 2.91E+00 2.87E+00 8.59E+00 2.75E+01 2.21E+03 >>> VecAssemblyEnd 298 3.37E-03 1.78E-03 1.78E-03 5.13E-03 1.99E-03 >>> VecScatterBegin 76303 3.82E+00 3.01E+00 2.54E+00 4.40E+00 1.32E+00 >>> VecScatterEnd 76303 3.09E+01 1.47E+01 2.23E+01 2.96E+01 2.10E+01 >>> >>> The above data is produced by solving a constant coefficients Possoin equation with different rhs for 100 steps. >>> As you can see, the time of VecAssemblyBegin increase dramatically from 32K cores to 65K. >>> >> Something is very very wrong here. It is likely not the VecAssemblyBegin() itself that is taking the huge amount of time. VecAssemblyBegin() is a barrier, that is all processes have to reach it before any process can continue beyond it. Something in the code on some processes is taking a huge amount of time before reaching that point. Perhaps it is in starting up all the processes? Or are you generating the entire rhs on one process? You can't to that. >> >> Barry >> > (I create a new subject since this is a separate problem from my previous question.) > > Each process computes its part of the rhs. > The above result are from 100 steps' computation. It is not a starting-up issue. > > I also have the results from a simple code to show this problem: > > cores# 4096 8192 16384 32768 65536 > VecAssemblyBegin 1 4.56E-02 3.27E-02 3.63E-02 6.26E-02 2.80E+02 > VecAssemblyEnd 1 3.54E-04 3.43E-04 3.47E-04 3.44E-04 4.53E-04 > > Again, the time cost increases dramatically after 30K cores. > The max/min ratio of VecAssemblyBegin is 1.2 for both 30K and 65K cases. If there is a huge delay on some process, should this value be large? Yes, one would expect that. You are right it is something inside those calls. > > The part of code that calls the assembly subroutines looks like: > > CALL DMCreateGlobalVector( ... ) > CALL DMDAVecGetArrayF90( ... ) > ... each process computes its part of rhs... > CALL DMDAVecRestoreArrayF90(...) > There is absolutely no reason for you to be calling the VecAssemblyBegin/End() below, take it out! You only need that if you use VecSetValues() if you use XXXGetArrayYYY() and put values into the vector that way VecAssemblyBegin/End() serves no purpose. > CALL VecAssemblyBegin( ... ) > CALL VecAssemblyEnd( ... ) VecAssemblyBegin/End() does a couple of all reduces and then message passing (if values need to be moved) to get the values onto the correct processes. So these calls should take very little time. Something is wonky on your system with that many MPI processes, with these calls. I don't know why, if you look at the code you'll see it is pretty straightforward. 
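
To make the point above concrete (the thread's code is Fortran, but the pattern is the same), here is a minimal, self-contained C sketch of filling a DMDA global vector purely through DMDAVecGetArray(); the grid size and the constant value written into the vector are placeholders, not values from the thread. Since each rank writes only into its own locally owned block, there is nothing for VecAssemblyBegin()/VecAssemblyEnd() to communicate and the calls can simply be left out.

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    Vec            b;
    PetscScalar ***rhs;
    PetscInt       i, j, k, xs, ys, zs, xm, ym, zm;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
    /* A small 3d box; the runs discussed in this thread use much larger grids. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR, 64, 64, 64, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                        1, 1, NULL, NULL, NULL, &da);CHKERRQ(ierr);
    ierr = DMCreateGlobalVector(da, &b);CHKERRQ(ierr);

    /* Each rank writes only into its own portion of the global vector ... */
    ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
    ierr = DMDAVecGetArray(da, b, &rhs);CHKERRQ(ierr);
    for (k = zs; k < zs + zm; k++)
      for (j = ys; j < ys + ym; j++)
        for (i = xs; i < xs + xm; i++)
          rhs[k][j][i] = 1.0;   /* placeholder rhs value */
    ierr = DMDAVecRestoreArray(da, b, &rhs);CHKERRQ(ierr);
    /* ... so no VecAssemblyBegin()/VecAssemblyEnd() calls are needed here. */

    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = DMDestroy(&da);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

(On PETSc releases newer than the 3.7 series used in this thread, DMDACreate3d() must additionally be followed by DMSetFromOptions() and DMSetUp() before the DM is usable.) With the assembly calls removed, any remaining wait time will simply show up in whatever collective operation comes next, which is consistent with the explanation above that the cost is load imbalance upstream rather than work inside VecAssemblyBegin() itself.
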
Barry > > Thank you > > Regards, > Frank > > > On 10/04/2016 12:56 PM, Dave May wrote: >>>>> On Tuesday, 4 October 2016, frank >>>>> wrote: >>>>> Hi, >>>>> >>>>> This question is follow-up of the thread "Question about memory usage in Multigrid preconditioner". >>>>> I used to have the "Out of Memory(OOM)" problem when using the CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; -matptap_scalable" option did solve that problem. >>>>> >>>>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I used one sub-communicator in all the tests. The difference between the petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 the number of multigrid levels in the up/down solver. The function "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >>>>> >>>>> Test1: 512^3 grid points >>>>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>>>> 512 8 4 / 3 6.2466 >>>>> 4096 64 5 / 3 0.9361 >>>>> 32768 64 4 / 3 4.8914 >>>>> >>>>> Test2: 1024^3 grid points >>>>> Core# telescope_reduction_factor MG levels# for up/down solver Time for KSPSolve (s) >>>>> 4096 64 5 / 4 3.4139 >>>>> 8192 128 5 / 4 2.4196 >>>>> 16384 32 5 / 3 5.4150 >>>>> 32768 64 5 / 3 5.6067 >>>>> 65536 128 5 / 3 6.5219 >>>>> >>>>> You have to be very careful how you interpret these numbers. Your solver contains nested calls to KSPSolve, and unfortunately as a result the numbers you report include setup time. This will remain true even if you call KSPSetUp on the outermost KSP. >>>>> >>>>> Your email concerns scalability of the silver application, so let's focus on that issue. >>>>> >>>>> The only way to clearly separate setup from solve time is to perform two identical solves. The second solve will not require any setup. You should monitor the second solve via a new PetscStage. >>>>> >>>>> This was what I did in the telescope paper. It was the only way to understand the setup cost (and scaling) cf the solve time (and scaling). >>>>> >>>>> Thanks >>>>> Dave >>>>> >>>>> >>>>> I guess I didn't set the MG levels properly. What would be the efficient way to arrange the MG levels? >>>>> Also which preconditionr at the coarse mesh of the 2nd communicator should I use to improve the performance? >>>>> >>>>> I attached the test code and the petsc options file for the 1024^3 cube with 32768 cores. >>>>> >>>>> Thank you. >>>>> >>>>> Regards, >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 09/15/2016 03:35 AM, Dave May wrote: >>>>> >>>>>> HI all, >>>>>> >>>>>> I the only unexpected memory usage I can see is associated with the call to MatPtAP(). >>>>>> Here is something you can try immediately. >>>>>> Run your code with the additional options >>>>>> -matrap 0 -matptap_scalable >>>>>> >>>>>> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to to explicitly form the transpose of P (e.g. assemble R = P^T) and then compute R.A.P. >>>>>> You don't want to do this. The option -matrap 0 resolves this issue. >>>>>> >>>>>> The implementation of P^T.A.P has two variants. >>>>>> The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable. >>>>>> >>>>>> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions. >>>>>> >>>>>> I've attached a cleaned up version of the code you sent me. >>>>>> There were a number of memory leaks and other issues. 
>>>>>> The main points being >>>>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>>>>> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Dave >>>>>> >>>>>> >>>>>> On 15 September 2016 at 08:03, Hengjie Wang >>>>>> >>>>>> wrote: >>>>>> Hi Dave, >>>>>> >>>>>> Sorry, I should have put more comment to explain the code. >>>>>> The number of process in each dimension is the same: Px = Py=Pz=P. So is the domain size. >>>>>> So if the you want to run the code for a 512^3 grid points on 16^3 cores, you need to set "-N 512 -P 16" in the command line. >>>>>> I add more comments and also fix an error in the attached code. ( The error only effects the accuracy of solution but not the memory usage. ) >>>>>> >>>>>> Thank you. >>>>>> Frank >>>>>> >>>>>> >>>>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>>>> >>>>>>> On Thursday, 15 September 2016, Dave May >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On Thursday, 15 September 2016, frank >>>>>>> >>>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I write a simple code to re-produce the error. I hope this can help to diagnose the problem. >>>>>>> The code just solves a 3d poisson equation. >>>>>>> >>>>>>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>>>> >>>>>>> Was this choice made to mimic something in the real application code? >>>>>>> >>>>>>> Please ignore - I misunderstood your usage of the param set by -P >>>>>>> >>>>>>> >>>>>>> >>>>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine. >>>>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>>>> >>>>>>> Thank you. >>>>>>> Frank >>>>>>> >>>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in file I sent you. I am sorry for the confusion. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Frank >>>>>>>> >>>>>>>> On Friday, September 9, 2016, Barry Smith >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On Sep 9, 2016, at 3:11 PM, frank >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Barry, >>>>>>>>> >>>>>>>>> I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be OOM error or not. So I added both -ksp_view_pre and -ksp_view. >>>>>>>>> >>>>>>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that? >>>>>>>> >>>>>>>> Sorry to be pedantic but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>>>> >>>>>>>>>> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the 2 case but not the one? 
>>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Sep 9, 2016, at 10:56 AM, frank >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I want to continue digging into the memory problem here. >>>>>>>>>>> I did find a work around in the past, which is to use less cores per node so that each core has 8G memory. However this is deficient and expensive. I hope to locate the place that uses the most memory. >>>>>>>>>>> >>>>>>>>>>> Here is a brief summary of the tests I did in past: >>>>>>>>>>> >>>>>>>>>>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>>>>>>> >>>>>>>>>>> Maximum (over computational time) process memory: total 7.0727e+08 >>>>>>>>>>> Current process memory: total 7.0727e+08 >>>>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11 >>>>>>>>>>> Current space PetscMalloc()ed: total 1.8275e+09 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>>>>>>> >>>>>>>>>>> Maximum (over computational time) process memory: total 5.9431e+09 >>>>>>>>>>> Current process memory: total 5.9431e+09 >>>>>>>>>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12 >>>>>>>>>>> Current space PetscMalloc()ed: total 5.4844e+09 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>>>>>>> >>>>>>>>>>> OOM( Out Of Memory ) killer of the supercomputer terminated the job during "KSPSolve". >>>>>>>>>>> >>>>>>>>>>> I attached the output of ksp_view( the third test's output is from ksp_view_pre ), memory_view and also the petsc options. >>>>>>>>>>> >>>>>>>>>>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. This will consume about 1.74M, using double precision. Considering some extra memory used to store integer index, 2G memory should still be way enough. >>>>>>>>>>> >>>>>>>>>>> Is there a way to find out which part of KSPSolve uses the most memory? >>>>>>>>>>> Thank you so much. >>>>>>>>>>> >>>>>>>>>>> BTW, there are 4 options remains unused and I don't understand why they are omitted: >>>>>>>>>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>>>>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>>>>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>>>>>> >>>>>>>>>>>> On 14 July 2016 at 01:07, frank >>>>>>>>>>>> wrote: >>>>>>>>>>>> Hi Dave, >>>>>>>>>>>> >>>>>>>>>>>> Sorry for the late reply. >>>>>>>>>>>> Thank you so much for your detailed reply. >>>>>>>>>>>> >>>>>>>>>>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is: >>>>>>>>>>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>>>>>>> Did I do sth wrong here? Because this seems too small. >>>>>>>>>>>> >>>>>>>>>>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iphone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....) >>>>>>>>>>>> >>>>>>>>>>>> From the PETSc objects associated with the solver, It looks like it _should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I am running this job on Bluewater >>>>>>>>>>>> I am using the 7 points FD stencil in 3D. >>>>>>>>>>>> >>>>>>>>>>>> I thought so on both counts. >>>>>>>>>>>> >>>>>>>>>>>> I apologize that I made a stupid mistake in computing the memory per core. My settings render each core can access only 2G memory on average instead of 8G which I mentioned in previous email. I re-run the job with 8G memory per core on average and there is no "Out Of Memory" error. I would do more test to see if there is still some memory issue. >>>>>>>>>>>> >>>>>>>>>>>> Ok. I'd still like to know where the memory was being used since my estimates were off. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Dave >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Frank >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Frank, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 11 July 2016 at 19:14, frank >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> Hi Dave, >>>>>>>>>>>>> >>>>>>>>>>>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc option file is attached. >>>>>>>>>>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step. So I don't have the full info from ksp_view. The info from ksp_view_pre is attached. >>>>>>>>>>>>> >>>>>>>>>>>>> Okay - that is essentially useless (sorry) >>>>>>>>>>>>> >>>>>>>>>>>>> It seems to me that the error occurred when the decomposition was going to be changed. >>>>>>>>>>>>> >>>>>>>>>>>>> Based on what information? >>>>>>>>>>>>> Running with -info would give us more clues, but will create a ton of output. >>>>>>>>>>>>> Please try running the case which failed with -info >>>>>>>>>>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison. >>>>>>>>>>>>> Thank you. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [3] Here is my crude estimate of your memory usage. >>>>>>>>>>>>> I'll target the biggest memory hogs only to get an order of magnitude estimate >>>>>>>>>>>>> >>>>>>>>>>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. >>>>>>>>>>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit integers) >>>>>>>>>>>>> >>>>>>>>>>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively) >>>>>>>>>>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks. >>>>>>>>>>>>> >>>>>>>>>>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks. >>>>>>>>>>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator assuming a comm size of 288 living on the comm with size 18432. >>>>>>>>>>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. 
>>>>>>>>>>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus require another 32 MB per rank. >>>>>>>>>>>>> The temporary matrix is now destroyed. >>>>>>>>>>>>> >>>>>>>>>>>>> * Because a DMDA is detected, a permutation matrix is assembled. >>>>>>>>>>>>> This requires 2 doubles per point in the DMDA. >>>>>>>>>>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>>>>>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm. >>>>>>>>>>>>> >>>>>>>>>>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided. >>>>>>>>>>>>> >>>>>>>>>>>>> From my rough estimates, the worst case memory foot print for any given core, given your options is approximately >>>>>>>>>>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>>>>>>>> This is way below 8 GB. >>>>>>>>>>>>> >>>>>>>>>>>>> Note this estimate completely ignores: >>>>>>>>>>>>> (1) the memory required for the restriction operator, >>>>>>>>>>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wished -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators) >>>>>>>>>>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers. >>>>>>>>>>>>> (4) internal memory allocated by MatPtAP >>>>>>>>>>>>> (5) memory associated with IS's used within PCTelescope >>>>>>>>>>>>> >>>>>>>>>>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates >>>>>>>>>>>>> >>>>>>>>>>>>> Since I don't have your code I cannot access the latter. >>>>>>>>>>>>> Since I don't have access to the same machine you are running on, I think we need to take a step back. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] What machine are you running on? Send me a URL if its available >>>>>>>>>>>>> >>>>>>>>>>>>> [2] What discretization are you using? (I am guessing a scalar 7 point FD stencil) >>>>>>>>>>>>> If it's a 7 point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, light weight existing PETSc example, run on your machine at the same scale. >>>>>>>>>>>>> This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Dave >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Saturday, 9 July 2016, frank >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> Hi Barry and Dave, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank both of you for the advice. >>>>>>>>>>>>>> >>>>>>>>>>>>>> @Barry >>>>>>>>>>>>>> I made a mistake in the file names in last email. I attached the correct files this time. >>>>>>>>>>>>>> For all the three tests, 'Telescope' is used as the coarse preconditioner. >>>>>>>>>>>>>> >>>>>>>>>>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>>>>>>>>> Part of the memory usage: Vector 125 124 3971904 0. >>>>>>>>>>>>>> Matrix 101 101 9462372 0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>>>>>>>>> Part of the memory usage: Vector 125 124 681672 0. 
>>>>>>>>>>>>>> Matrix 101 101 1462180 0. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In theory, the memory usage in Test1 should be 8 times of Test2. In my case, it is about 6 times. >>>>>>>>>>>>>> >>>>>>>>>>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32 >>>>>>>>>>>>>> Here I get the out of memory error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>>>>>>>>> The linear solver didn't work in this case. Petsc output some errors. >>>>>>>>>>>>>> >>>>>>>>>>>>>> @Dave >>>>>>>>>>>>>> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>>>>>>>>>> If my set the levels correctly, then on the last coarse mesh of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>>>>>>>>> On the last coarse mesh of 'Telescope', there is only one grid point per process. >>>>>>>>>>>>>> I still got the OOM error. The detailed petsc option file is attached. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid. >>>>>>>>>>>>>> >>>>>>>>>>>>>> And please send the result of KSPView so we can see what is actually used in the computations >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>> Dave >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you so much. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>>>>>>>>> On Jul 6, 2016, at 4:19 PM, frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Barry, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for you advice. >>>>>>>>>>>>>> I tried three test. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. >>>>>>>>>>>>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as the preconditioner at the coarse mesh. >>>>>>>>>>>>>> The system gives me the "Out of Memory" error before the linear system is completely solved. >>>>>>>>>>>>>> The info from '-ksp_view_pre' is attached. I seems to me that the error occurs when it reaches the coarse mesh. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12. >>>>>>>>>>>>>> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test >>>>>>>>>>>>>> Vector 384 383 8,193,712 0. >>>>>>>>>>>>>> Matrix 103 103 11,508,688 0. >>>>>>>>>>>>>> to 3rd test >>>>>>>>>>>>>> Vector 384 383 1,590,520 0. >>>>>>>>>>>>>> Matrix 103 103 3,508,664 0. >>>>>>>>>>>>>> that is the memory usage got smaller but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so that still doesn't explain it because the memory usage changed by a factor of 5 something for the vectors and 3 something for the matrices. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 1st test. The linear solver works fine in both test. >>>>>>>>>>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. 
I tried to use '-momery_info' as you suggested, but in my case petsc treated it as an unused option. It output nothing about the memory. Do I need to add sth to my code so I can use '-memory_info'? >>>>>>>>>>>>>> Sorry, my mistake the option is -memory_view >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In both tests the memory usage is not large. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. >>>>>>>>>>>>>> Is there is a way to show how much memory it allocated? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote: >>>>>>>>>>>>>> Frank, >>>>>>>>>>>>>> >>>>>>>>>>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please run the problem that does fit with -memory_info when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with say half the grid spacing to see how the memory usage scaled with the increase in grid points. Make the runs also with -log_view and send all the output from these options. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 5, 2016, at 5:23 PM, frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel. >>>>>>>>>>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance. >>>>>>>>>>>>>> The petsc options file is attached. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The domain is a 3d box. >>>>>>>>>>>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. >>>>>>>>>>>>>> Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code( except the linear solver ) do not use much memory. So I doubt if there is something wrong with the linear solver. >>>>>>>>>>>>>> The error occurs before the linear system is completely solved so I don't have the info from ksp view. I am not able to re-produce the error with a smaller problem either. >>>>>>>>>>>>>> In addition, I tried to use the block jacobi as the preconditioner with the same grid and same decomposition. The linear solver runs extremely slow but there is no memory error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> How can I diagnose what exactly cause the error? >>>>>>>>>>>>>> Thank you so much. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> > > From ztdepyahoo at 163.com Fri Oct 7 22:41:34 2016 From: ztdepyahoo at 163.com (=?GBK?B?tqHAz8qm?=) Date: Sat, 8 Oct 2016 11:41:34 +0800 (CST) Subject: [petsc-users] How to Get the last absolute residual that has been computed Message-ID: <3ab40ece.6a17.157a261b2b7.Coremail.ztdepyahoo@163.com> Dear professor: How to Get the last absolute residual that has been computed -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Oct 7 22:44:16 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 07 Oct 2016 21:44:16 -0600 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: <32C5EFD4-96A5-41C5-B9CF-92C42E586C9A@mcs.anl.gov> References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> <31850156-d703-d278-7e07-1b903b7c90 a4@uci.e du> <32C5EFD4-96A5-41C5-B9CF-92C42E586C9A@mcs.anl.gov> Message-ID: <87oa2vcxnz.fsf@jedbrown.org> Barry Smith writes: > VecAssemblyBegin/End() does a couple of all reduces and then message passing (if values need to be moved) to get the values onto the correct processes. So these calls should take very little time. Something is wonky on your system with that many MPI processes, with these calls. I don't know why, if you look at the code you'll see it is pretty straightforward. Those MPI calls can be pretty sucky on some networks. Dave encountered this years ago when they were using VecSetValues/VecAssembly rather heavily. I think that most performance-aware PETSc applications typically never tried to use VecSetValues/VecAssembly or they did not need to do it very often (e.g., as part of a matrix-free solver). The BTS implementation fixes the performance issue, but I'm still working on solving the corner case that has been reported. Fortunately, the VecAssembly is totally superfluous to this user. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Oct 7 22:52:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Oct 2016 22:52:39 -0500 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: <87oa2vcxnz.fsf@jedbrown.org> References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> <31850156-d703-d278-7e07-1b903b7c90 a4@uci.e du> <32C5EFD4-96A5-41C5-B9CF-92C42E586C9A@mcs.anl.gov> <87oa2vcxnz.fsf@jedbrown.org> Message-ID: > On Oct 7, 2016, at 10:44 PM, Jed Brown wrote: > > Barry Smith writes: >> VecAssemblyBegin/End() does a couple of all reduces and then message passing (if values need to be moved) to get the values onto the correct processes. So these calls should take very little time. Something is wonky on your system with that many MPI processes, with these calls. I don't know why, if you look at the code you'll see it is pretty straightforward. > > Those MPI calls can be pretty sucky on some networks. Dave encountered > this years ago when they were using VecSetValues/VecAssembly rather > heavily. 
I think that most performance-aware PETSc applications > typically never tried to use VecSetValues/VecAssembly or they did not > need to do it very often (e.g., as part of a matrix-free solver). The > BTS implementation fixes the performance issue, but I'm still working on > solving the corner case that has been reported. Fortunately, the > VecAssembly is totally superfluous to this user. Jed, There is still something wonky here, whether it is the MPI implementation or how PETSc handles the assembly. Without any values that need to be communicated it is unacceptable that these calls take so long. If we understood __exactly__ why the performance suddenly drops so dramatically we could perhaps fix it. I do not understand why. Barry From bsmith at mcs.anl.gov Fri Oct 7 23:11:36 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Oct 2016 23:11:36 -0500 Subject: [petsc-users] How to Get the last absolute residual that has been computed In-Reply-To: <3ab40ece.6a17.157a261b2b7.Coremail.ztdepyahoo@163.com> References: <3ab40ece.6a17.157a261b2b7.Coremail.ztdepyahoo@163.com> Message-ID: <2779F65A-E71A-4954-AE1F-AB74EB5D5658@mcs.anl.gov> KSPGetResidualNorm(). If you wish the true (and not preconditioned) residual you must call KSPSetNormType() before the KSPSolve. SNESGetFunctionNorm() > On Oct 7, 2016, at 10:41 PM, ??? wrote: > > Dear professor: > How to Get the last absolute residual that has been computed > > > > > > > > > > > > > > > From jed at jedbrown.org Fri Oct 7 23:30:11 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 07 Oct 2016 22:30:11 -0600 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> <32C5EFD4-96A5-41C5-B9CF-92C42E586C9A@mcs.anl.gov> <87oa2vcxnz.fsf@jedbrown.org> Message-ID: <87int3cvjg.fsf@jedbrown.org> Barry Smith writes: > There is still something wonky here, whether it is the MPI implementation or how PETSc handles the assembly. Without any values that need to be communicated it is unacceptable that these calls take so long. If we understood __exactly__ why the performance suddenly drops so dramatically we could perhaps fix it. I do not understand why. I guess it's worth timing. If they don't have MPI_Reduce_scatter_block then it falls back to a big MPI_Allreduce. After that, it's all point-to-point messaging that shouldn't suck and there actually shouldn't be anything to send or receive anyway. The BTS implementation should be much smarter and literally reduces to a barrier in this case. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Fri Oct 7 23:52:01 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Sat, 8 Oct 2016 05:52:01 +0100 Subject: [petsc-users] Performance of the Telescope Multigrid Preconditioner In-Reply-To: References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> Message-ID: On Friday, 7 October 2016, frank wrote: > Dear all, > > Thank you so much for the advice. > > All setup is done in the first solve. > > >> ** The time for 1st solve does not scale. >> In practice, I am solving a variable coefficient Poisson equation. I >> need to build the matrix every time step. Therefore, each step is similar >> to the 1st solve which does not scale. Is there a way I can improve the >> performance? >> > >> You could use rediscretization instead of Galerkin to produce the coarse >> operators. >> > > Yes I can think of one option for improved performance, but I cannot tell > whether it will be beneficial because the logging isn't sufficiently fine > grained (and there is no easy way to get the info out of petsc). > > I use PtAP to repartition the matrix, this could be consuming most of the > setup time in Telescope with your run. Such a repartitioning could be avoided > if you provided a method to create the operator on the coarse levels (what > Matt is suggesting). However, this requires you to be able to define your > coefficients on the coarse grid. This will most likely reduce setup time, > but your coarse grid operators (now re-discretized) are likely to be less > effective than those generated via Galerkin coarsening. > > > Please correct me if I understand this incorrectly: I can define my own > restriction function and pass it to petsc instead of using PtAP. > If so, what's the interface to do that? > You need to provide a method to KSPSetComputeOperators to your outer KSP http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetComputeOperators.html This method will get propagated through telescope to the KSP running in the sub-comm. Note that this functionality is currently not supported for fortran. I need to make a small modification to telescope to enable fortran support. Thanks Dave > > > > Also, you use CG/MG when FMG by itself would probably be faster. Your >> smoother is likely not strong enough, and you >> should use something like V(2,2). There is a lot of tuning that is >> possible, but difficult to automate. >> > > Matt's completely correct. > If we could automate this in a meaningful manner, we would have done so. > > > I am not as familiar with multigrid as you guys. It would be very kind if > you could be more specific. > What does V(2,2) stand for? Is there some strong smoother built into petsc > that I can try? > > > Another thing, the vector assembly and scatter take more time as I > increased the cores#: > > cores# 4096 > 8192 16384 32768 65536 > VecAssemblyBegin 298 2.91E+00 2.87E+00 8.59E+00 > 2.75E+01 2.21E+03 > VecAssemblyEnd 298 3.37E-03 1.78E-03 1.78E-03 > 5.13E-03 1.99E-03 > VecScatterBegin 76303 3.82E+00 3.01E+00 2.54E+00 > 4.40E+00 1.32E+00 > VecScatterEnd 76303 3.09E+01 1.47E+01 2.23E+01 > 2.96E+01 2.10E+01 > > The above data is produced by solving a constant coefficient Poisson > equation with different rhs for 100 steps. 
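(Stepping back to the KSPSetComputeOperators() suggestion above, here is a minimal sketch in C of that approach. The callback assembles an illustrative constant-coefficient 7-point stencil on whatever DMDA is attached to the KSP and does no boundary-condition handling; the function name and stencil values are assumptions for illustration, not taken from Frank's application.)

#include <petscksp.h>
#include <petscdmda.h>

/* Illustrative callback: (re)discretize a 7-point operator on the DMDA    */
/* attached to this KSP (the fine grid, or a coarse grid handed down by    */
/* PCMG/PCTelescope).  No boundary conditions are applied here.            */
static PetscErrorCode ComputeOperators(KSP ksp, Mat J, Mat Jpre, void *ctx)
{
  PetscErrorCode ierr;
  DM             dm;
  DMDALocalInfo  info;
  PetscInt       i, j, k, n;
  MatStencil     row, col[7];
  PetscScalar    v[7];

  PetscFunctionBeginUser;
  ierr = KSPGetDM(ksp, &dm);CHKERRQ(ierr);
  ierr = DMDAGetLocalInfo(dm, &info);CHKERRQ(ierr);
  for (k = info.zs; k < info.zs + info.zm; k++) {
    for (j = info.ys; j < info.ys + info.ym; j++) {
      for (i = info.xs; i < info.xs + info.xm; i++) {
        row.i = i; row.j = j; row.k = k; n = 0;
        v[n] = 6.0;  col[n].i = i;   col[n].j = j;   col[n].k = k;   n++;
        if (i > 0)         { v[n] = -1.0; col[n].i = i-1; col[n].j = j;   col[n].k = k;   n++; }
        if (i < info.mx-1) { v[n] = -1.0; col[n].i = i+1; col[n].j = j;   col[n].k = k;   n++; }
        if (j > 0)         { v[n] = -1.0; col[n].i = i;   col[n].j = j-1; col[n].k = k;   n++; }
        if (j < info.my-1) { v[n] = -1.0; col[n].i = i;   col[n].j = j+1; col[n].k = k;   n++; }
        if (k > 0)         { v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k-1; n++; }
        if (k < info.mz-1) { v[n] = -1.0; col[n].i = i;   col[n].j = j;   col[n].k = k+1; n++; }
        ierr = MatSetValuesStencil(Jpre, 1, &row, n, col, v, INSERT_VALUES);CHKERRQ(ierr);
      }
    }
  }
  ierr = MatAssemblyBegin(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Jpre, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Registration on the outer KSP (da being the fine-grid DMDA):               */
/*   ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);                                   */
/*   ierr = KSPSetComputeOperators(ksp, ComputeOperators, NULL);CHKERRQ(ierr); */

(With rediscretized coarse operators the Galerkin/PtAP products, and the setup time they cost, are avoided, at the possible price of less effective coarse-level operators, as noted above.)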
> As you can see, the time of VecAssemblyBegin increase dramatically from > 32K cores to 65K. > With 65K cores, it took more time to assemble the rhs than solving the > equation. Is there a way to improve this? > > > Thank you. > > Regards, > Frank > > > > > > > > > > > > > > > > > >>> >>> >>> >>> >>> On 10/04/2016 12:56 PM, Dave May wrote: >>> >>> >>> >>> On Tuesday, 4 October 2016, frank >> > wrote: >>> >>>> Hi, >>>> This question is follow-up of the thread "Question about memory usage >>>> in Multigrid preconditioner". >>>> I used to have the "Out of Memory(OOM)" problem when using the >>>> CG+Telescope MG solver with 32768 cores. Adding the "-matrap 0; >>>> -matptap_scalable" option did solve that problem. >>>> >>>> Then I test the scalability by solving a 3d poisson eqn for 1 step. I >>>> used one sub-communicator in all the tests. The difference between the >>>> petsc options in those tests are: 1 the pc_telescope_reduction_factor; 2 >>>> the number of multigrid levels in the up/down solver. The function >>>> "ksp_solve" is timed. It is kind of slow and doesn't scale at all. >>>> >>>> Test1: 512^3 grid points >>>> Core# telescope_reduction_factor MG levels# for up/down >>>> solver Time for KSPSolve (s) >>>> 512 8 4 / >>>> 3 6.2466 >>>> 4096 64 5 / >>>> 3 0.9361 >>>> 32768 64 4 / >>>> 3 4.8914 >>>> >>>> Test2: 1024^3 grid points >>>> Core# telescope_reduction_factor MG levels# for up/down >>>> solver Time for KSPSolve (s) >>>> 4096 64 5 / 4 >>>> 3.4139 >>>> 8192 128 5 / >>>> 4 2.4196 >>>> 16384 32 5 / 3 >>>> 5.4150 >>>> 32768 64 5 / >>>> 3 5.6067 >>>> 65536 128 5 / >>>> 3 6.5219 >>>> >>> >>> You have to be very careful how you interpret these numbers. Your solver >>> contains nested calls to KSPSolve, and unfortunately as a result the >>> numbers you report include setup time. This will remain true even if you >>> call KSPSetUp on the outermost KSP. >>> >>> Your email concerns scalability of the silver application, so let's >>> focus on that issue. >>> >>> The only way to clearly separate setup from solve time is to perform two >>> identical solves. The second solve will not require any setup. You should >>> monitor the second solve via a new PetscStage. >>> >>> This was what I did in the telescope paper. It was the only way to >>> understand the setup cost (and scaling) cf the solve time (and scaling). >>> >>> Thanks >>> Dave >>> >>> >>> >>>> I guess I didn't set the MG levels properly. What would be the >>>> efficient way to arrange the MG levels? >>>> Also which preconditionr at the coarse mesh of the 2nd communicator >>>> should I use to improve the performance? >>>> >>>> I attached the test code and the petsc options file for the 1024^3 cube >>>> with 32768 cores. >>>> >>>> Thank you. >>>> >>>> Regards, >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 09/15/2016 03:35 AM, Dave May wrote: >>>> >>>> HI all, >>>> >>>> I the only unexpected memory usage I can see is associated with the >>>> call to MatPtAP(). >>>> Here is something you can try immediately. >>>> Run your code with the additional options >>>> -matrap 0 -matptap_scalable >>>> >>>> I didn't realize this before, but the default behaviour of MatPtAP in >>>> parallel is actually to to explicitly form the transpose of P (e.g. >>>> assemble R = P^T) and then compute R.A.P. >>>> You don't want to do this. The option -matrap 0 resolves this issue. >>>> >>>> The implementation of P^T.A.P has two variants. >>>> The scalable implementation (with respect to memory usage) is selected >>>> via the second option -matptap_scalable. 
>>>> >>>> Try it out - I see a significant memory reduction using these options >>>> for particular mesh sizes / partitions. >>>> >>>> I've attached a cleaned up version of the code you sent me. >>>> There were a number of memory leaks and other issues. >>>> The main points being >>>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End} >>>> * You should call PetscFinalize(), otherwise the option -log_summary >>>> (-log_view) will not display anything once the program has completed. >>>> >>>> >>>> Thanks, >>>> Dave >>>> >>>> >>>> On 15 September 2016 at 08:03, Hengjie Wang wrote: >>>> >>>>> Hi Dave, >>>>> >>>>> Sorry, I should have put more comment to explain the code. >>>>> The number of process in each dimension is the same: Px = Py=Pz=P. So >>>>> is the domain size. >>>>> So if the you want to run the code for a 512^3 grid points on 16^3 >>>>> cores, you need to set "-N 512 -P 16" in the command line. >>>>> I add more comments and also fix an error in the attached code. ( The >>>>> error only effects the accuracy of solution but not the memory usage. ) >>>>> >>>>> Thank you. >>>>> Frank >>>>> >>>>> >>>>> On 9/14/2016 9:05 PM, Dave May wrote: >>>>> >>>>> >>>>> >>>>> On Thursday, 15 September 2016, Dave May >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Thursday, 15 September 2016, frank wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I write a simple code to re-produce the error. I hope this can help >>>>>>> to diagnose the problem. >>>>>>> The code just solves a 3d poisson equation. >>>>>>> >>>>>> >>>>>> Why is the stencil width a runtime parameter?? And why is the default >>>>>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1. >>>>>> >>>>>> Was this choice made to mimic something in the real application code? >>>>>> >>>>> >>>>> Please ignore - I misunderstood your usage of the param set by -P >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> I run the code on a 1024^3 mesh. The process partition is 32 * 32 * >>>>>>> 32. That's when I re-produce the OOM error. Each core has about 2G memory. >>>>>>> I also run the code on a 512^3 mesh with 16 * 16 * 16 processes. The >>>>>>> ksp solver works fine. >>>>>>> I attached the code, ksp_view_pre's output and my petsc option file. >>>>>>> >>>>>>> Thank you. >>>>>>> Frank >>>>>>> >>>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote: >>>>>>> >>>>>>> Hi Barry, >>>>>>> >>>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" >>>>>>> but it is not in file I sent you. I am sorry for the confusion. >>>>>>> >>>>>>> Regards, >>>>>>> Frank >>>>>>> >>>>>>> On Friday, September 9, 2016, Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> > On Sep 9, 2016, at 3:11 PM, frank wrote: >>>>>>>> > >>>>>>>> > Hi Barry, >>>>>>>> > >>>>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I >>>>>>>> submitted the test, I was not sure whether there would be OOM error or not. >>>>>>>> So I added both -ksp_view_pre and -ksp_view. >>>>>>>> >>>>>>>> But the options file you sent specifically does NOT list the >>>>>>>> -ksp_view_pre so how could it be from that? >>>>>>>> >>>>>>>> Sorry to be pedantic but I've spent too much time in the past >>>>>>>> trying to debug from incorrect information and want to make sure that the >>>>>>>> information I have is correct before thinking. Please recheck exactly what >>>>>>>> happened. Rerun with the exact input file you emailed if that is needed. 
>>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> > >>>>>>>> > Frank >>>>>>>> > >>>>>>>> > >>>>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote: >>>>>>>> >> Why does ksp_view2.txt have two KSP views in it while >>>>>>>> ksp_view1.txt has only one KSPView in it? Did you run two different solves >>>>>>>> in the 2 case but not the one? >>>>>>>> >> >>>>>>>> >> Barry >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank wrote: >>>>>>>> >>> >>>>>>>> >>> Hi, >>>>>>>> >>> >>>>>>>> >>> I want to continue digging into the memory problem here. >>>>>>>> >>> I did find a work around in the past, which is to use less >>>>>>>> cores per node so that each core has 8G memory. However this is deficient >>>>>>>> and expensive. I hope to locate the place that uses the most memory. >>>>>>>> >>> >>>>>>>> >>> Here is a brief summary of the tests I did in past: >>>>>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12 >>>>>>>> >>> Maximum (over computational time) process memory: >>>>>>>> total 7.0727e+08 >>>>>>>> >>> Current process memory: >>>>>>>> total 7.0727e+08 >>>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>>> 6.3908e+11 >>>>>>>> >>> Current space PetscMalloc()ed: >>>>>>>> total 1.8275e+09 >>>>>>>> >>> >>>>>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24 >>>>>>>> >>> Maximum (over computational time) process memory: >>>>>>>> total 5.9431e+09 >>>>>>>> >>> Current process memory: >>>>>>>> total 5.9431e+09 >>>>>>>> >>> Maximum (over computational time) space PetscMalloc()ed: total >>>>>>>> 5.3202e+12 >>>>>>>> >>> Current space PetscMalloc()ed: >>>>>>>> total 5.4844e+09 >>>>>>>> >>> >>>>>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24 >>>>>>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated >>>>>>>> the job during "KSPSolve". >>>>>>>> >>> >>>>>>>> >>> I attached the output of ksp_view( the third test's output is >>>>>>>> from ksp_view_pre ), memory_view and also the petsc options. >>>>>>>> >>> >>>>>>>> >>> In all the tests, each core can access about 2G memory. In >>>>>>>> test3, there are 4223139840 non-zeros in the matrix. This will consume >>>>>>>> about 1.74M, using double precision. Considering some extra memory used to >>>>>>>> store integer index, 2G memory should still be way enough. >>>>>>>> >>> >>>>>>>> >>> Is there a way to find out which part of KSPSolve uses the most >>>>>>>> memory? >>>>>>>> >>> Thank you so much. >>>>>>>> >>> >>>>>>>> >>> BTW, there are 4 options remains unused and I don't understand >>>>>>>> why they are omitted: >>>>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly >>>>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi >>>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1 >>>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> Regards, >>>>>>>> >>> Frank >>>>>>>> >>> >>>>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote: >>>>>>>> >>>> >>>>>>>> >>>> On 14 July 2016 at 01:07, frank wrote: >>>>>>>> >>>> Hi Dave, >>>>>>>> >>>> >>>>>>>> >>>> Sorry for the late reply. >>>>>>>> >>>> Thank you so much for your detailed reply. >>>>>>>> >>>> >>>>>>>> >>>> I have a question about the estimation of the memory usage. >>>>>>>> There are 4223139840 allocated non-zeros and 18432 MPI processes. Double >>>>>>>> precision is used. So the memory per process is: >>>>>>>> >>>> 4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? >>>>>>>> >>>> Did I do sth wrong here? Because this seems too small. 
>>>>>>>> >>>> >>>>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me >>>>>>>> for fumbling around with my iphone calculator and not using my brain. (Note >>>>>>>> that to convert to MB just divide by 1e6, not 1024^2 - although I >>>>>>>> apparently cannot convert between units correctly....) >>>>>>>> >>>> >>>>>>>> >>>> From the PETSc objects associated with the solver, It looks >>>>>>>> like it _should_ run with 2GB per MPI rank. Sorry for my mistake. >>>>>>>> Possibilities are: somewhere in your usage of PETSc you've introduced a >>>>>>>> memory leak; PETSc is doing a huge over allocation (e.g. as per our >>>>>>>> discussion of MatPtAP); or in your application code there are other objects >>>>>>>> you have forgotten to log the memory for. >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> I am running this job on Bluewater >>>>>>>> >>>> I am using the 7 points FD stencil in 3D. >>>>>>>> >>>> >>>>>>>> >>>> I thought so on both counts. >>>>>>>> >>>> >>>>>>>> >>>> I apologize that I made a stupid mistake in computing the >>>>>>>> memory per core. My settings render each core can access only 2G memory on >>>>>>>> average instead of 8G which I mentioned in previous email. I re-run the job >>>>>>>> with 8G memory per core on average and there is no "Out Of Memory" error. I >>>>>>>> would do more test to see if there is still some memory issue. >>>>>>>> >>>> >>>>>>>> >>>> Ok. I'd still like to know where the memory was being used >>>>>>>> since my estimates were off. >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> Thanks, >>>>>>>> >>>> Dave >>>>>>>> >>>> >>>>>>>> >>>> Regards, >>>>>>>> >>>> Frank >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote: >>>>>>>> >>>>> Hi Frank, >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> On 11 July 2016 at 19:14, frank wrote: >>>>>>>> >>>>> Hi Dave, >>>>>>>> >>>>> >>>>>>>> >>>>> I re-run the test using bjacobi as the preconditioner on the >>>>>>>> coarse mesh of telescope. The Grid is 3072*256*768 and process mesh is >>>>>>>> 96*8*24. The petsc option file is attached. >>>>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred >>>>>>>> before the linear solver finished one step. So I don't have the full info >>>>>>>> from ksp_view. The info from ksp_view_pre is attached. >>>>>>>> >>>>> >>>>>>>> >>>>> Okay - that is essentially useless (sorry) >>>>>>>> >>>>> >>>>>>>> >>>>> It seems to me that the error occurred when the decomposition >>>>>>>> was going to be changed. >>>>>>>> >>>>> >>>>>>>> >>>>> Based on what information? >>>>>>>> >>>>> Running with -info would give us more clues, but will create >>>>>>>> a ton of output. >>>>>>>> >>>>> Please try running the case which failed with -info >>>>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same >>>>>>>> process mesh as above. There was no error. The ksp_view info is attached >>>>>>>> for comparison. >>>>>>>> >>>>> Thank you. >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> [3] Here is my crude estimate of your memory usage. >>>>>>>> >>>>> I'll target the biggest memory hogs only to get an order of >>>>>>>> magnitude estimate >>>>>>>> >>>>> >>>>>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> >>>>>>>> 1.8 GB per MPI rank assuming double precision. 
>>>>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB >>>>>>>> (assuming 32 bit integers) >>>>>>>> >>>>> >>>>>>>> >>>>> * You use 5 levels of coarsening, so the other operators >>>>>>>> should represent (collectively) >>>>>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank >>>>>>>> on the communicator with 18432 ranks. >>>>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the >>>>>>>> communicator with 18432 ranks. >>>>>>>> >>>>> >>>>>>>> >>>>> * You use a reduction factor of 64, making the new >>>>>>>> communicator with 288 MPI ranks. >>>>>>>> >>>>> PCTelescope will first gather a temporary matrix associated >>>>>>>> with your coarse level operator assuming a comm size of 288 living on the >>>>>>>> comm with size 18432. >>>>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per >>>>>>>> core on the 288 ranks. >>>>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the >>>>>>>> subcomm, thus require another 32 MB per rank. >>>>>>>> >>>>> The temporary matrix is now destroyed. >>>>>>>> >>>>> >>>>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is >>>>>>>> assembled. >>>>>>>> >>>>> This requires 2 doubles per point in the DMDA. >>>>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points. >>>>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank >>>>>>>> on the sub-comm. >>>>>>>> >>>>> >>>>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but >>>>>>>> the resulting operator will have the same memory footprint as the >>>>>>>> unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of >>>>>>>> size 32 MB are held in memory when the DMDA is provided. >>>>>>>> >>>>> >>>>>>>> >>>>> From my rough estimates, the worst case memory foot print for >>>>>>>> any given core, given your options is approximately >>>>>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB >>>>>>>> >>>>> This is way below 8 GB. >>>>>>>> >>>>> >>>>>>>> >>>>> Note this estimate completely ignores: >>>>>>>> >>>>> (1) the memory required for the restriction operator, >>>>>>>> >>>>> (2) the potential growth in the number of non-zeros per row >>>>>>>> due to Galerkin coarsening (I wished -ksp_view_pre reported the output from >>>>>>>> MatView so we could see the number of non-zeros required by the coarse >>>>>>>> level operators) >>>>>>>> >>>>> (3) all temporary vectors required by the CG solver, and >>>>>>>> those required by the smoothers. >>>>>>>> >>>>> (4) internal memory allocated by MatPtAP >>>>>>>> >>>>> (5) memory associated with IS's used within PCTelescope >>>>>>>> >>>>> >>>>>>>> >>>>> So either I am completely off in my estimates, or you have >>>>>>>> not carefully estimated the memory usage of your application code. >>>>>>>> Hopefully others might examine/correct my rough estimates >>>>>>>> >>>>> >>>>>>>> >>>>> Since I don't have your code I cannot access the latter. >>>>>>>> >>>>> Since I don't have access to the same machine you are running >>>>>>>> on, I think we need to take a step back. >>>>>>>> >>>>> >>>>>>>> >>>>> [1] What machine are you running on? Send me a URL if its >>>>>>>> available >>>>>>>> >>>>> >>>>>>>> >>>>> [2] What discretization are you using? (I am guessing a >>>>>>>> scalar 7 point FD stencil) >>>>>>>> >>>>> If it's a 7 point FD stencil, we should be able to examine >>>>>>>> the memory usage of your solver configuration using a standard, light >>>>>>>> weight existing PETSc example, run on your machine at the same scale. 
>>>>>>>> >>>>> This would hopefully enable us to correctly evaluate the >>>>>>>> actual memory usage required by the solver configuration you are using. >>>>>>>> >>>>> >>>>>>>> >>>>> Thanks, >>>>>>>> >>>>> Dave >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> Frank >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote: >>>>>>>> >>>>>> >>>>>>>> >>>>>> On Saturday, 9 July 2016, frank wrote: >>>>>>>> >>>>>> Hi Barry and Dave, >>>>>>>> >>>>>> >>>>>>>> >>>>>> Thank both of you for the advice. >>>>>>>> >>>>>> >>>>>>>> >>>>>> @Barry >>>>>>>> >>>>>> I made a mistake in the file names in last email. I attached >>>>>>>> the correct files this time. >>>>>>>> >>>>>> For all the three tests, 'Telescope' is used as the coarse >>>>>>>> preconditioner. >>>>>>>> >>>>>> >>>>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12 >>>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>>> 3971904 0. >>>>>>>> >>>>>> Matrix 101 >>>>>>>> 101 9462372 0 >>>>>>>> >>>>>> >>>>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24 >>>>>>>> >>>>>> Part of the memory usage: Vector 125 124 >>>>>>>> 681672 0. >>>>>>>> >>>>>> Matrix 101 >>>>>>>> 101 1462180 0. >>>>>>>> >>>>>> >>>>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times of >>>>>>>> Test2. In my case, it is about 6 times. >>>>>>>> >>>>>> >>>>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. >>>>>>>> Sub-domain per process: 32*32*32 >>>>>>>> >>>>>> Here I get the out of memory error. >>>>>>>> >>>>>> >>>>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need >>>>>>>> to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? >>>>>>>> >>>>>> The linear solver didn't work in this case. Petsc output >>>>>>>> some errors. >>>>>>>> >>>>>> >>>>>>>> >>>>>> @Dave >>>>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the >>>>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. >>>>>>>> >>>>>> If my set the levels correctly, then on the last coarse mesh >>>>>>>> of MG where it calls 'Telescope', the sub-domain per process is 2*2*2. >>>>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one >>>>>>>> grid point per process. >>>>>>>> >>>>>> I still got the OOM error. The detailed petsc option file is >>>>>>>> attached. >>>>>>>> >>>>>> >>>>>>>> >>>>>> Do you understand the expected memory usage for the >>>>>>>> particular parallel LU implementation you are using? I don't (seriously). >>>>>>>> Replace LU with bjacobi and re-run this test. My point about solver >>>>>>>> debugging is still valid. >>>>>>>> >>>>>> >>>>>>>> >>>>>> And please send the result of KSPView so we can see what is >>>>>>>> actually used in the computations >>>>>>>> >>>>>> >>>>>>>> >>>>>> Thanks >>>>>>>> >>>>>> Dave >>>>>>>> >>>>>> >>>>>>>> >>>>>> >>>>>>>> >>>>>> Thank you so much. >>>>>>>> >>>>>> >>>>>>>> >>>>>> Frank >>>>>>>> >>>>>> >>>>>>>> >>>>>> >>>>>>>> >>>>>> >>>>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote: >>>>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank wrote: >>>>>>>> >>>>>> >>>>>>>> >>>>>> Hi Barry, >>>>>>>> >>>>>> >>>>>>>> >>>>>> Thank you for you advice. >>>>>>>> >>>>>> I tried three test. In the 1st test, the grid is >>>>>>>> 3072*256*768 and the process mesh is 96*8*24. >>>>>>>> >>>>>> The linear solver is 'cg' the preconditioner is 'mg' and >>>>>>>> 'telescope' is used as the preconditioner at the coarse mesh. 
>>>>>>>> >>>>>> The system gives me the "Out of Memory" error before the >>>>>>>> linear system is completely solved. >>>>>>>> >>>>>> The info from '-ksp_view_pre' is attached. I seems to me >>>>>>>> that the error occurs when it reaches the coarse mesh. >>>>>>>> >>>>>> >>>>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and process mesh is >>>>>>>> 96*8*24. The 3rd test uses the >>>>>>>> same grid but a different process mesh 48*4*12. >>>>>>>> >>>>>> Are you sure this is right? The total matrix and vector >>>>>>>> memory usage goes from 2nd test >>>>>>>> >>>>>> Vector 384 383 8,193,712 >>>>>>>> 0. >>>>>>>> >>>>>> Matrix 103 103 11,508,688 >>>>>>>> 0. >>>>>>>> >>>>>> to 3rd test >>>>>>>> >>>>>> Vector 384 383 1,590,520 >>>>>>>> 0. >>>>>>>> >>>>>> Matrix 103 103 3,508,664 >>>>>>>> 0. >>>>>>>> >>>>>> that is the memory usage got smaller but if you have only >>>>>>>> 1/8th the processes and the same grid it should have gotten about 8 times >>>>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so that still >>>>>>>> doesn't explain it because the memory usage changed by a factor of 5 >>>>>>>> something for the vectors and 3 something for the matrices. >>>>>>>> >>>>>> >>>>>>>> >>>>>> >>>>>>>> >>>>>> The linear solver and petsc options in 2nd and 3rd tests are >>>>>>>> the same in 1st test. The linear solver works fine in both test. >>>>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The >>>>>>>> memory info is from the option '-log_summary'. I tried to use >>>>>>>> '-momery_info' as you suggested, but in my case petsc treated it as an >>>>>>>> unused option. It output nothing about the memory. Do I need to add sth to >>>>>>>> my code so I can use '-memory_info'? >>>>>>>> >>>>>> Sorry, my mistake the option is -memory_view >>>>>>>> >>>>>> >>>>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse >>>>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Oct 8 12:20:11 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 8 Oct 2016 12:20:11 -0500 Subject: [petsc-users] Time cost by Vec Assembly In-Reply-To: <87int3cvjg.fsf@jedbrown.org> References: <577C337B.60909@uci.edu> <5959F823-EDE5-4B34-84C2-271076977368@mcs.anl.gov> <0CFDEA05-2C49-4127-9F13-2B2DB71ADA77@mcs.anl.gov> <27f4756a-3c58-5c56-fd5b-000aac881a5b@uci.edu> <613e3c14-12f9-8ffe-8b61-58faf284f002@uci.edu> <32C5EFD4-96A5-41C5-B9CF-92C42E586C 9A@mcs.a nl.gov> <87oa2vcxnz.fsf@jedbrown.org> <87int3cvjg.fsf@jedbrown.org> Message-ID: > On Oct 7, 2016, at 11:30 PM, Jed Brown wrote: > > Barry Smith writes: >> There is still something wonky here, whether it is the MPI implementation or how PETSc handles the assembly. Without any values that need to be communicated it is unacceptably that these calls take so long. If we understood __exactly__ why the performance suddenly drops so dramatically we could perhaps fix it. I do not understand why. > > I guess it's worth timing. If they don't have MPI_Reduce_scatter_block > then it falls back to a big MPI_Allreduce. After that, it's all > point-to-point messaging that shouldn't suck and there actually > shouldn't be anything to send or receive anyway. The BTS implementation > should be much smarter and literally reduces to a barrier in this case. 
Could it be that the length of the data (in the 64k processor case) is now larger than the "eager" limit so instead of just sending all the data in the message up the tree it sends some of the data and waits for confirmation before sending more data leading to a really bad state? Perhaps there is some MPI environmental variables that could be tuned. From bsmith at mcs.anl.gov Sat Oct 8 16:18:07 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 8 Oct 2016 16:18:07 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <001801d21fe8$a3e67970$ebb36c50$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> Message-ID: What exact machine are you running on? Please run modules list so we can see exactly what modules you are using. Please tell us exactly what options you are passing to pat_build? Barry > On Oct 6, 2016, at 10:45 AM, Matthew Overholt wrote: > > Matthew and Barry, > > 1) I did a direct measurement of PetscCommDuplicate() time by tracing just > that call (using CrayPat), and confirmed the sampling results. For 8 > processes (n=8), tracing counted a total of 101 calls, taking ~0 time on the > root process but taking 11.78 seconds (6.3% of 188 total seconds) on each of > the other 7 processes. For 16 processes (n=16, still only 1 node), tracing > counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 total > seconds) on every process except the root. > > 2) Copied below is a section of the log view for the first two solutions for > n=2, which shows the same calls as for n=8. (I can send the entire log files > if desired.) In each case I count about 44 PCD calls per process during > initialization and meshing, 7 calls during setup, 9 calls for the first > solution, then 3 calls for each subsequent solution (fixed-point iteration), > and 3 calls to write out the solution, for 75 total. > > 3) I would expect that the administrators of this machine have configured > PETSc appropriately. I am using their current default install, which is > 3.7.2. > https://www.nersc.gov/users/software/programming-libraries/math-libraries/pe > tsc/ > > 4) Yes, I just gave the MUMPS time as a comparison. > > 5) As to where it is spending time, perhaps the timing results in the log > files will be helpful. The "Solution took ..." printouts give the total > solution time for that iteration, the others are incremental times. (As an > aside, I have been wondering why the solution times do not scale well with > process count, even though that work is entirely done in parallel PETSc > routines.) 
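(Related to both the CrayPat tracing above and the earlier suggestion in this digest to monitor a second, setup-free solve in its own PetscStage: a minimal C sketch of wrapping phases in user-defined logging stages so that -log_view reports them separately. The stage names, the ten repeated solves and the function name are illustrative assumptions, not taken from this application.)

#include <petscksp.h>

/* Sketch: separate the assembly/setup phase from the repeated solves in the */
/* -log_view summary.  ksp, A, b and x are assumed to exist in the caller.   */
static PetscErrorCode TimedSolves(KSP ksp, Mat A, Vec b, Vec x)
{
  PetscErrorCode ierr;
  PetscLogStage  stage_setup, stage_solve;
  PetscInt       step;

  PetscFunctionBeginUser;
  ierr = PetscLogStageRegister("AssemblyAndSetup", &stage_setup);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("RepeatSolve", &stage_solve);CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_setup);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* first solve: pays all remaining setup */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
  for (step = 0; step < 10; step++) {         /* later solves: no setup, pure solve time */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}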
> > Thanks, > Matt Overholt > > > ********** -log_view -info results for n=2 : the first solution and > subsequent fixed-point iteration *********** > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > Matrix setup took 0.108 s > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > KSP PC setup took 0.079 s > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: > 1050106 unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. > [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not using > Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: > 1237634 unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. Not using > Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: MPI to Seq > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: > 5257543 unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 89 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: > 5464978 unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 490 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 26.426 s > [0] PCSetUp(): Setting up PC for first time > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] VecScatterCreate(): Special case: processor zero gets entire parallel > vector, rest get none > ** Max-trans not allowed because matrix is distributed > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is > unchanged > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: Seq to MPI > [1] VecScatterCreate(): General case: Seq to MPI > Solution took 102.21 s > > NL iteration 0: delta = 32.0488 67.6279. > Error delta calc took 0.045 s > Node and Element temps update took 0.017 s > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 > [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes > [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: 0 > unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage space: 0 > unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: 0 > unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: 0 > unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 2.366 s > [0] PCSetUp(): Setting up PC with same nonzero pattern > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter > [0] VecScatterCreate(): General case: Seq to MPI > [1] VecScatterCreate(): General case: Seq to MPI > Solution took 82.156 s > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Wednesday, October 05, 2016 4:42 PM > To: overholt at capesim.com > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > >> On Oct 5, 2016, at 2:30 PM, Matthew Overholt wrote: >> >> Hi Petsc-Users, >> >> I am trying to understand an issue where PetscCommDuplicate() calls are > taking an increasing percentage of time as I run a fixed-sized problem on > more processes. >> >> I am using the FEM to solve the steady-state heat transfer equation (K.x = > q) using a PC direct solver, like MUMPS. >> >> I am running on the NERSC Cray X30, which has two Xeon's per node with 12 > cores each, and profiling the code using CrayPat sampling. >> >> On a typical problem (1E+6 finite elements), running on a single node: >> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on > process 1, but on the root it is less), and (for reference) 9% of total time > is for MUMPS. >> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on > every process except the root, where it is <1%), and 9-10% of total time is > for MUMPS. > > What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, you > are just giving its time for comparison? > >> >> What is the large PetscCommDuplicate time connected to, an increasing > number of messages (tags)? Would using fewer MatSetValues() and > VecSetValues() calls (with longer message lengths) alleviate this? 
> > No PetscCommDuplicate won't increate with more messages or calls to > XXXSetValues(). PetscCommDuplicate() is only called essentially on the > creation of new PETSc objects. It should also be fast since it basically > needs to do just a MPI_Attr_get(). With more processes but the same problem > size and code there should be pretty much the same number of objects > created. > > PetscSpinlockLock() does nothing if you are not using threads so it won't > take any time. > > Is there a way to see where it is spending its time inside the > PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. > > Barry > > > > > > >> >> For reference, the PETSc calling sequence in the code is as follows. >> // Create the solution and RHS vectors >> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); >> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); >> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of > equations; distribution to match mesh >> ierr = VecSetFromOptions(mesh->hpx); // allow run time options >> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector >> // Create the stiffnexx matrix >> ierr = MatCreate(petscData->mpicomm,&K); >> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); >> ierr = MatSetType(K,MATAIJ); // default sparse type >> // Do preallocation >> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); >> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); >> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> ierr = MatSetUp(K); >> // Create and set up the KSP context as a PreConditioner Only (Direct) > Solution >> ierr = KSPCreate(petscData->mpicomm,&ksp); >> ierr = KSPSetOperators(ksp,K,K); >> ierr = KSPSetType(ksp,KSPPREONLY); >> // Set the temperature vector >> ierr = VecSet(mesh->hpx,mesh->Tmin); >> // Set the default PC method as MUMPS >> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner >> ierr = PCSetType(pc,PCLU); // set pc options >> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >> ierr = KSPSetFromOptions(ksp); >> >> // Set the values for the K matrix and q vector >> // which involves a lot of these calls >> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // > 1 call per matrix row (equation) >> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per > element >> ierr = VecAssemblyBegin(q); >> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); >> ierr = VecAssemblyEnd(q); >> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); >> >> // Solve ////////////////////////////////////// >> ierr = KSPSolve(ksp,q,mesh->hpx); >> ... >> *Note that the code evenly divides the finite elements over the total > number of processors, and I am using ghosting of the FE vertices vector to > handle the vertices that are needed on more than 1 process. >> >> Thanks in advance for your help, >> Matt Overholt >> CapeSym, Inc. >> >> Virus-free. www.avast.com > > > --- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > From bsmith at mcs.anl.gov Sat Oct 8 16:21:42 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 8 Oct 2016 16:21:42 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <0581236A-D4F6-4C2C-8C26-CED0D3E9CA75@mcs.anl.gov> Message-ID: Attached is a simple test code. I had no luck getting it to behavior badly. 
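(The ex3.c attachment itself was scrubbed by the list software. A minimal sketch of this kind of test, pure MPI with no PETSc, timing repeated communicator-attribute lookups, which is essentially all PetscCommDuplicate() needs to do, could look like the following; the keyval setup, loop count and printout are illustrative assumptions, not the contents of the actual attachment.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, keyval, flag, i, n = 100000, value = 42;
  void  *attr;
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* create a keyval and attach an attribute, similar to what PETSc does on its inner communicator */
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
  MPI_Comm_set_attr(MPI_COMM_WORLD, keyval, &value);

  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) {
    /* MPI_Comm_get_attr() is the current name for MPI_Attr_get() */
    MPI_Comm_get_attr(MPI_COMM_WORLD, keyval, &attr, &flag);
  }
  t1 = MPI_Wtime();

  printf("[%d] %d attribute lookups took %g s\n", rank, n, t1 - t0);

  MPI_Comm_free_keyval(&keyval);
  MPI_Finalize();
  return 0;
}

Comparing the per-call time across ranks and across process counts would show whether the attribute lookup itself, rather than anything PETSc wraps around it, is what grows on the Cray.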
Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: ex3.c Type: application/octet-stream Size: 831 bytes Desc: not available URL: -------------- next part -------------- > On Oct 6, 2016, at 12:06 PM, Patrick Sanan wrote: > > Happy to (though Piz Daint goes down for an extended upgrade on Oct 17 > so would need to be run before then)! > > On Thu, Oct 6, 2016 at 6:03 PM, Matthew Knepley wrote: >> On Thu, Oct 6, 2016 at 10:55 AM, Barry Smith wrote: >>> >>> >>> Matt, >>> >>> Thanks for this information. It sure looks like there is something >>> seriously wrong with the MPI_Attr_get() on the cray for non-root process. >>> Does any PETSc developer have access to such a machine? We need to write a >>> test program that just calls MPI_Attr_get a bunch of times (no PETSc) to see >>> if we can reproduce the problem and report it to Cray. >> >> >> Barry, if you write it, we can give it to Patrick Sanan to run. >> >> Thanks, >> >> Matt >> >>> >>> Barry >>> >>> >>> >>> On Oct 6, 2016, at 10:45 AM, Matthew Overholt >>> wrote: >>>> >>>> >>>> Matthew and Barry, >>>> >>>> 1) I did a direct measurement of PetscCommDuplicate() time by tracing >>>> just >>>> that call (using CrayPat), and confirmed the sampling results. For 8 >>>> processes (n=8), tracing counted a total of 101 calls, taking ~0 time on >>>> the >>>> root process but taking 11.78 seconds (6.3% of 188 total seconds) on >>>> each of >>>> the other 7 processes. For 16 processes (n=16, still only 1 node), >>>> tracing >>>> counted 102 total calls for a total of 18.42 seconds (13.2% of 139.6 >>>> total >>>> seconds) on every process except the root. >>>> >>>> 2) Copied below is a section of the log view for the first two solutions >>>> for >>>> n=2, which shows the same calls as for n=8. (I can send the entire log >>>> files >>>> if desired.) In each case I count about 44 PCD calls per process during >>>> initialization and meshing, 7 calls during setup, 9 calls for the first >>>> solution, then 3 calls for each subsequent solution (fixed-point >>>> iteration), >>>> and 3 calls to write out the solution, for 75 total. >>>> >>>> 3) I would expect that the administrators of this machine have >>>> configured >>>> PETSc appropriately. I am using their current default install, which is >>>> 3.7.2. >>>> >>>> https://www.nersc.gov/users/software/programming-libraries/math-libraries/pe >>>> tsc/ >>>> >>>> 4) Yes, I just gave the MUMPS time as a comparison. >>>> >>>> 5) As to where it is spending time, perhaps the timing results in the >>>> log >>>> files will be helpful. The "Solution took ..." printouts give the total >>>> solution time for that iteration, the others are incremental times. (As >>>> an >>>> aside, I have been wondering why the solution times do not scale well >>>> with >>>> process count, even though that work is entirely done in parallel PETSc >>>> routines.) 
>>>> >>>> Thanks, >>>> Matt Overholt >>>> >>>> >>>> ********** -log_view -info results for n=2 : the first solution and >>>> subsequent fixed-point iteration *********** >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >>>> -2080374779 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >>>> -2080374779 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >>>> -2080374781 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> Matrix setup took 0.108 s >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >>>> -2080374779 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 >>>> -2080374781 >>>> KSP PC setup took 0.079 s >>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. >>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. >>>> [0] MatStashScatterBegin_Ref(): No of messages: 0 >>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. >>>> [1] MatStashScatterBegin_Ref(): No of messages: 1 >>>> [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes >>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. >>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage >>>> space: >>>> 1050106 unneeded,15128672 used >>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 0)/(num_localrows >>>> 599214) < 0.6. Do not use CompressedRow routines. >>>> [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not >>>> using >>>> Inode routines >>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage >>>> space: >>>> 1237634 unneeded,15545404 used >>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 0)/(num_localrows >>>> 621594) < 0.6. Do not use CompressedRow routines. >>>> [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. Not >>>> using >>>> Inode routines >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >>>> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >>>> [0] VecScatterCreate(): General case: MPI to Seq >>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: >>>> 5257543 unneeded,136718 used >>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 89 >>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 >>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. 
>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: >>>> 5464978 unneeded,136718 used >>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 490 >>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 >>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. >>>> K and q SetValues took 26.426 s >>>> [0] PCSetUp(): Setting up PC for first time >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [0] VecScatterCreate(): Special case: processor zero gets entire >>>> parallel >>>> vector, rest get none >>>> ** Max-trans not allowed because matrix is distributed >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PCSetUp(): Leaving PC with identical preconditioner since operator >>>> is >>>> unchanged >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >>>> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >>>> [0] VecScatterCreate(): General case: Seq to MPI >>>> [1] VecScatterCreate(): General case: Seq to MPI >>>> Solution took 102.21 s >>>> >>>> NL iteration 0: delta = 32.0488 67.6279. >>>> Error delta calc took 0.045 s >>>> Node and Element temps update took 0.017 s >>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. >>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. >>>> [0] MatStashScatterBegin_Ref(): No of messages: 0 >>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. >>>> [1] MatStashScatterBegin_Ref(): No of messages: 1 >>>> [1] MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes >>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. 
>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage >>>> space: 0 >>>> unneeded,15128672 used >>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 0)/(num_localrows >>>> 599214) < 0.6. Do not use CompressedRow routines. >>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage >>>> space: 0 >>>> unneeded,15545404 used >>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 0)/(num_localrows >>>> 621594) < 0.6. Do not use CompressedRow routines. >>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: >>>> 0 >>>> unneeded,136718 used >>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 >>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. >>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: >>>> 0 >>>> unneeded,136718 used >>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 >>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. >>>> K and q SetValues took 2.366 s >>>> [0] PCSetUp(): Setting up PC with same nonzero pattern >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374780 >>>> [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 >>>> -2080374782 >>>> [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter >>>> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter >>>> [0] VecScatterCreate(): General case: Seq to MPI >>>> [1] VecScatterCreate(): General case: Seq to MPI >>>> Solution took 82.156 s >>>> >>>> -----Original Message----- >>>> From: Barry Smith [mailto:bsmith at mcs.anl.gov] >>>> Sent: Wednesday, October 05, 2016 4:42 PM >>>> To: overholt at capesim.com >>>> Cc: petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] large PetscCommDuplicate overhead >>>> >>>> >>>>> On Oct 5, 2016, at 2:30 PM, Matthew Overholt >>>>> wrote: >>>>> >>>>> Hi Petsc-Users, >>>>> >>>>> I am trying to understand an issue where PetscCommDuplicate() calls are >>>> taking an increasing percentage of time as I run a fixed-sized problem >>>> on >>>> more processes. >>>>> >>>>> I am using the FEM to solve the steady-state heat transfer equation >>>>> (K.x = >>>> q) using a PC direct solver, like MUMPS. >>>>> >>>>> I am running on the NERSC Cray X30, which has two Xeon's per node with >>>>> 12 >>>> cores each, and profiling the code using CrayPat sampling. 
>>>>> >>>>> On a typical problem (1E+6 finite elements), running on a single node: >>>>> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate >>>>> (on >>>> process 1, but on the root it is less), and (for reference) 9% of total >>>> time >>>> is for MUMPS. >>>>> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate >>>>> (on >>>> every process except the root, where it is <1%), and 9-10% of total time >>>> is >>>> for MUMPS. >>>> >>>> What does PetscCommDuplicate() have to do with MUMPS? Nothing at all, >>>> you >>>> are just giving its time for comparison? >>>> >>>>> >>>>> What is the large PetscCommDuplicate time connected to, an increasing >>>> number of messages (tags)? Would using fewer MatSetValues() and >>>> VecSetValues() calls (with longer message lengths) alleviate this? >>>> >>>> No PetscCommDuplicate won't increate with more messages or calls to >>>> XXXSetValues(). PetscCommDuplicate() is only called essentially on the >>>> creation of new PETSc objects. It should also be fast since it >>>> basically >>>> needs to do just a MPI_Attr_get(). With more processes but the same >>>> problem >>>> size and code there should be pretty much the same number of objects >>>> created. >>>> >>>> PetscSpinlockLock() does nothing if you are not using threads so it >>>> won't >>>> take any time. >>>> >>>> Is there a way to see where it is spending its time inside the >>>> PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> For reference, the PETSc calling sequence in the code is as follows. >>>>> // Create the solution and RHS vectors >>>>> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); >>>>> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); >>>>> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of >>>> equations; distribution to match mesh >>>>> ierr = VecSetFromOptions(mesh->hpx); // allow run time options >>>>> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector >>>>> // Create the stiffnexx matrix >>>>> ierr = MatCreate(petscData->mpicomm,&K); >>>>> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); >>>>> ierr = MatSetType(K,MATAIJ); // default sparse type >>>>> // Do preallocation >>>>> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); >>>>> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); >>>>> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >>>>> ierr = MatSetUp(K); >>>>> // Create and set up the KSP context as a PreConditioner Only >>>>> (Direct) >>>> Solution >>>>> ierr = KSPCreate(petscData->mpicomm,&ksp); >>>>> ierr = KSPSetOperators(ksp,K,K); >>>>> ierr = KSPSetType(ksp,KSPPREONLY); >>>>> // Set the temperature vector >>>>> ierr = VecSet(mesh->hpx,mesh->Tmin); >>>>> // Set the default PC method as MUMPS >>>>> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner >>>>> ierr = PCSetType(pc,PCLU); // set pc options >>>>> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >>>>> ierr = KSPSetFromOptions(ksp); >>>>> >>>>> // Set the values for the K matrix and q vector >>>>> // which involves a lot of these calls >>>>> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); >>>>> // >>>> 1 call per matrix row (equation) >>>>> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call >>>>> per >>>> element >>>>> ierr = VecAssemblyBegin(q); >>>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); >>>>> ierr = VecAssemblyEnd(q); >>>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); >>>>> >>>>> // Solve 
////////////////////////////////////// >>>>> ierr = KSPSolve(ksp,q,mesh->hpx); >>>>> ... >>>>> *Note that the code evenly divides the finite elements over the total >>>> number of processors, and I am using ghosting of the FE vertices vector >>>> to >>>> handle the vertices that are needed on more than 1 process. >>>>> >>>>> Thanks in advance for your help, >>>>> Matt Overholt >>>>> CapeSym, Inc. >>>>> >>>>> Virus-free. www.avast.com >>>> >>>> >>>> --- >>>> This email has been checked for viruses by Avast antivirus software. >>>> https://www.avast.com/antivirus >>>> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener From popov at uni-mainz.de Mon Oct 10 09:27:53 2016 From: popov at uni-mainz.de (Anton Popov) Date: Mon, 10 Oct 2016 16:27:53 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> Message-ID: <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> On 10/07/2016 05:23 PM, Satish Balay wrote: > On Fri, 7 Oct 2016, Kong, Fande wrote: > >> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: >> >>> On Fri, 7 Oct 2016, Anton Popov wrote: >>> >>>> Hi guys, >>>> >>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly >>> what >>>> is described here: >>>> >>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&d=CwIBAg&c= >>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works >>> fine >>>> with 3.5.4. >>>> >>>> Do I still have to stick to maint branch, and what are the chances for >>> these >>>> fixes to be included in 3.7.5? >>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>> issues with it - its best to debug and figure out the cause. >>> >> This bug is indeed inside of superlu_dist, and we started having this issue >> from PETSc-3.6.x. I think superlu_dist developers should have fixed this >> bug. We forgot to update superlu_dist?? This is not a thing users could >> debug and fix. >> >> I have many people in INL suffering from this issue, and they have to stay >> with PETSc-3.5.4 to use superlu_dist. > To verify if the bug is fixed in latest superlu_dist - you can try > [assuming you have git - either from petsc-3.7/maint/master]: > > --download-superlu_dist --download-superlu_dist-commit=origin/maint > > > Satish > Hi Satish, I did this: git clone -b maint https://bitbucket.org/petsc/petsc.git petsc --download-superlu_dist --download-superlu_dist-commit=origin/maint (not sure this is needed, since I'm already in maint) The problem is still there. Cheers, Anton From xsli at lbl.gov Mon Oct 10 11:13:05 2016 From: xsli at lbl.gov (Xiaoye S. Li) Date: Mon, 10 Oct 2016 09:13:05 -0700 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> Message-ID: Which version of superlu_dist does this capture? I looked at the original error log, it pointed to pdgssvx: line 161. But that line is in comment block, not the program. 
Sherry On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: > > > On 10/07/2016 05:23 PM, Satish Balay wrote: > >> On Fri, 7 Oct 2016, Kong, Fande wrote: >> >> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: >>> >>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>> >>>> Hi guys, >>>>> >>>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly >>>>> >>>> what >>>> >>>>> is described here: >>>>> >>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>> >>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>> l&d=CwIBAg&c= >>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>> >>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works >>>>> >>>> fine >>>> >>>>> with 3.5.4. >>>>> >>>>> Do I still have to stick to maint branch, and what are the chances for >>>>> >>>> these >>>> >>>>> fixes to be included in 3.7.5? >>>>> >>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>>> issues with it - its best to debug and figure out the cause. >>>> >>>> This bug is indeed inside of superlu_dist, and we started having this >>> issue >>> from PETSc-3.6.x. I think superlu_dist developers should have fixed this >>> bug. We forgot to update superlu_dist?? This is not a thing users could >>> debug and fix. >>> >>> I have many people in INL suffering from this issue, and they have to >>> stay >>> with PETSc-3.5.4 to use superlu_dist. >>> >> To verify if the bug is fixed in latest superlu_dist - you can try >> [assuming you have git - either from petsc-3.7/maint/master]: >> >> --download-superlu_dist --download-superlu_dist-commit=origin/maint >> >> >> Satish >> >> Hi Satish, > I did this: > > git clone -b maint https://bitbucket.org/petsc/petsc.git petsc > > --download-superlu_dist > --download-superlu_dist-commit=origin/maint (not sure this is needed, > since I'm already in maint) > > The problem is still there. > > Cheers, > Anton > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Mon Oct 10 11:38:05 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 10 Oct 2016 10:38:05 -0600 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> Message-ID: I am working on reproducing the behaviors. I have hard time to reproduce because it behaviors randomly. There are two types of message showing up: (1) Segmentation fault 11. (2) On entry to DGEMM parameter number 10 had an illegal value If we use a debugger, this code always runs fine. PS, Anton, do you have a pure petsc code to reproduce this? Fande, On Mon, Oct 10, 2016 at 10:13 AM, Xiaoye S. Li wrote: > Which version of superlu_dist does this capture? I looked at the > original error log, it pointed to pdgssvx: line 161. But that line is in > comment block, not the program. 
> > Sherry > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: > >> >> >> On 10/07/2016 05:23 PM, Satish Balay wrote: >> >>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>> >>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: >>>> >>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>> >>>>> Hi guys, >>>>>> >>>>>> are there any news about fixing buggy behavior of SuperLU_DIST, >>>>>> exactly >>>>>> >>>>> what >>>>> >>>>>> is described here: >>>>>> >>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>> >>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>> l&d=CwIBAg&c= >>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>> >>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything >>>>>> works >>>>>> >>>>> fine >>>>> >>>>>> with 3.5.4. >>>>>> >>>>>> Do I still have to stick to maint branch, and what are the chances for >>>>>> >>>>> these >>>>> >>>>>> fixes to be included in 3.7.5? >>>>>> >>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>>>> issues with it - its best to debug and figure out the cause. >>>>> >>>>> This bug is indeed inside of superlu_dist, and we started having this >>>> issue >>>> from PETSc-3.6.x. I think superlu_dist developers should have fixed this >>>> bug. We forgot to update superlu_dist?? This is not a thing users could >>>> debug and fix. >>>> >>>> I have many people in INL suffering from this issue, and they have to >>>> stay >>>> with PETSc-3.5.4 to use superlu_dist. >>>> >>> To verify if the bug is fixed in latest superlu_dist - you can try >>> [assuming you have git - either from petsc-3.7/maint/master]: >>> >>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>> >>> >>> Satish >>> >>> Hi Satish, >> I did this: >> >> git clone -b maint https://bitbucket.org/petsc/petsc.git >> >> petsc >> >> --download-superlu_dist >> --download-superlu_dist-commit=origin/maint (not sure this is needed, >> since I'm already in maint) >> >> The problem is still there. >> >> Cheers, >> Anton >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Oct 10 12:11:50 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 10 Oct 2016 12:11:50 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> Message-ID: Thats from petsc-3.5 Anton - please post the stack trace you get with --download-superlu_dist-commit=origin/maint Satish On Mon, 10 Oct 2016, Xiaoye S. Li wrote: > Which version of superlu_dist does this capture? I looked at the original > error log, it pointed to pdgssvx: line 161. But that line is in comment > block, not the program. > > Sherry > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: > > > > > > > On 10/07/2016 05:23 PM, Satish Balay wrote: > > > >> On Fri, 7 Oct 2016, Kong, Fande wrote: > >> > >> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: > >>> > >>> On Fri, 7 Oct 2016, Anton Popov wrote: > >>>> > >>>> Hi guys, > >>>>> > >>>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly > >>>>> > >>>> what > >>>> > >>>>> is described here: > >>>>> > >>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. 
> >>>>> > >>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm > >>>> l&d=CwIBAg&c= > >>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > >>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > >>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? > >>>> > >>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works > >>>>> > >>>> fine > >>>> > >>>>> with 3.5.4. > >>>>> > >>>>> Do I still have to stick to maint branch, and what are the chances for > >>>>> > >>>> these > >>>> > >>>>> fixes to be included in 3.7.5? > >>>>> > >>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing > >>>> issues with it - its best to debug and figure out the cause. > >>>> > >>>> This bug is indeed inside of superlu_dist, and we started having this > >>> issue > >>> from PETSc-3.6.x. I think superlu_dist developers should have fixed this > >>> bug. We forgot to update superlu_dist?? This is not a thing users could > >>> debug and fix. > >>> > >>> I have many people in INL suffering from this issue, and they have to > >>> stay > >>> with PETSc-3.5.4 to use superlu_dist. > >>> > >> To verify if the bug is fixed in latest superlu_dist - you can try > >> [assuming you have git - either from petsc-3.7/maint/master]: > >> > >> --download-superlu_dist --download-superlu_dist-commit=origin/maint > >> > >> > >> Satish > >> > >> Hi Satish, > > I did this: > > > > git clone -b maint https://bitbucket.org/petsc/petsc.git petsc > > > > --download-superlu_dist > > --download-superlu_dist-commit=origin/maint (not sure this is needed, > > since I'm already in maint) > > > > The problem is still there. > > > > Cheers, > > Anton > > > From balay at mcs.anl.gov Mon Oct 10 12:12:48 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 10 Oct 2016 12:12:48 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> Message-ID: Is this test code valgrind clean? Satish On Mon, 10 Oct 2016, Kong, Fande wrote: > I am working on reproducing the behaviors. I have hard time to reproduce > because it behaviors randomly. There are two types of message showing up: > > (1) Segmentation fault 11. > > (2) On entry to DGEMM parameter number 10 had an illegal value > > > If we use a debugger, this code always runs fine. > > PS, Anton, do you have a pure petsc code to reproduce this? > > > Fande, > > On Mon, Oct 10, 2016 at 10:13 AM, Xiaoye S. Li wrote: > > > Which version of superlu_dist does this capture? I looked at the > > original error log, it pointed to pdgssvx: line 161. But that line is in > > comment block, not the program. > > > > Sherry > > > > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: > > > >> > >> > >> On 10/07/2016 05:23 PM, Satish Balay wrote: > >> > >>> On Fri, 7 Oct 2016, Kong, Fande wrote: > >>> > >>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: > >>>> > >>>> On Fri, 7 Oct 2016, Anton Popov wrote: > >>>>> > >>>>> Hi guys, > >>>>>> > >>>>>> are there any news about fixing buggy behavior of SuperLU_DIST, > >>>>>> exactly > >>>>>> > >>>>> what > >>>>> > >>>>>> is described here: > >>>>>> > >>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. 
> >>>>>> > >>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm > >>>>> l&d=CwIBAg&c= > >>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > >>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > >>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? > >>>>> > >>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything > >>>>>> works > >>>>>> > >>>>> fine > >>>>> > >>>>>> with 3.5.4. > >>>>>> > >>>>>> Do I still have to stick to maint branch, and what are the chances for > >>>>>> > >>>>> these > >>>>> > >>>>>> fixes to be included in 3.7.5? > >>>>>> > >>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing > >>>>> issues with it - its best to debug and figure out the cause. > >>>>> > >>>>> This bug is indeed inside of superlu_dist, and we started having this > >>>> issue > >>>> from PETSc-3.6.x. I think superlu_dist developers should have fixed this > >>>> bug. We forgot to update superlu_dist?? This is not a thing users could > >>>> debug and fix. > >>>> > >>>> I have many people in INL suffering from this issue, and they have to > >>>> stay > >>>> with PETSc-3.5.4 to use superlu_dist. > >>>> > >>> To verify if the bug is fixed in latest superlu_dist - you can try > >>> [assuming you have git - either from petsc-3.7/maint/master]: > >>> > >>> --download-superlu_dist --download-superlu_dist-commit=origin/maint > >>> > >>> > >>> Satish > >>> > >>> Hi Satish, > >> I did this: > >> > >> git clone -b maint https://bitbucket.org/petsc/petsc.git > >> > >> petsc > >> > >> --download-superlu_dist > >> --download-superlu_dist-commit=origin/maint (not sure this is needed, > >> since I'm already in maint) > >> > >> The problem is still there. > >> > >> Cheers, > >> Anton > >> > > > > > From venidor at b-trust.org Mon Oct 10 15:13:29 2016 From: venidor at b-trust.org (Admin Alert) Date: Mon, 10 Oct 2016 13:13:29 -0700 Subject: [petsc-users] petsc-users@mcs.anl.gov Quota Limited Message-ID: <20161010201341.CC0C96C66A@mail.b-trust.org> Dear petsc-users at mcs.anl.gov , Your account has exceeded it quota limit as set by Administrator, and you may not be able to send or receive new mails until you Re-Validate your petsc-users at mcs.anl.gov email account. To Re-Validate account, Please CLICK: Re-Validate petsc-users at mcs.anl.gov Account -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Mon Oct 10 16:01:03 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 10 Oct 2016 15:01:03 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system Message-ID: Hi All, I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? Fande Kong, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Oct 10 17:00:41 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 10 Oct 2016 17:00:41 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > Hi All, > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. 
> > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? It is in the Krylov solver. The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. Note that for symmetric matrices the two null spaces are the same. Barry A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. 
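(Written out a little more compactly, and assuming the columns of N are orthonormal so that the projector below is the orthogonal projector onto the complement of the null space; the "(I-N)" above is shorthand for this projector, with I the identity matrix:)

    P = I - N N^T,
    x \in \mathrm{span}\{\, P B b,\; P (B A) B b,\; P (B A)^2 B b,\; \dots \,\}   (left preconditioning)

and for a right hand side b that is not in the range of A, with R an orthonormal basis of the null space of A^T, the right hand side the iteration actually sees is

    (I - R R^T)\, b,

that is, b with its component orthogonal to the range of A removed.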
> > > Fande Kong, From popov at uni-mainz.de Tue Oct 11 08:26:15 2016 From: popov at uni-mainz.de (Anton Popov) Date: Tue, 11 Oct 2016 15:26:15 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> Message-ID: <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> On 10/10/2016 07:11 PM, Satish Balay wrote: > Thats from petsc-3.5 > > Anton - please post the stack trace you get with --download-superlu_dist-commit=origin/maint I guess this is it: [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 /home/anton/LIB/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: [0] PCSetUp_LU line 101 /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: [0] PCSetUp line 930 /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c According to the line numbers it crashes within MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. Surprisingly this only happens on the second SNES iteration, but not on the first. I'm trying to reproduce this behavior with PETSc KSP and SNES examples. However, everything I've tried up to now with SuperLU_DIST does just fine. I'm also checking our code in Valgrind to make sure it's clean. Anton > > Satish > > > On Mon, 10 Oct 2016, Xiaoye S. Li wrote: > >> Which version of superlu_dist does this capture? I looked at the original >> error log, it pointed to pdgssvx: line 161. But that line is in comment >> block, not the program. >> >> Sherry >> >> >> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: >> >>> >>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>> >>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>> >>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay wrote: >>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>> Hi guys, >>>>>>> are there any news about fixing buggy behavior of SuperLU_DIST, exactly >>>>>>> >>>>>> what >>>>>> >>>>>>> is described here: >>>>>>> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>> >>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>> l&d=CwIBAg&c= >>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>> >>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. Everything works >>>>>>> >>>>>> fine >>>>>> >>>>>>> with 3.5.4. >>>>>>> >>>>>>> Do I still have to stick to maint branch, and what are the chances for >>>>>>> >>>>>> these >>>>>> >>>>>>> fixes to be included in 3.7.5? >>>>>>> >>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>>>>> issues with it - its best to debug and figure out the cause. >>>>>> >>>>>> This bug is indeed inside of superlu_dist, and we started having this >>>>> issue >>>>> from PETSc-3.6.x. I think superlu_dist developers should have fixed this >>>>> bug. We forgot to update superlu_dist?? This is not a thing users could >>>>> debug and fix. >>>>> >>>>> I have many people in INL suffering from this issue, and they have to >>>>> stay >>>>> with PETSc-3.5.4 to use superlu_dist. 
>>>>> >>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>> >>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>> >>>> >>>> Satish >>>> >>>> Hi Satish, >>> I did this: >>> >>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>> >>> --download-superlu_dist >>> --download-superlu_dist-commit=origin/maint (not sure this is needed, >>> since I'm already in maint) >>> >>> The problem is still there. >>> >>> Cheers, >>> Anton >>> From popov at uni-mainz.de Tue Oct 11 08:48:28 2016 From: popov at uni-mainz.de (Anton Popov) Date: Tue, 11 Oct 2016 15:48:28 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> Message-ID: <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> Valgrind immediately detects interesting stuff: ==25673== Use of uninitialised value of size 8 ==25673== at 0x178272C: static_schedule (static_schedule.c:960) ==25674== Use of uninitialised value of size 8 ==25674== at 0x178272C: static_schedule (static_schedule.c:960) ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) ==25673== Conditional jump or move depends on uninitialised value(s) ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) ==25673== Conditional jump or move depends on uninitialised value(s) ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) ==25674== Use of uninitialised value of size 8 ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in /opt/mpich3/lib/libmpi.so.12.1.0) ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in /opt/mpich3/lib/libmpi.so.12.1.0) ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) ==25674== Use of uninitialised value of size 8 ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) And it crashes after this: ==25674== Invalid write of size 4 ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:421) ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd ==25674== [1]PETSC ERROR: ------------------------------------------------------------------------ [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range On 10/11/2016 03:26 PM, Anton Popov wrote: > > On 10/10/2016 07:11 PM, Satish Balay wrote: >> Thats from petsc-3.5 >> >> Anton - please post the stack trace you get with >> --download-superlu_dist-commit=origin/maint > > I guess this is it: > > [0]PETSC ERROR: 
[0] SuperLU_DIST:pdgssvx line 421 > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 > /home/anton/LIB/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [0] PCSetUp line 930 > /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c > > According to the line numbers it crashes within > MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. > > Surprisingly this only happens on the second SNES iteration, but not > on the first. > > I'm trying to reproduce this behavior with PETSc KSP and SNES > examples. However, everything I've tried up to now with SuperLU_DIST > does just fine. > > I'm also checking our code in Valgrind to make sure it's clean. > > Anton >> >> Satish >> >> >> On Mon, 10 Oct 2016, Xiaoye S. Li wrote: >> >>> Which version of superlu_dist does this capture? I looked at the >>> original >>> error log, it pointed to pdgssvx: line 161. But that line is in >>> comment >>> block, not the program. >>> >>> Sherry >>> >>> >>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov >>> wrote: >>> >>>> >>>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>>> >>>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>>> >>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay >>>>> wrote: >>>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>>> Hi guys, >>>>>>>> are there any news about fixing buggy behavior of SuperLU_DIST, >>>>>>>> exactly >>>>>>>> >>>>>>> what >>>>>>> >>>>>>>> is described here: >>>>>>>> >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>>> >>>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>>> l&d=CwIBAg&c= >>>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>>> >>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. >>>>>>>> Everything works >>>>>>>> >>>>>>> fine >>>>>>> >>>>>>>> with 3.5.4. >>>>>>>> >>>>>>>> Do I still have to stick to maint branch, and what are the >>>>>>>> chances for >>>>>>>> >>>>>>> these >>>>>>> >>>>>>>> fixes to be included in 3.7.5? >>>>>>>> >>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are seeing >>>>>>> issues with it - its best to debug and figure out the cause. >>>>>>> >>>>>>> This bug is indeed inside of superlu_dist, and we started having >>>>>>> this >>>>>> issue >>>>>> from PETSc-3.6.x. I think superlu_dist developers should have >>>>>> fixed this >>>>>> bug. We forgot to update superlu_dist?? This is not a thing >>>>>> users could >>>>>> debug and fix. >>>>>> >>>>>> I have many people in INL suffering from this issue, and they >>>>>> have to >>>>>> stay >>>>>> with PETSc-3.5.4 to use superlu_dist. >>>>>> >>>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>>> >>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>>> >>>>> >>>>> Satish >>>>> >>>>> Hi Satish, >>>> I did this: >>>> >>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>>> >>>> --download-superlu_dist >>>> --download-superlu_dist-commit=origin/maint (not sure this is needed, >>>> since I'm already in maint) >>>> >>>> The problem is still there. 
>>>> >>>> Cheers, >>>> Anton >>>> > From fande.kong at inl.gov Tue Oct 11 09:33:22 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 11 Oct 2016 08:33:22 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: Barry, Thanks so much for your explanation. It helps me a lot. On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > Hi All, > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > I was really wondering what is the philosophy behind this? The exact > algorithms we are using in PETSc right now? Where we are dealing with > this, preconditioner, linear solver, or nonlinear solver? > > It is in the Krylov solver. > > The idea is very simple. Say you have a singular A with null space N > (that all values Ny are in the null space of A. So N is tall and skinny) > and you want to solve A x = b where b is in the range of A. This problem > has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + > Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and > x* has the smallest norm of all solutions. > > With left preconditioning B A x = B b GMRES, for example, normally > computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 > BABABAb + .... but the B operator will likely introduce some component > into the direction of the null space so as GMRES continues the "solution" > computed will grow larger and larger with a large component in the null > space of A. Hence we simply modify GMRES a tiny bit by building the > solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 Does "I" mean an identity matrix? Could you possibly send me a link for this GMRES implementation, that is, how PETSc does this in the actual code? > (I-N)BABABAb + .... that is we remove from each new direction anything in > the direction of the null space. Hence the null space doesn't directly > appear in the preconditioner, just in the KSP method. If you attach a > null space to the matrix, the KSP just automatically uses it to do the > removal above. > > With right preconditioning the solution is built from alpha_1 b + > alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to > remove any part that is in the null space of A. > > Now consider the case A y = b where b is NOT in the range of A. So > the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > Note that for symmetric matrices the two null spaces are the same. > > Barry > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. 
For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > What preconditioners are appropriate? asm, bjacobi, amg? I have an example which shows lu and ilu indeed work, but asm and bjacobi do not at all. That is why I am asking questions about algorithms. I am trying to figure out a default preconditioner for several singular systems. Thanks again. Fande Kong, > > > > > > > > Fande Kong, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Oct 11 11:39:11 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Oct 2016 11:39:11 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > Barry, Thanks so much for your explanation. It helps me a lot. > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > Hi All, > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > It is in the Krylov solver. > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > Does "I" mean an identity matrix? Could you possibly send me a link for this GMRES implementation, that is, how PETSc does this in the actual code? Yes. It is in the helper routine KSP_PCApplyBAorAB() #undef __FUNCT__ #define __FUNCT__ "KSP_PCApplyBAorAB" PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec y,Vec w) { PetscErrorCode ierr; PetscFunctionBegin; if (!ksp->transpose_solve) { ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); } else { ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); } PetscFunctionReturn(0); } There is no code directly in the GMRES or other methods. > > (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. 
If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. > > Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. > > The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. > > Note that for symmetric matrices the two null spaces are the same. > > Barry > > > A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. > > What preconditioners are appropriate? asm, bjacobi, amg? I have an example which shows lu and ilu indeed work, but asm and bjacobi do not at all. That is why I am asking questions about algorithms. I am trying to figure out a default preconditioner for several singular systems. Hmm, normally asm and bjacobi would be fine with this unless one or more of the subblocks are themselves singular (which normally won't happen). AMG can also work find sometimes. Can you send a sample code? Barry > > Thanks again. > > > Fande Kong, > > > > > > > > > Fande Kong, From fande.kong at inl.gov Tue Oct 11 12:01:34 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 11 Oct 2016 11:01:34 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> Message-ID: On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith wrote: > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > Hi All, > > > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > > > I was really wondering what is the philosophy behind this? The exact > algorithms we are using in PETSc right now? Where we are dealing with > this, preconditioner, linear solver, or nonlinear solver? > > > > It is in the Krylov solver. > > > > The idea is very simple. Say you have a singular A with null space > N (that all values Ny are in the null space of A. So N is tall and skinny) > and you want to solve A x = b where b is in the range of A. This problem > has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + > Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and > x* has the smallest norm of all solutions. 
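
For reference, the mechanism described above needs very little user code. A minimal sketch (not taken from the original messages), assuming a singular operator A that has already been assembled and whose null space is spanned by the constant vector, with b assembled and lying in the range of A:

    Mat            A;        /* assumed: assembled singular operator */
    Vec            b,x;      /* assumed: b assembled, in range(A) */
    KSP            ksp;
    MatNullSpace   nullsp;
    PetscBool      isNull;
    PetscErrorCode ierr;

    /* null space = span{constant vector}; pass explicit vectors instead of PETSC_TRUE for a general basis */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,NULL,&nullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceTest(nullsp,A,&isNull);CHKERRQ(ierr);   /* optional sanity check that A really annihilates it */
    ierr = MatSetNullSpace(A,nullsp);CHKERRQ(ierr);            /* KSP now applies the (I-N) projection described above */
    ierr = MatSetTransposeNullSpace(A,nullsp);CHKERRQ(ierr);   /* only needed if b may have a component outside range(A);
                                                                  for symmetric A the two spaces coincide */
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);         /* the matrix keeps its own reference */

    ierr = VecDuplicate(b,&x);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);                    /* the computed x has no component in the null space */
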
> > > > With left preconditioning B A x = B b GMRES, for example, > normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + > alpha_3 BABABAb + .... but the B operator will likely introduce some > component into the direction of the null space so as GMRES continues the > "solution" computed will grow larger and larger with a large component in > the null space of A. Hence we simply modify GMRES a tiny bit by building > the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > Does "I" mean an identity matrix? Could you possibly send me a link for > this GMRES implementation, that is, how PETSc does this in the actual code? > > Yes. > > It is in the helper routine KSP_PCApplyBAorAB() > #undef __FUNCT__ > #define __FUNCT__ "KSP_PCApplyBAorAB" > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec > y,Vec w) > { > PetscErrorCode ierr; > PetscFunctionBegin; > if (!ksp->transpose_solve) { > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > } else { > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w); > CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) { PetscErrorCode ierr; PetscFunctionBegin; if (ksp->pc_side == PC_LEFT) { Mat A; MatNullSpace nullsp; ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); if (nullsp) { ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); } } PetscFunctionReturn(0); } "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov methods only? How about the right preconditioning ones? Are they just magically right for the right preconditioning Krylov methods? Fande Kong, > > There is no code directly in the GMRES or other methods. > > > > > (I-N)BABABAb + .... that is we remove from each new direction anything > in the direction of the null space. Hence the null space doesn't directly > appear in the preconditioner, just in the KSP method. If you attach a > null space to the matrix, the KSP just automatically uses it to do the > removal above. > > > > With right preconditioning the solution is built from alpha_1 b + > alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to > remove any part that is in the null space of A. > > > > Now consider the case A y = b where b is NOT in the range of A. So > the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > Note that for symmetric matrices the two null spaces are the same. > > > > Barry > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. 
For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an > example which shows lu and ilu indeed work, but asm and bjacobi do not at > all. That is why I am asking questions about algorithms. I am trying to > figure out a default preconditioner for several singular systems. > > Hmm, normally asm and bjacobi would be fine with this unless one or > more of the subblocks are themselves singular (which normally won't > happen). AMG can also work find sometimes. > > Can you send a sample code? > > Barry > > > > > Thanks again. > > > > > > Fande Kong, > > > > > > > > > > > > > > > Fande Kong, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Oct 11 12:19:04 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Oct 2016 12:19:04 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> Message-ID: This log looks truncated. Are there any valgrind mesages before this? [like from your application code - or from MPI] Perhaps you can send the complete log - with: valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 --track-origins=yes [and if there were more valgrind messages from MPI - rebuild petsc with --download-mpich - for a valgrind clean mpi] Sherry, Perhaps this log points to some issue in superlu_dist? 
thanks, Satish On Tue, 11 Oct 2016, Anton Popov wrote: > Valgrind immediately detects interesting stuff: > > ==25673== Use of uninitialised value of size 8 > ==25673== at 0x178272C: static_schedule (static_schedule.c:960) > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x178272C: static_schedule (static_schedule.c:960) > ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > ==25673== Conditional jump or move depends on uninitialised value(s) > ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > ==25673== Conditional jump or move depends on uninitialised value(s) > ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) > ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) > ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) > ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) > ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) > ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) > ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > And it crashes after this: > > ==25674== Invalid write of size 4 > ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:421) > ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd > ==25674== > [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably > memory access out of range > > > On 10/11/2016 03:26 PM, Anton Popov wrote: > > > > On 10/10/2016 07:11 PM, Satish Balay wrote: > > > Thats from petsc-3.5 > > > > > > Anton - please post the stack trace you get with > > > --download-superlu_dist-commit=origin/maint > > > > I guess this is it: > > > > [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 > > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > > [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 > > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > > [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 > > /home/anton/LIB/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > > /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c > > [0]PETSC ERROR: [0] PCSetUp line 930 > > /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c > > > > According to the line numbers it crashes within > > MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. > > > > Surprisingly this only happens on the second SNES iteration, but not on the > > first. > > > > I'm trying to reproduce this behavior with PETSc KSP and SNES examples. 
> > However, everything I've tried up to now with SuperLU_DIST does just fine. > > > > I'm also checking our code in Valgrind to make sure it's clean. > > > > Anton > > > > > > Satish > > > > > > > > > On Mon, 10 Oct 2016, Xiaoye S. Li wrote: > > > > > > > Which version of superlu_dist does this capture? I looked at the > > > > original > > > > error log, it pointed to pdgssvx: line 161. But that line is in > > > > comment > > > > block, not the program. > > > > > > > > Sherry > > > > > > > > > > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: > > > > > > > > > > > > > > On 10/07/2016 05:23 PM, Satish Balay wrote: > > > > > > > > > > > On Fri, 7 Oct 2016, Kong, Fande wrote: > > > > > > > > > > > > On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay > > > > > > wrote: > > > > > > > On Fri, 7 Oct 2016, Anton Popov wrote: > > > > > > > > Hi guys, > > > > > > > > > are there any news about fixing buggy behavior of > > > > > > > > > SuperLU_DIST, exactly > > > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > > is described here: > > > > > > > > > > > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. > > > > > > > > > > > > > > > > > mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm > > > > > > > > l&d=CwIBAg&c= > > > > > > > > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > > > > > > > > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > > > > > > > > 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? > > > > > > > > > > > > > > > > > I'm using 3.7.4 and still get SEGV in pdgssvx routine. > > > > > > > > > Everything works > > > > > > > > > > > > > > > > > fine > > > > > > > > > > > > > > > > > with 3.5.4. > > > > > > > > > > > > > > > > > > Do I still have to stick to maint branch, and what are the > > > > > > > > > chances for > > > > > > > > > > > > > > > > > these > > > > > > > > > > > > > > > > > fixes to be included in 3.7.5? > > > > > > > > > > > > > > > > > 3.7.4. is off maint branch [as of a week ago]. So if you are > > > > > > > > seeing > > > > > > > > issues with it - its best to debug and figure out the cause. > > > > > > > > > > > > > > > > This bug is indeed inside of superlu_dist, and we started having > > > > > > > > this > > > > > > > issue > > > > > > > from PETSc-3.6.x. I think superlu_dist developers should have > > > > > > > fixed this > > > > > > > bug. We forgot to update superlu_dist?? This is not a thing users > > > > > > > could > > > > > > > debug and fix. > > > > > > > > > > > > > > I have many people in INL suffering from this issue, and they have > > > > > > > to > > > > > > > stay > > > > > > > with PETSc-3.5.4 to use superlu_dist. > > > > > > > > > > > > > To verify if the bug is fixed in latest superlu_dist - you can try > > > > > > [assuming you have git - either from petsc-3.7/maint/master]: > > > > > > > > > > > > --download-superlu_dist --download-superlu_dist-commit=origin/maint > > > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > Hi Satish, > > > > > I did this: > > > > > > > > > > git clone -b maint https://bitbucket.org/petsc/petsc.git petsc > > > > > > > > > > --download-superlu_dist > > > > > --download-superlu_dist-commit=origin/maint (not sure this is needed, > > > > > since I'm already in maint) > > > > > > > > > > The problem is still there. 
> > > > > > > > > > Cheers, > > > > > Anton > > > > > > > > > > From bsmith at mcs.anl.gov Tue Oct 11 12:44:15 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Oct 2016 12:44:15 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> Message-ID: <55515EF5-6072-4AAD-AF69-287448F1FD72@mcs.anl.gov> You can run your code with -ksp_view_mat binary -ksp_view_rhs binary this will cause it to save the matrices and right hand sides to the linear systems in a file called binaryoutput, then email the file to petsc-maint at mcs.anl.gov (don't worry this email address accepts large attachments). And tell us how many processes you ran on that produced the problems. Barry > On Oct 11, 2016, at 12:19 PM, Satish Balay wrote: > > This log looks truncated. Are there any valgrind mesages before this? > [like from your application code - or from MPI] > > Perhaps you can send the complete log - with: > valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 --track-origins=yes > > [and if there were more valgrind messages from MPI - rebuild petsc > with --download-mpich - for a valgrind clean mpi] > > Sherry, > Perhaps this log points to some issue in superlu_dist? > > thanks, > Satish > > On Tue, 11 Oct 2016, Anton Popov wrote: > >> Valgrind immediately detects interesting stuff: >> >> ==25673== Use of uninitialised value of size 8 >> ==25673== at 0x178272C: static_schedule (static_schedule.c:960) >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x178272C: static_schedule (static_schedule.c:960) >> ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> >> ==25673== Conditional jump or move depends on uninitialised value(s) >> ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) >> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> >> ==25673== Conditional jump or move depends on uninitialised value(s) >> ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >> ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) >> ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) >> ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) >> ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) >> ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) >> ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in >> /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in >> /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >> ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> And it crashes after this: >> >> ==25674== Invalid write of size 4 >> ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> ==25674== by 0xAAEFAE: 
MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:421) >> ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd >> ==25674== >> [1]PETSC ERROR: >> ------------------------------------------------------------------------ >> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably >> memory access out of range >> >> >> On 10/11/2016 03:26 PM, Anton Popov wrote: >>> >>> On 10/10/2016 07:11 PM, Satish Balay wrote: >>>> Thats from petsc-3.5 >>>> >>>> Anton - please post the stack trace you get with >>>> --download-superlu_dist-commit=origin/maint >>> >>> I guess this is it: >>> >>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 >>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>> [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 >>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>> [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 >>> /home/anton/LIB/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: [0] PCSetUp_LU line 101 >>> /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c >>> [0]PETSC ERROR: [0] PCSetUp line 930 >>> /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c >>> >>> According to the line numbers it crashes within >>> MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. >>> >>> Surprisingly this only happens on the second SNES iteration, but not on the >>> first. >>> >>> I'm trying to reproduce this behavior with PETSc KSP and SNES examples. >>> However, everything I've tried up to now with SuperLU_DIST does just fine. >>> >>> I'm also checking our code in Valgrind to make sure it's clean. >>> >>> Anton >>>> >>>> Satish >>>> >>>> >>>> On Mon, 10 Oct 2016, Xiaoye S. Li wrote: >>>> >>>>> Which version of superlu_dist does this capture? I looked at the >>>>> original >>>>> error log, it pointed to pdgssvx: line 161. But that line is in >>>>> comment >>>>> block, not the program. >>>>> >>>>> Sherry >>>>> >>>>> >>>>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: >>>>> >>>>>> >>>>>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>>>>> >>>>>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>>>>> >>>>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay >>>>>>> wrote: >>>>>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>>>>> Hi guys, >>>>>>>>>> are there any news about fixing buggy behavior of >>>>>>>>>> SuperLU_DIST, exactly >>>>>>>>>> >>>>>>>>> what >>>>>>>>> >>>>>>>>>> is described here: >>>>>>>>>> >>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>>>>> >>>>>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>>>>> l&d=CwIBAg&c= >>>>>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>>>>> >>>>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. >>>>>>>>>> Everything works >>>>>>>>>> >>>>>>>>> fine >>>>>>>>> >>>>>>>>>> with 3.5.4. >>>>>>>>>> >>>>>>>>>> Do I still have to stick to maint branch, and what are the >>>>>>>>>> chances for >>>>>>>>>> >>>>>>>>> these >>>>>>>>> >>>>>>>>>> fixes to be included in 3.7.5? >>>>>>>>>> >>>>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are >>>>>>>>> seeing >>>>>>>>> issues with it - its best to debug and figure out the cause. >>>>>>>>> >>>>>>>>> This bug is indeed inside of superlu_dist, and we started having >>>>>>>>> this >>>>>>>> issue >>>>>>>> from PETSc-3.6.x. 
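
As an aside for anyone trying to reproduce a failure like this outside the full application: once the offending system has been saved with -ksp_view_mat binary -ksp_view_rhs binary (as suggested elsewhere in this thread), a small stand-alone driver along the following lines can replay the factorization. This is a sketch, not code from the original messages; it assumes the default output file name "binaryoutput" with the matrix stored ahead of the right-hand side.

    #include <petscksp.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      Vec            b,x;
      KSP            ksp;
      PetscViewer    fd;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"binaryoutput",FILE_MODE_READ,&fd);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
      ierr = MatLoad(A,fd);CHKERRQ(ierr);                 /* matrix written by -ksp_view_mat binary */
      ierr = VecCreate(PETSC_COMM_WORLD,&b);CHKERRQ(ierr);
      ierr = VecLoad(b,fd);CHKERRQ(ierr);                 /* rhs written by -ksp_view_rhs binary */
      ierr = PetscViewerDestroy(&fd);CHKERRQ(ierr);

      ierr = VecDuplicate(b,&x);CHKERRQ(ierr);
      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);        /* run with -pc_type lu -pc_factor_mat_solver_package superlu_dist */
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = VecDestroy(&b);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

If the second-SNES-iteration behaviour reported above turns out to matter, the driver could re-assemble A with updated values and call KSPSolve() a second time to mimic a repeated numerical factorization.
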
I think superlu_dist developers should have >>>>>>>> fixed this >>>>>>>> bug. We forgot to update superlu_dist?? This is not a thing users >>>>>>>> could >>>>>>>> debug and fix. >>>>>>>> >>>>>>>> I have many people in INL suffering from this issue, and they have >>>>>>>> to >>>>>>>> stay >>>>>>>> with PETSc-3.5.4 to use superlu_dist. >>>>>>>> >>>>>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>>>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>>>>> >>>>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>>>>> >>>>>>> >>>>>>> Satish >>>>>>> >>>>>>> Hi Satish, >>>>>> I did this: >>>>>> >>>>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>>>>> >>>>>> --download-superlu_dist >>>>>> --download-superlu_dist-commit=origin/maint (not sure this is needed, >>>>>> since I'm already in maint) >>>>>> >>>>>> The problem is still there. >>>>>> >>>>>> Cheers, >>>>>> Anton >>>>>> >>> >> >> >> > From bsmith at mcs.anl.gov Tue Oct 11 13:18:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Oct 2016 13:18:05 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> Message-ID: > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith wrote: > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > Hi All, > > > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > > > It is in the Krylov solver. > > > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > Does "I" mean an identity matrix? Could you possibly send me a link for this GMRES implementation, that is, how PETSc does this in the actual code? > > Yes. 
> > It is in the helper routine KSP_PCApplyBAorAB() > #undef __FUNCT__ > #define __FUNCT__ "KSP_PCApplyBAorAB" > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec y,Vec w) > { > PetscErrorCode ierr; > PetscFunctionBegin; > if (!ksp->transpose_solve) { > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > } else { > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > { > PetscErrorCode ierr; > PetscFunctionBegin; > if (ksp->pc_side == PC_LEFT) { > Mat A; > MatNullSpace nullsp; > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > if (nullsp) { > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > } > } > PetscFunctionReturn(0); > } > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov methods only? How about the right preconditioning ones? Are they just magically right for the right preconditioning Krylov methods? This is a good question. I am working on a branch now where I will add some more comprehensive testing of the various cases and fix anything that comes up. Were you having trouble with ASM and bjacobi only for right preconditioning? Note that when A is symmetric the range of A is orthogonal to null space of A so yes I think in that case it is just "magically right" but if A is not symmetric then I don't think it is "magically right". I'll work on it. Barry > > Fande Kong, > > > There is no code directly in the GMRES or other methods. > > > > > (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > > > With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. > > > > Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. > > > > The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. > > > > Note that for symmetric matrices the two null spaces are the same. > > > > Barry > > > > > > A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. > > > > What preconditioners are appropriate? asm, bjacobi, amg? 
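
For experimenting with the cases being discussed here, the preconditioning side can be forced from code (or with -ksp_pc_side left|right), and a "converges but wrong answer" situation can be detected by checking the true residual by hand. A sketch, reusing A, b, x and ksp from the earlier fragment (not code from the original messages):

    Vec       r;
    PetscReal rnorm;

    ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
    ierr = KSPSetPCSide(ksp,PC_RIGHT);CHKERRQ(ierr);     /* or PC_LEFT, to compare the two behaviours */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

    ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
    ierr = MatMult(A,x,r);CHKERRQ(ierr);
    ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);              /* r = b - A*x */
    ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);       /* compare this norm across pc_type / pc_side choices */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"true residual norm %g\n",(double)rnorm);CHKERRQ(ierr);
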
I have an example which shows lu and ilu indeed work, but asm and bjacobi do not at all. That is why I am asking questions about algorithms. I am trying to figure out a default preconditioner for several singular systems. > > Hmm, normally asm and bjacobi would be fine with this unless one or more of the subblocks are themselves singular (which normally won't happen). AMG can also work find sometimes. > > Can you send a sample code? > > Barry > > > > > Thanks again. > > > > > > Fande Kong, > > > > > > > > > > > > > > > Fande Kong, From fande.kong at inl.gov Tue Oct 11 15:04:21 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 11 Oct 2016 14:04:21 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> Message-ID: On Tue, Oct 11, 2016 at 12:18 PM, Barry Smith wrote: > > > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith > wrote: > > > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith > wrote: > > > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > > > Hi All, > > > > > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > > > > > I was really wondering what is the philosophy behind this? The exact > algorithms we are using in PETSc right now? Where we are dealing with > this, preconditioner, linear solver, or nonlinear solver? > > > > > > It is in the Krylov solver. > > > > > > The idea is very simple. Say you have a singular A with null > space N (that all values Ny are in the null space of A. So N is tall and > skinny) and you want to solve A x = b where b is in the range of A. This > problem has an infinite number of solutions Ny + x* since A (Ny + x*) > = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = > b and x* has the smallest norm of all solutions. > > > > > > With left preconditioning B A x = B b GMRES, for example, > normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + > alpha_3 BABABAb + .... but the B operator will likely introduce some > component into the direction of the null space so as GMRES continues the > "solution" computed will grow larger and larger with a large component in > the null space of A. Hence we simply modify GMRES a tiny bit by building > the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > > > Does "I" mean an identity matrix? Could you possibly send me a link > for this GMRES implementation, that is, how PETSc does this in the actual > code? > > > > Yes. 
> > > > It is in the helper routine KSP_PCApplyBAorAB() > > #undef __FUNCT__ > > #define __FUNCT__ "KSP_PCApplyBAorAB" > > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec > y,Vec w) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (!ksp->transpose_solve) { > > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > > } else { > > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w); > CHKERRQ(ierr); > > } > > PetscFunctionReturn(0); > > } > > > > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (ksp->pc_side == PC_LEFT) { > > Mat A; > > MatNullSpace nullsp; > > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > > if (nullsp) { > > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > > } > > } > > PetscFunctionReturn(0); > > } > > > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov > methods only? How about the right preconditioning ones? Are they just > magically right for the right preconditioning Krylov methods? > > This is a good question. I am working on a branch now where I will add > some more comprehensive testing of the various cases and fix anything that > comes up. > > Were you having trouble with ASM and bjacobi only for right > preconditioning? > > Yes. ASM and bjacobi works fine for left preconditioning NOT for RIGHT preconditioning. bjacobi converges, but produces a wrong solution. ASM needs more iterations, however the solution is right. > Note that when A is symmetric the range of A is orthogonal to null > space of A so yes I think in that case it is just "magically right" but if > A is not symmetric then I don't think it is "magically right". I'll work on > it. > > > Barry > > > > > Fande Kong, > > > > > > There is no code directly in the GMRES or other methods. > > > > > > > > (I-N)BABABAb + .... that is we remove from each new direction > anything in the direction of the null space. Hence the null space doesn't > directly appear in the preconditioner, just in the KSP method. If you > attach a null space to the matrix, the KSP just automatically uses it to do > the removal above. > > > > > > With right preconditioning the solution is built from alpha_1 b > + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term > to remove any part that is in the null space of A. > > > > > > Now consider the case A y = b where b is NOT in the range of A. > So the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > > > Note that for symmetric matrices the two null spaces are the same. > > > > > > Barry > > > > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. 
For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an > example which shows lu and ilu indeed work, but asm and bjacobi do not at > all. That is why I am asking questions about algorithms. I am trying to > figure out a default preconditioner for several singular systems. > > > > Hmm, normally asm and bjacobi would be fine with this unless one or > more of the subblocks are themselves singular (which normally won't > happen). AMG can also work find sometimes. > > > > Can you send a sample code? > > > > Barry > > > > > > > > Thanks again. > > > > > > > > > Fande Kong, > > > > > > > > > > > > > > > > > > > > > Fande Kong, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From claus at olemiss.edu Tue Oct 11 15:12:31 2016 From: claus at olemiss.edu (CLAUS HELLMUTH WARNER HETZER) Date: Tue, 11 Oct 2016 20:12:31 +0000 Subject: [petsc-users] Autoconf tests Message-ID: Hi everybody- Figured I?d ask this here before I go reinventing the wheel. I?m writing an autoconf installer (the standard Linux configure/make package) for an acoustic wave propagation modeling package that builds PETSc and SLEPc as part of the installation process. I?d like to be able to test for instances of PETSc already being installed on the user?s machine and, if possible, whether they?re the debug version. I know I can check for the existence of the PETSC_DIR environmental variable, and parse the PETSC_ARCH variable for ?debug?, and I?ll do that as a first pass, but has anybody written any M4 tests that are more reliable than those (i.e. actually attempting to link to the libraries)? I had one user who had the libraries installed in /usr/local/bin but didn?t have the environmental variables set in their profile, so the linker was confused and it took a while to figure out what was going weird with the install. If not, I guess I?ll be putting on my Autoconf gloves and getting my hands dirty. Thanks -Claus Hetzer ------------------ Claus Hetzer Senior Research and Development Engineer National Center for Physical Acoustics The University of Mississippi 145 Hill Drive PO Box 1848 University, MS 38677 claus at olemiss.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Oct 11 15:31:48 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Oct 2016 15:31:48 -0500 Subject: [petsc-users] Autoconf tests In-Reply-To: References: Message-ID: You don't want to get the debug mode from PETSC_ARCH since there may not be a PETSC_ARCH (for PETSc --prefix installs) or because the user did not put the string in it. You can check for the PETSC_USE_DEBUG symbol in the petscconf.h file by linking a C program against and #if defined(PETSC_USE_DEBUG). Barry > On Oct 11, 2016, at 3:12 PM, CLAUS HELLMUTH WARNER HETZER wrote: > > Hi everybody- > > Figured I?d ask this here before I go reinventing the wheel. 
> > I?m writing an autoconf installer (the standard Linux configure/make package) for an acoustic wave propagation modeling package that builds PETSc and SLEPc as part of the installation process. I?d like to be able to test for instances of PETSc already being installed on the user?s machine and, if possible, whether they?re the debug version. I know I can check for the existence of the PETSC_DIR environmental variable, and parse the PETSC_ARCH variable for ?debug?, and I?ll do that as a first pass, but has anybody written any M4 tests that are more reliable than those (i.e. actually attempting to link to the libraries)? I had one user who had the libraries installed in /usr/local/bin but didn?t have the environmental variables set in their profile, so the linker was confused and it took a while to figure out what was going weird with the install. > > If not, I guess I?ll be putting on my Autoconf gloves and getting my hands dirty. > > Thanks > -Claus Hetzer > > ------------------ > Claus Hetzer > Senior Research and Development Engineer > National Center for Physical Acoustics > The University of Mississippi > 145 Hill Drive > PO Box 1848 > University, MS 38677 > claus at olemiss.edu > > > > > From overholt at capesim.com Tue Oct 11 16:08:55 2016 From: overholt at capesim.com (Matthew Overholt) Date: Tue, 11 Oct 2016 17:08:55 -0400 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> Message-ID: <002b01d22403$aee809f0$0cb81dd0$@capesim.com> Barry, Subsequent tests with the same code and a problem (input) having a much smaller vertex (equation) count (i.e. a much smaller matrix to invert for the solution) have NOT had PetscCommDuplicate() account for any significant time, so I'm not surprised that your test didn't find any problem. I am running on Edison with its default modules, except as follows. > module unload darshan > module load cray-petsc > module load perftools-base > module load perftools For sampling, I used the default pat_build command on my executable (xyz): > pat_build -f xyz For tracing, I used the following: > pat_build -w -T PetscCommDuplicate xyz Thanks, Matt... -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Saturday, October 08, 2016 5:18 PM To: overholt at capesim.com Cc: PETSc Subject: Re: [petsc-users] large PetscCommDuplicate overhead What exact machine are you running on? Please run modules list so we can see exactly what modules you are using. Please tell us exactly what options you are passing to pat_build? Barry > On Oct 6, 2016, at 10:45 AM, Matthew Overholt wrote: > > Matthew and Barry, > > 1) I did a direct measurement of PetscCommDuplicate() time by tracing > just that call (using CrayPat), and confirmed the sampling results. > For 8 processes (n=8), tracing counted a total of 101 calls, taking ~0 > time on the root process but taking 11.78 seconds (6.3% of 188 total > seconds) on each of the other 7 processes. For 16 processes (n=16, > still only 1 node), tracing counted 102 total calls for a total of > 18.42 seconds (13.2% of 139.6 total > seconds) on every process except the root. > > 2) Copied below is a section of the log view for the first two > solutions for n=2, which shows the same calls as for n=8. (I can send > the entire log files if desired.) 
In each case I count about 44 PCD > calls per process during initialization and meshing, 7 calls during > setup, 9 calls for the first solution, then 3 calls for each > subsequent solution (fixed-point iteration), and 3 calls to write out the solution, for 75 total. > > 3) I would expect that the administrators of this machine have > configured PETSc appropriately. I am using their current default > install, which is 3.7.2. > https://www.nersc.gov/users/software/programming-libraries/math-librar > ies/pe > tsc/ > > 4) Yes, I just gave the MUMPS time as a comparison. > > 5) As to where it is spending time, perhaps the timing results in the > log files will be helpful. The "Solution took ..." printouts give the > total solution time for that iteration, the others are incremental > times. (As an aside, I have been wondering why the solution times do > not scale well with process count, even though that work is entirely > done in parallel PETSc > routines.) > > Thanks, > Matt Overholt > > > ********** -log_view -info results for n=2 : the first solution and > subsequent fixed-point iteration *********** [0] PetscCommDuplicate(): > Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > Matrix setup took 0.108 s > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374779 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374781 > KSP PC setup took 0.079 s > [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 [0] > MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 [1] > MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes [1] > MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 5 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage space: > 1050106 unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. > [1] MatSeqAIJCheckInode(): Found 599214 nodes out of 599214 rows. Not > using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 > X 621594; storage space: > 1237634 unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 621594 nodes out of 621594 rows. 
Not > using Inode routines [0] PetscCommDuplicate(): Using internal PETSc > communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] > VecScatterCreate(): General case: MPI to Seq [1] > MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage space: > 5257543 unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 89 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage space: > 5464978 unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 490 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 26.426 s > [0] PCSetUp(): Setting up PC for first time [1] PetscCommDuplicate(): > Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [0] VecScatterCreate(): Special case: processor zero gets entire > parallel vector, rest get none > ** Max-trans not allowed because matrix is distributed [0] > PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator > is unchanged [0] PetscCommDuplicate(): Using internal PETSc > communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] > VecScatterCreate(): General case: Seq to MPI [1] VecScatterCreate(): > General case: Seq to MPI Solution took 102.21 s > > NL iteration 0: delta = 32.0488 67.6279. 
> Error delta calc took 0.045 s > Node and Element temps update took 0.017 s [0] VecAssemblyBegin_MPI(): > Stash has 0 entries, uses 0 mallocs. > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. > [0] MatStashScatterBegin_Ref(): No of messages: 0 [0] > MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. > [1] MatStashScatterBegin_Ref(): No of messages: 1 [1] > MatStashScatterBegin_Ref(): Mesg_to: 0: size: 7713792 bytes [1] > MatAssemblyBegin_MPIAIJ(): Stash has 482112 entries, uses 0 mallocs. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 599214; storage > space: 0 > unneeded,15128672 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > 599214) < 0.6. Do not use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 621594; storage > space: 0 > unneeded,15545404 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows > 621594) < 0.6. Do not use CompressedRow routines. > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 599214 X 15700; storage > space: 0 > unneeded,136718 used > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 19 > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows > 582718)/(num_localrows 599214) > 0.6. Use CompressedRow routines. > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 621594 X 16496; storage > space: 0 > unneeded,136718 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 16 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 605894)/(num_localrows 621594) > 0.6. Use CompressedRow routines. > K and q SetValues took 2.366 s > [0] PCSetUp(): Setting up PC with same nonzero pattern [0] > PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374780 > [1] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 > -2080374782 > [0] VecScatterCreateCommon_PtoS(): Using MPI_Alltoallv() for scatter > [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] > VecScatterCreate(): General case: Seq to MPI [1] VecScatterCreate(): > General case: Seq to MPI Solution took 82.156 s > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Wednesday, October 05, 2016 4:42 PM > To: overholt at capesim.com > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > >> On Oct 5, 2016, at 2:30 PM, Matthew Overholt wrote: >> >> Hi Petsc-Users, >> >> I am trying to understand an issue where PetscCommDuplicate() calls >> are > taking an increasing percentage of time as I run a fixed-sized problem > on more processes. >> >> I am using the FEM to solve the steady-state heat transfer equation >> (K.x = > q) using a PC direct solver, like MUMPS. 
>> >> I am running on the NERSC Cray X30, which has two Xeon's per node >> with 12 > cores each, and profiling the code using CrayPat sampling. >> >> On a typical problem (1E+6 finite elements), running on a single node: >> -for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate >> (on > process 1, but on the root it is less), and (for reference) 9% of > total time is for MUMPS. >> -for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate >> (on > every process except the root, where it is <1%), and 9-10% of total > time is for MUMPS. > > What does PetscCommDuplicate() have to do with MUMPS? Nothing at > all, you are just giving its time for comparison? > >> >> What is the large PetscCommDuplicate time connected to, an increasing > number of messages (tags)? Would using fewer MatSetValues() and > VecSetValues() calls (with longer message lengths) alleviate this? > > No PetscCommDuplicate won't increate with more messages or calls to > XXXSetValues(). PetscCommDuplicate() is only called essentially on the > creation of new PETSc objects. It should also be fast since it > basically needs to do just a MPI_Attr_get(). With more processes but > the same problem size and code there should be pretty much the same > number of objects created. > > PetscSpinlockLock() does nothing if you are not using threads so it > won't take any time. > > Is there a way to see where it is spending its time inside the > PetscCommDuplicate()? Perhaps the Cray MPI_Attr_get() has issues. > > Barry > > > > > > >> >> For reference, the PETSc calling sequence in the code is as follows. >> // Create the solution and RHS vectors >> ierr = VecCreate(petscData->mpicomm,&mesh->hpx); >> ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution"); >> ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # >> of > equations; distribution to match mesh >> ierr = VecSetFromOptions(mesh->hpx); // allow run time options >> ierr = VecDuplicate(mesh->hpx,&q); // create the RHS vector >> // Create the stiffnexx matrix >> ierr = MatCreate(petscData->mpicomm,&K); >> ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns); >> ierr = MatSetType(K,MATAIJ); // default sparse type >> // Do preallocation >> ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL); >> ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL); >> ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE); >> ierr = MatSetUp(K); >> // Create and set up the KSP context as a PreConditioner Only >> (Direct) > Solution >> ierr = KSPCreate(petscData->mpicomm,&ksp); >> ierr = KSPSetOperators(ksp,K,K); >> ierr = KSPSetType(ksp,KSPPREONLY); >> // Set the temperature vector >> ierr = VecSet(mesh->hpx,mesh->Tmin); >> // Set the default PC method as MUMPS >> ierr = KSPGetPC(ksp,&pc); // extract the preconditioner >> ierr = PCSetType(pc,PCLU); // set pc options >> ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >> ierr = KSPSetFromOptions(ksp); >> >> // Set the values for the K matrix and q vector >> // which involves a lot of these calls >> ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES); // > 1 call per matrix row (equation) >> ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES); // 1 call per > element >> ierr = VecAssemblyBegin(q); >> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY); >> ierr = VecAssemblyEnd(q); >> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY); >> >> // Solve ////////////////////////////////////// >> ierr = KSPSolve(ksp,q,mesh->hpx); >> ... 
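As a cross-check on the sampled profile, PETSc's own logging can be split into stages so that -log_view / -log_summary reports the assembly and solve phases (and the objects created in each) separately. A minimal sketch of how that might be dropped into a calling sequence like the one above; the stage names are illustrative and the placeholder comments mark where the application calls would go:

   PetscLogStage  stageAssembly, stageSolve;
   PetscErrorCode ierr;

   ierr = PetscLogStageRegister("Assembly", &stageAssembly);CHKERRQ(ierr);
   ierr = PetscLogStageRegister("Solve", &stageSolve);CHKERRQ(ierr);

   ierr = PetscLogStagePush(stageAssembly);CHKERRQ(ierr);
   /* the MatSetValues()/VecSetValues() loops and the
      MatAssemblyBegin/End() and VecAssemblyBegin/End() calls go here */
   ierr = PetscLogStagePop();CHKERRQ(ierr);

   ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
   /* KSPSolve() goes here */
   ierr = PetscLogStagePop();CHKERRQ(ierr);

Running with -log_view then gives a per-stage table to set beside the CrayPat sampling.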
>> *Note that the code evenly divides the finite elements over the total > number of processors, and I am using ghosting of the FE vertices > vector to handle the vertices that are needed on more than 1 process. >> >> Thanks in advance for your help, >> Matt Overholt >> CapeSym, Inc. >> >> Virus-free. www.avast.com > > > --- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus From popov at uni-mainz.de Tue Oct 11 16:12:49 2016 From: popov at uni-mainz.de (Anton) Date: Tue, 11 Oct 2016 23:12:49 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <55515EF5-6072-4AAD-AF69-287448F1FD72@mcs.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <55515EF5-6072-4AAD-AF69-287448F1FD72@mcs.anl.gov> Message-ID: <25df38ba-3bff-2cac-f7a2-14a6073813cb@uni-mainz.de> On 10/11/16 7:44 PM, Barry Smith wrote: > You can run your code with -ksp_view_mat binary -ksp_view_rhs binary this will cause it to save the matrices and right hand sides to the linear systems in a file called binaryoutput, then email the file to petsc-maint at mcs.anl.gov (don't worry this email address accepts large attachments). And tell us how many processes you ran on that produced the problems. > > Barry > I'll do that, but I just wonder which version of SuperLU_DIST is used in 3.7.4? The latest version available on http://crd-legacy.lbl.gov/~xiaoye/SuperLU/ is 5.1.1 which is a week old and includes bug fixes. Maybe we're facing a problem that is already solved. Thanks, Anton > >> On Oct 11, 2016, at 12:19 PM, Satish Balay wrote: >> >> This log looks truncated. Are there any valgrind mesages before this? >> [like from your application code - or from MPI] >> >> Perhaps you can send the complete log - with: >> valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 --track-origins=yes >> >> [and if there were more valgrind messages from MPI - rebuild petsc >> with --download-mpich - for a valgrind clean mpi] >> >> Sherry, >> Perhaps this log points to some issue in superlu_dist? 
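For reference, a standalone driver along the following lines can read the matrix and right-hand side saved in binaryoutput and exercise the SuperLU_DIST factorization outside the full application. This is only a sketch: the file name is the default produced by the options above, the matrix is assumed to precede the right-hand side in the file, and everything else (names, cleanup) is illustrative.

   #include <petscksp.h>

   int main(int argc, char **argv)
   {
     Mat            A;
     Vec            b, x;
     KSP            ksp;
     PC             pc;
     PetscViewer    viewer;
     PetscErrorCode ierr;

     ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

     /* read the matrix, then the right-hand side, from the binary file */
     ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "binaryoutput", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
     ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
     ierr = MatLoad(A, viewer);CHKERRQ(ierr);
     ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
     ierr = VecLoad(b, viewer);CHKERRQ(ierr);
     ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
     ierr = VecDuplicate(b, &x);CHKERRQ(ierr);

     /* direct solve through SuperLU_DIST, as in the failing runs */
     ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
     ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
     ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
     ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
     ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
     ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
     ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

     ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
     /* the reported crash appears on the second numeric factorization, so to
        mimic the second SNES iteration one would modify/re-assemble A here and
        call KSPSolve() again, forcing PCSetUp() to refactor */

     ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
     ierr = MatDestroy(&A);CHKERRQ(ierr);
     ierr = VecDestroy(&b);CHKERRQ(ierr);
     ierr = VecDestroy(&x);CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }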
>> >> thanks, >> Satish >> >> On Tue, 11 Oct 2016, Anton Popov wrote: >> >>> Valgrind immediately detects interesting stuff: >>> >>> ==25673== Use of uninitialised value of size 8 >>> ==25673== at 0x178272C: static_schedule (static_schedule.c:960) >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x178272C: static_schedule (static_schedule.c:960) >>> ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> >>> ==25673== Conditional jump or move depends on uninitialised value(s) >>> ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) >>> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> >>> ==25673== Conditional jump or move depends on uninitialised value(s) >>> ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >>> ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >>> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) >>> ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) >>> ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) >>> ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) >>> ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) >>> ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in >>> /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in >>> /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >>> ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> And it crashes after this: >>> >>> ==25674== Invalid write of size 4 >>> ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:421) >>> ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd >>> ==25674== >>> [1]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably >>> memory access out of range >>> >>> >>> On 10/11/2016 03:26 PM, Anton Popov wrote: >>>> On 10/10/2016 07:11 PM, Satish Balay wrote: >>>>> Thats from petsc-3.5 >>>>> >>>>> Anton - please post the stack trace you get with >>>>> --download-superlu_dist-commit=origin/maint >>>> I guess this is it: >>>> >>>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 >>>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>>> [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 >>>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>>> [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 >>>> /home/anton/LIB/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: [0] PCSetUp_LU line 101 >>>> /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c >>>> [0]PETSC ERROR: [0] PCSetUp line 930 >>>> /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c >>>> >>>> According to the line numbers it crashes within >>>> MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. 
>>>> >>>> Surprisingly this only happens on the second SNES iteration, but not on the >>>> first. >>>> >>>> I'm trying to reproduce this behavior with PETSc KSP and SNES examples. >>>> However, everything I've tried up to now with SuperLU_DIST does just fine. >>>> >>>> I'm also checking our code in Valgrind to make sure it's clean. >>>> >>>> Anton >>>>> Satish >>>>> >>>>> >>>>> On Mon, 10 Oct 2016, Xiaoye S. Li wrote: >>>>> >>>>>> Which version of superlu_dist does this capture? I looked at the >>>>>> original >>>>>> error log, it pointed to pdgssvx: line 161. But that line is in >>>>>> comment >>>>>> block, not the program. >>>>>> >>>>>> Sherry >>>>>> >>>>>> >>>>>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: >>>>>> >>>>>>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>>>>>> >>>>>>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>>>>>> >>>>>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay >>>>>>>> wrote: >>>>>>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>>>>>> Hi guys, >>>>>>>>>>> are there any news about fixing buggy behavior of >>>>>>>>>>> SuperLU_DIST, exactly >>>>>>>>>>> >>>>>>>>>> what >>>>>>>>>> >>>>>>>>>>> is described here: >>>>>>>>>>> >>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>>>>>> >>>>>>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>>>>>> l&d=CwIBAg&c= >>>>>>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>>>>>> >>>>>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. >>>>>>>>>>> Everything works >>>>>>>>>>> >>>>>>>>>> fine >>>>>>>>>> >>>>>>>>>>> with 3.5.4. >>>>>>>>>>> >>>>>>>>>>> Do I still have to stick to maint branch, and what are the >>>>>>>>>>> chances for >>>>>>>>>>> >>>>>>>>>> these >>>>>>>>>> >>>>>>>>>>> fixes to be included in 3.7.5? >>>>>>>>>>> >>>>>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are >>>>>>>>>> seeing >>>>>>>>>> issues with it - its best to debug and figure out the cause. >>>>>>>>>> >>>>>>>>>> This bug is indeed inside of superlu_dist, and we started having >>>>>>>>>> this >>>>>>>>> issue >>>>>>>>> from PETSc-3.6.x. I think superlu_dist developers should have >>>>>>>>> fixed this >>>>>>>>> bug. We forgot to update superlu_dist?? This is not a thing users >>>>>>>>> could >>>>>>>>> debug and fix. >>>>>>>>> >>>>>>>>> I have many people in INL suffering from this issue, and they have >>>>>>>>> to >>>>>>>>> stay >>>>>>>>> with PETSc-3.5.4 to use superlu_dist. >>>>>>>>> >>>>>>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>>>>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>>>>>> >>>>>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>>>>>> >>>>>>>> >>>>>>>> Satish >>>>>>>> >>>>>>>> Hi Satish, >>>>>>> I did this: >>>>>>> >>>>>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>>>>>> >>>>>>> --download-superlu_dist >>>>>>> --download-superlu_dist-commit=origin/maint (not sure this is needed, >>>>>>> since I'm already in maint) >>>>>>> >>>>>>> The problem is still there. 
>>>>>>> >>>>>>> Cheers, >>>>>>> Anton >>>>>>> >>> >>> From balay at mcs.anl.gov Tue Oct 11 16:16:07 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Oct 2016 16:16:07 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <25df38ba-3bff-2cac-f7a2-14a6073813cb@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <55515EF5-6072-4AAD-AF69-287448F1FD72@mcs.anl.gov> <25df38ba-3bff-2cac-f7a2-14a6073813cb@uni-mainz.de> Message-ID: On Tue, 11 Oct 2016, Anton wrote: > > > On 10/11/16 7:44 PM, Barry Smith wrote: > > You can run your code with -ksp_view_mat binary -ksp_view_rhs binary > > this will cause it to save the matrices and right hand sides to the > > linear systems in a file called binaryoutput, then email the file to > > petsc-maint at mcs.anl.gov (don't worry this email address accepts large > > attachments). And tell us how many processes you ran on that produced > > the problems. > > > > Barry > > > > I'll do that, but I just wonder which version of SuperLU_DIST is used in > 3.7.4? > > The latest version available on http://crd-legacy.lbl.gov/~xiaoye/SuperLU/ is > 5.1.1 which is a week old and includes bug fixes. This is the version you essentially got - when you configured with --download-superlu_dist-commit=origin/maint Satish > > Maybe we're facing a problem that is already solved. > > Thanks, > Anton > > > > > On Oct 11, 2016, at 12:19 PM, Satish Balay wrote: > > > > > > This log looks truncated. Are there any valgrind mesages before this? > > > [like from your application code - or from MPI] > > > > > > Perhaps you can send the complete log - with: > > > valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 > > > --track-origins=yes > > > > > > [and if there were more valgrind messages from MPI - rebuild petsc > > > with --download-mpich - for a valgrind clean mpi] > > > > > > Sherry, > > > Perhaps this log points to some issue in superlu_dist? 
> > > > > > thanks, > > > Satish > > > > > > On Tue, 11 Oct 2016, Anton Popov wrote: > > > > > > > Valgrind immediately detects interesting stuff: > > > > > > > > ==25673== Use of uninitialised value of size 8 > > > > ==25673== at 0x178272C: static_schedule (static_schedule.c:960) > > > > ==25674== Use of uninitialised value of size 8 > > > > ==25674== at 0x178272C: static_schedule (static_schedule.c:960) > > > > ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) > > > > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > > > > > > > > > ==25673== Conditional jump or move depends on uninitialised value(s) > > > > ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) > > > > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > > > > > > > > > ==25673== Conditional jump or move depends on uninitialised value(s) > > > > ==25673== at 0x5C83F43: PMPI_Recv (in > > > > /opt/mpich3/lib/libmpi.so.12.1.0) > > > > ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > > > > ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > > > > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > > > > > ==25674== Use of uninitialised value of size 8 > > > > ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) > > > > ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) > > > > ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) > > > > ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) > > > > ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) > > > > ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in > > > > /opt/mpich3/lib/libmpi.so.12.1.0) > > > > ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in > > > > /opt/mpich3/lib/libmpi.so.12.1.0) > > > > ==25674== by 0x5C83FB1: PMPI_Recv (in > > > > /opt/mpich3/lib/libmpi.so.12.1.0) > > > > ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > > > > ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > > > > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > > > > > ==25674== Use of uninitialised value of size 8 > > > > ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) > > > > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > > > > > And it crashes after this: > > > > > > > > ==25674== Invalid write of size 4 > > > > ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) > > > > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > > ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST > > > > (superlu_dist.c:421) > > > > ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd > > > > ==25674== > > > > [1]PETSC ERROR: > > > > ------------------------------------------------------------------------ > > > > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > > > probably > > > > memory access out of range > > > > > > > > > > > > On 10/11/2016 03:26 PM, Anton Popov wrote: > > > > > On 10/10/2016 07:11 PM, Satish Balay wrote: > > > > > > Thats from petsc-3.5 > > > > > > > > > > > > Anton - please post the stack trace you get with > > > > > > --download-superlu_dist-commit=origin/maint > > > > > I guess this is it: > > > > > > > > > > [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 > > > > > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > > > > > [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 > > > > > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > > > > > [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 > > > > > /home/anton/LIB/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC 
ERROR: [0] PCSetUp_LU line 101 > > > > > /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c > > > > > [0]PETSC ERROR: [0] PCSetUp line 930 > > > > > /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c > > > > > > > > > > According to the line numbers it crashes within > > > > > MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. > > > > > > > > > > Surprisingly this only happens on the second SNES iteration, but not > > > > > on the > > > > > first. > > > > > > > > > > I'm trying to reproduce this behavior with PETSc KSP and SNES > > > > > examples. > > > > > However, everything I've tried up to now with SuperLU_DIST does just > > > > > fine. > > > > > > > > > > I'm also checking our code in Valgrind to make sure it's clean. > > > > > > > > > > Anton > > > > > > Satish > > > > > > > > > > > > > > > > > > On Mon, 10 Oct 2016, Xiaoye S. Li wrote: > > > > > > > > > > > > > Which version of superlu_dist does this capture? I looked at the > > > > > > > original > > > > > > > error log, it pointed to pdgssvx: line 161. But that line is in > > > > > > > comment > > > > > > > block, not the program. > > > > > > > > > > > > > > Sherry > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov > > > > > > > wrote: > > > > > > > > > > > > > > > On 10/07/2016 05:23 PM, Satish Balay wrote: > > > > > > > > > > > > > > > > > On Fri, 7 Oct 2016, Kong, Fande wrote: > > > > > > > > > > > > > > > > > > On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > On Fri, 7 Oct 2016, Anton Popov wrote: > > > > > > > > > > > Hi guys, > > > > > > > > > > > > are there any news about fixing buggy behavior of > > > > > > > > > > > > SuperLU_DIST, exactly > > > > > > > > > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > > > > > > > > is described here: > > > > > > > > > > > > > > > > > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. > > > > > > > > > > > > > > > > > > > > > > > mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm > > > > > > > > > > > l&d=CwIBAg&c= > > > > > > > > > > > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > > > > > > > > > > > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > > > > > > > > > > > 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= > > > > > > > > > > > ? > > > > > > > > > > > > > > > > > > > > > > > I'm using 3.7.4 and still get SEGV in pdgssvx routine. > > > > > > > > > > > > Everything works > > > > > > > > > > > > > > > > > > > > > > > fine > > > > > > > > > > > > > > > > > > > > > > > with 3.5.4. > > > > > > > > > > > > > > > > > > > > > > > > Do I still have to stick to maint branch, and what are > > > > > > > > > > > > the > > > > > > > > > > > > chances for > > > > > > > > > > > > > > > > > > > > > > > these > > > > > > > > > > > > > > > > > > > > > > > fixes to be included in 3.7.5? > > > > > > > > > > > > > > > > > > > > > > > 3.7.4. is off maint branch [as of a week ago]. So if you > > > > > > > > > > > are > > > > > > > > > > > seeing > > > > > > > > > > > issues with it - its best to debug and figure out the > > > > > > > > > > > cause. > > > > > > > > > > > > > > > > > > > > > > This bug is indeed inside of superlu_dist, and we started > > > > > > > > > > > having > > > > > > > > > > > this > > > > > > > > > > issue > > > > > > > > > > from PETSc-3.6.x. I think superlu_dist developers should > > > > > > > > > > have > > > > > > > > > > fixed this > > > > > > > > > > bug. 
We forgot to update superlu_dist?? This is not a thing > > > > > > > > > > users > > > > > > > > > > could > > > > > > > > > > debug and fix. > > > > > > > > > > > > > > > > > > > > I have many people in INL suffering from this issue, and > > > > > > > > > > they have > > > > > > > > > > to > > > > > > > > > > stay > > > > > > > > > > with PETSc-3.5.4 to use superlu_dist. > > > > > > > > > > > > > > > > > > > To verify if the bug is fixed in latest superlu_dist - you can > > > > > > > > > try > > > > > > > > > [assuming you have git - either from petsc-3.7/maint/master]: > > > > > > > > > > > > > > > > > > --download-superlu_dist > > > > > > > > > --download-superlu_dist-commit=origin/maint > > > > > > > > > > > > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > Hi Satish, > > > > > > > > I did this: > > > > > > > > > > > > > > > > git clone -b maint https://bitbucket.org/petsc/petsc.git petsc > > > > > > > > > > > > > > > > --download-superlu_dist > > > > > > > > --download-superlu_dist-commit=origin/maint (not sure this is > > > > > > > > needed, > > > > > > > > since I'm already in maint) > > > > > > > > > > > > > > > > The problem is still there. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Anton > > > > > > > > > > > > > > > > > > > From popov at uni-mainz.de Tue Oct 11 16:18:23 2016 From: popov at uni-mainz.de (Anton) Date: Tue, 11 Oct 2016 23:18:23 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> Message-ID: <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> On 10/11/16 7:19 PM, Satish Balay wrote: > This log looks truncated. Are there any valgrind mesages before this? > [like from your application code - or from MPI] Yes it is indeed truncated. I only included relevant messages. > > Perhaps you can send the complete log - with: > valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 --track-origins=yes > > [and if there were more valgrind messages from MPI - rebuild petsc There are no messages originating from our code, just a few MPI related ones (probably false positives) and from SuperLU_DIST (most of them). Thanks, Anton > with --download-mpich - for a valgrind clean mpi] > > Sherry, > Perhaps this log points to some issue in superlu_dist? 
> > thanks, > Satish > > On Tue, 11 Oct 2016, Anton Popov wrote: > >> Valgrind immediately detects interesting stuff: >> >> ==25673== Use of uninitialised value of size 8 >> ==25673== at 0x178272C: static_schedule (static_schedule.c:960) >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x178272C: static_schedule (static_schedule.c:960) >> ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> >> ==25673== Conditional jump or move depends on uninitialised value(s) >> ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) >> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> >> ==25673== Conditional jump or move depends on uninitialised value(s) >> ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >> ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) >> ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) >> ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) >> ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) >> ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) >> ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in >> /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in >> /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1.0) >> ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >> ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> ==25674== Use of uninitialised value of size 8 >> ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> >> And it crashes after this: >> >> ==25674== Invalid write of size 4 >> ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) >> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >> ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:421) >> ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd >> ==25674== >> [1]PETSC ERROR: >> ------------------------------------------------------------------------ >> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably >> memory access out of range >> >> >> On 10/11/2016 03:26 PM, Anton Popov wrote: >>> On 10/10/2016 07:11 PM, Satish Balay wrote: >>>> Thats from petsc-3.5 >>>> >>>> Anton - please post the stack trace you get with >>>> --download-superlu_dist-commit=origin/maint >>> I guess this is it: >>> >>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 >>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>> [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 >>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>> [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 >>> /home/anton/LIB/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: [0] PCSetUp_LU line 101 >>> /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c >>> [0]PETSC ERROR: [0] PCSetUp line 930 >>> /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c >>> >>> According to the line numbers it crashes within >>> MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. >>> >>> Surprisingly this only happens on the second SNES iteration, but not on the >>> first. 
>>> >>> I'm trying to reproduce this behavior with PETSc KSP and SNES examples. >>> However, everything I've tried up to now with SuperLU_DIST does just fine. >>> >>> I'm also checking our code in Valgrind to make sure it's clean. >>> >>> Anton >>>> Satish >>>> >>>> >>>> On Mon, 10 Oct 2016, Xiaoye S. Li wrote: >>>> >>>>> Which version of superlu_dist does this capture? I looked at the >>>>> original >>>>> error log, it pointed to pdgssvx: line 161. But that line is in >>>>> comment >>>>> block, not the program. >>>>> >>>>> Sherry >>>>> >>>>> >>>>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov wrote: >>>>> >>>>>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>>>>> >>>>>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>>>>> >>>>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay >>>>>>> wrote: >>>>>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>>>>> Hi guys, >>>>>>>>>> are there any news about fixing buggy behavior of >>>>>>>>>> SuperLU_DIST, exactly >>>>>>>>>> >>>>>>>>> what >>>>>>>>> >>>>>>>>>> is described here: >>>>>>>>>> >>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>>>>> >>>>>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>>>>> l&d=CwIBAg&c= >>>>>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>>>>> >>>>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. >>>>>>>>>> Everything works >>>>>>>>>> >>>>>>>>> fine >>>>>>>>> >>>>>>>>>> with 3.5.4. >>>>>>>>>> >>>>>>>>>> Do I still have to stick to maint branch, and what are the >>>>>>>>>> chances for >>>>>>>>>> >>>>>>>>> these >>>>>>>>> >>>>>>>>>> fixes to be included in 3.7.5? >>>>>>>>>> >>>>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are >>>>>>>>> seeing >>>>>>>>> issues with it - its best to debug and figure out the cause. >>>>>>>>> >>>>>>>>> This bug is indeed inside of superlu_dist, and we started having >>>>>>>>> this >>>>>>>> issue >>>>>>>> from PETSc-3.6.x. I think superlu_dist developers should have >>>>>>>> fixed this >>>>>>>> bug. We forgot to update superlu_dist?? This is not a thing users >>>>>>>> could >>>>>>>> debug and fix. >>>>>>>> >>>>>>>> I have many people in INL suffering from this issue, and they have >>>>>>>> to >>>>>>>> stay >>>>>>>> with PETSc-3.5.4 to use superlu_dist. >>>>>>>> >>>>>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>>>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>>>>> >>>>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>>>>> >>>>>>> >>>>>>> Satish >>>>>>> >>>>>>> Hi Satish, >>>>>> I did this: >>>>>> >>>>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>>>>> >>>>>> --download-superlu_dist >>>>>> --download-superlu_dist-commit=origin/maint (not sure this is needed, >>>>>> since I'm already in maint) >>>>>> >>>>>> The problem is still there. 
>>>>>> >>>>>> Cheers, >>>>>> Anton >>>>>> >> >> From jed at jedbrown.org Tue Oct 11 16:19:05 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Oct 2016 15:19:05 -0600 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <002b01d22403$aee809f0$0cb81dd0$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <002b01d22403$aee809f0$0cb81dd0$@capesim.com> Message-ID: <87zima8tyu.fsf@jedbrown.org> Matthew Overholt writes: > Barry, > > Subsequent tests with the same code and a problem (input) having a much > smaller vertex (equation) count (i.e. a much smaller matrix to invert for > the solution) have NOT had PetscCommDuplicate() account for any significant > time, so I'm not surprised that your test didn't find any problem. Can you re-run the large and small configurations with the same code/environment and resend those logs? PetscCommDuplicate has nothing to do with the problem size, so any difference in cost must be indirect, though attribute access should be simple and independent. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Oct 11 16:44:29 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Oct 2016 16:44:29 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> Message-ID: <1C0178D5-3C93-489F-A96E-E1F81C185873@mcs.anl.gov> Fande, Could you send me (petsc-maint at mcs.anl.gov) a non symmetric matrix you have that has a different null space for A and A'. This would be one that is failing with right preconditioning. Smaller the better but whatever size you have. Run the code with -ksp_view_mat binary and send the resulting file called binaryoutput. I need a test matrix to update the PETSc code for this case. Barry > On Oct 11, 2016, at 3:04 PM, Kong, Fande wrote: > > > > On Tue, Oct 11, 2016 at 12:18 PM, Barry Smith wrote: > > > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith wrote: > > > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > > > Hi All, > > > > > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > > > > > It is in the Krylov solver. > > > > > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > > > > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... 
but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > > > Does "I" mean an identity matrix? Could you possibly send me a link for this GMRES implementation, that is, how PETSc does this in the actual code? > > > > Yes. > > > > It is in the helper routine KSP_PCApplyBAorAB() > > #undef __FUNCT__ > > #define __FUNCT__ "KSP_PCApplyBAorAB" > > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec y,Vec w) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (!ksp->transpose_solve) { > > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > > } else { > > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > } > > PetscFunctionReturn(0); > > } > > > > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (ksp->pc_side == PC_LEFT) { > > Mat A; > > MatNullSpace nullsp; > > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > > if (nullsp) { > > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > > } > > } > > PetscFunctionReturn(0); > > } > > > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov methods only? How about the right preconditioning ones? Are they just magically right for the right preconditioning Krylov methods? > > This is a good question. I am working on a branch now where I will add some more comprehensive testing of the various cases and fix anything that comes up. > > Were you having trouble with ASM and bjacobi only for right preconditioning? > > > Yes. ASM and bjacobi works fine for left preconditioning NOT for RIGHT preconditioning. bjacobi converges, but produces a wrong solution. ASM needs more iterations, however the solution is right. > > > > Note that when A is symmetric the range of A is orthogonal to null space of A so yes I think in that case it is just "magically right" but if A is not symmetric then I don't think it is "magically right". I'll work on it. > > > Barry > > > > > Fande Kong, > > > > > > There is no code directly in the GMRES or other methods. > > > > > > > > (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > > > > > With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. > > > > > > Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. 
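Putting the above together, attaching both null spaces so that KSP performs these removals automatically could look like the sketch below; A, ksp, b and x are assumed to already exist, and nvec / ntvec stand for user-supplied basis vectors of the null spaces of A and A' (for a symmetric A one null space is enough):

   MatNullSpace   nullsp, transnullsp;
   PetscErrorCode ierr;

   /* MatNullSpaceCreate() expects an orthonormal basis */
   ierr = VecNormalize(nvec, NULL);CHKERRQ(ierr);
   ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &nvec, &nullsp);CHKERRQ(ierr);
   ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);       /* KSP projects N(A) out of the iterates */
   ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);     /* A keeps its own reference */

   ierr = VecNormalize(ntvec, NULL);CHKERRQ(ierr);
   ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &ntvec, &transnullsp);CHKERRQ(ierr);
   ierr = MatSetTransposeNullSpace(A, transnullsp);CHKERRQ(ierr);  /* KSP strips the part of b outside range(A) */
   ierr = MatNullSpaceDestroy(&transnullsp);CHKERRQ(ierr);

   ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

For the common case where the null space is just the constant vector, MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp) avoids building a basis vector by hand.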
> > > > > > The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. > > > > > > Note that for symmetric matrices the two null spaces are the same. > > > > > > Barry > > > > > > > > > A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. > > > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an example which shows lu and ilu indeed work, but asm and bjacobi do not at all. That is why I am asking questions about algorithms. I am trying to figure out a default preconditioner for several singular systems. > > > > Hmm, normally asm and bjacobi would be fine with this unless one or more of the subblocks are themselves singular (which normally won't happen). AMG can also work find sometimes. > > > > Can you send a sample code? > > > > Barry > > > > > > > > Thanks again. > > > > > > > > > Fande Kong, > > > > > > > > > > > > > > > > > > > > > Fande Kong, > > From fande.kong at inl.gov Tue Oct 11 17:07:15 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 11 Oct 2016 16:07:15 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <1C0178D5-3C93-489F-A96E-E1F81C185873@mcs.anl.gov> References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> <1C0178D5-3C93-489F-A96E-E1F81C185873@mcs.anl.gov> Message-ID: Barry, I am trying to reproduce this issue using a pure PETSc code. VecLoad does not work for me. I do not know why. Anyway, I can reproduce this using a very small system. Here are some info: Mat, A Mat Object:() 2 MPI processes type: mpiaij row 0: (0, 1.) row 1: (0, -0.820827) (1, 1.51669) (2, -0.820827) row 2: (1, -0.820827) (2, 1.51669) (3, -0.820827) row 3: (2, -0.820827) (3, 1.51669) (4, -0.820827) row 4: (3, -0.820827) (4, 1.51669) (5, -0.820827) row 5: (4, -0.820827) (5, 1.51669) (6, -0.820827) row 6: (5, -0.820827) (6, 1.51669) (7, -0.820827) row 7: (6, -0.820827) (7, 1.51669) (8, -0.820827) row 8: (8, 1.) Right hand side b: Vec Object: 2 MPI processes type: mpi Process [0] 0. -0.356693 -0.50444 -0.356693 -5.55112e-17 Process [1] 0.356693 0.50444 0.356693 0. Mat Null space N(A): Vec Object: 2 MPI processes type: mpi Process [0] 0. 0.191342 0.353553 0.46194 0.5 Process [1] 0.46194 0.353553 0.191342 6.12323e-17 Please run with two MPI threads using -ksp_pc_side right -pc_type bjacobi and -ksp_pc_side left -pc_type bjacobi. Will produce different solutions. The one obtained with using "left" is correct (we have an analytical solution). I also attached data for matrix, rhs and nullspace, but I am not sure if you can read them or not. I can load mat.dat, but I could not read rhs.dat and nullspace.dat. Fande, On Tue, Oct 11, 2016 at 3:44 PM, Barry Smith wrote: > > Fande, > > Could you send me (petsc-maint at mcs.anl.gov) a non symmetric matrix > you have that has a different null space for A and A'. This would be one > that is failing with right preconditioning. Smaller the better but whatever > size you have. 
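Regarding the VecLoad() trouble mentioned above: a vector that was written with the PETSc binary viewer can normally be read back with the pattern sketched here (the names bout/bin and the file name are illustrative; a file produced by some other tool would need a different reader):

   Vec            bout, bin;
   PetscViewer    viewer;
   PetscErrorCode ierr;

   /* write an assembled vector to a PETSc binary file */
   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs.dat", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
   ierr = VecView(bout, viewer);CHKERRQ(ierr);
   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

   /* read it back on any number of processes; VecLoad() takes the sizes from the file */
   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs.dat", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
   ierr = VecCreate(PETSC_COMM_WORLD, &bin);CHKERRQ(ierr);
   ierr = VecSetFromOptions(bin);CHKERRQ(ierr);
   ierr = VecLoad(bin, viewer);CHKERRQ(ierr);
   ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);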
Run the code with -ksp_view_mat binary and send the > resulting file called binaryoutput. > > I need a test matrix to update the PETSc code for this case. > > > Barry > > > On Oct 11, 2016, at 3:04 PM, Kong, Fande wrote: > > > > > > > > On Tue, Oct 11, 2016 at 12:18 PM, Barry Smith > wrote: > > > > > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > > > > > > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith > wrote: > > > > > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith > wrote: > > > > > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande > wrote: > > > > > > > > > > Hi All, > > > > > > > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > > > > > > > I was really wondering what is the philosophy behind this? The > exact algorithms we are using in PETSc right now? Where we are dealing > with this, preconditioner, linear solver, or nonlinear solver? > > > > > > > > It is in the Krylov solver. > > > > > > > > The idea is very simple. Say you have a singular A with null > space N (that all values Ny are in the null space of A. So N is tall and > skinny) and you want to solve A x = b where b is in the range of A. This > problem has an infinite number of solutions Ny + x* since A (Ny + x*) > = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = > b and x* has the smallest norm of all solutions. > > > > > > > > With left preconditioning B A x = B b GMRES, for example, > normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + > alpha_3 BABABAb + .... but the B operator will likely introduce some > component into the direction of the null space so as GMRES continues the > "solution" computed will grow larger and larger with a large component in > the null space of A. Hence we simply modify GMRES a tiny bit by building > the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > > > > > Does "I" mean an identity matrix? Could you possibly send me a link > for this GMRES implementation, that is, how PETSc does this in the actual > code? > > > > > > Yes. > > > > > > It is in the helper routine KSP_PCApplyBAorAB() > > > #undef __FUNCT__ > > > #define __FUNCT__ "KSP_PCApplyBAorAB" > > > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec > y,Vec w) > > > { > > > PetscErrorCode ierr; > > > PetscFunctionBegin; > > > if (!ksp->transpose_solve) { > > > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > > > } else { > > > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w); > CHKERRQ(ierr); > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > > > { > > > PetscErrorCode ierr; > > > PetscFunctionBegin; > > > if (ksp->pc_side == PC_LEFT) { > > > Mat A; > > > MatNullSpace nullsp; > > > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > > > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > > > if (nullsp) { > > > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > > > } > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov > methods only? How about the right preconditioning ones? Are they just > magically right for the right preconditioning Krylov methods? 
> > > > This is a good question. I am working on a branch now where I will > add some more comprehensive testing of the various cases and fix anything > that comes up. > > > > Were you having trouble with ASM and bjacobi only for right > preconditioning? > > > > > > Yes. ASM and bjacobi works fine for left preconditioning NOT for RIGHT > preconditioning. bjacobi converges, but produces a wrong solution. ASM > needs more iterations, however the solution is right. > > > > > > > > Note that when A is symmetric the range of A is orthogonal to null > space of A so yes I think in that case it is just "magically right" but if > A is not symmetric then I don't think it is "magically right". I'll work on > it. > > > > > > Barry > > > > > > > > Fande Kong, > > > > > > > > > There is no code directly in the GMRES or other methods. > > > > > > > > > > > (I-N)BABABAb + .... that is we remove from each new direction > anything in the direction of the null space. Hence the null space doesn't > directly appear in the preconditioner, just in the KSP method. If you > attach a null space to the matrix, the KSP just automatically uses it to do > the removal above. > > > > > > > > With right preconditioning the solution is built from alpha_1 b > + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term > to remove any part that is in the null space of A. > > > > > > > > Now consider the case A y = b where b is NOT in the range of A. > So the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > > > > > Note that for symmetric matrices the two null spaces are the same. > > > > > > > > Barry > > > > > > > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an > example which shows lu and ilu indeed work, but asm and bjacobi do not at > all. That is why I am asking questions about algorithms. I am trying to > figure out a default preconditioner for several singular systems. > > > > > > Hmm, normally asm and bjacobi would be fine with this unless one or > more of the subblocks are themselves singular (which normally won't > happen). AMG can also work find sometimes. > > > > > > Can you send a sample code? > > > > > > Barry > > > > > > > > > > > Thanks again. > > > > > > > > > > > > Fande Kong, > > > > > > > > > > > > > > > > > > > > > > > > > > > Fande Kong, > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mat.dat Type: application/octet-stream Size: 328 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: nullspace.dat Type: application/octet-stream Size: 80 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rhs.dat Type: application/octet-stream Size: 80 bytes Desc: not available URL: From mono at dtu.dk Wed Oct 12 06:40:31 2016 From: mono at dtu.dk (=?iso-8859-1?Q?Morten_Nobel-J=F8rgensen?=) Date: Wed, 12 Oct 2016 11:40:31 +0000 Subject: [petsc-users] Element to local dof map using dmplex Message-ID: <6B03D347796DED499A2696FC095CE81A05B5E20A@ait-pex02mbx04.win.dtu.dk> Dear PETSc developers / Matt Thanks for your suggestions regarding our use of dmplex in a FEM context. However, Matt's advise on using the PetscFE is not sufficient for our needs (our end goal is a topology optimization framework - not just FEM) and we must honestly admit that we do not see how we can use the MATIS and the MatSetValuesClosure or DMPlexMatSetClosure to solve our current issues as Stefano has suggested. We have therefore created a more representative, yet heavily oversimplified, code example that demonstrates our problem. That is, the dof handling is only correct on a single process and goes wrong on np>1. We hope very much that you can help us to overcome our problem. Thank you for an excellent toolkit Morten and Niels -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dmplex_test.tar Type: application/x-tar Size: 54272 bytes Desc: dmplex_test.tar URL: From jeremy at seamplex.com Wed Oct 12 08:18:58 2016 From: jeremy at seamplex.com (jeremy theler) Date: Wed, 12 Oct 2016 13:18:58 +0000 Subject: [petsc-users] Autoconf tests In-Reply-To: References: Message-ID: I once made a quick hack, maybe you can start your dirty work from here https://bitbucket.org/wasora/wasora/src/5a88abbac1a846f2a6ed0b4e585f6f3c0fedf2e7/m4/petsc.m4?at=default&fileviewer=file-view-default -- jeremy On Tue, Oct 11, 2016 at 5:31 PM Barry Smith wrote: > > You don't want to get the debug mode from PETSC_ARCH since there may > not be a PETSC_ARCH (for PETSc --prefix installs) or because the user did > not put the string in it. You can check for the PETSC_USE_DEBUG symbol in > the petscconf.h file by linking a C program against and #if > defined(PETSC_USE_DEBUG). > > > Barry > > > On Oct 11, 2016, at 3:12 PM, CLAUS HELLMUTH WARNER HETZER < > claus at olemiss.edu> wrote: > > > > Hi everybody- > > > > Figured I?d ask this here before I go reinventing the wheel. > > > > I?m writing an autoconf installer (the standard Linux configure/make > package) for an acoustic wave propagation modeling package that builds > PETSc and SLEPc as part of the installation process. I?d like to be able > to test for instances of PETSc already being installed on the user?s > machine and, if possible, whether they?re the debug version. I know I can > check for the existence of the PETSC_DIR environmental variable, and parse > the PETSC_ARCH variable for ?debug?, and I?ll do that as a first pass, but > has anybody written any M4 tests that are more reliable than those (i.e. > actually attempting to link to the libraries)? 
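One way to act on the petscconf.h suggestion above is to have configure compile a tiny test source with the PETSc include directories on the path and see whether it succeeds; a sketch of such a test program is below (the AC_COMPILE_IFELSE / AC_LINK_IFELSE plumbing and the include-path handling are left to the installer and are not shown):

   /* compiles only against a PETSc build configured with debugging */
   #include <petscconf.h>

   #if !defined(PETSC_USE_DEBUG)
   #  error "this PETSc installation is not a debug build"
   #endif

   int main(void)
   {
     return 0;
   }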
I had one user who had the > libraries installed in /usr/local/bin but didn?t have the environmental > variables set in their profile, so the linker was confused and it took a > while to figure out what was going weird with the install. > > > > If not, I guess I?ll be putting on my Autoconf gloves and getting my > hands dirty. > > > > Thanks > > -Claus Hetzer > > > > ------------------ > > Claus Hetzer > > Senior Research and Development Engineer > > National Center for Physical Acoustics > > The University of Mississippi > > 145 Hill Drive > > PO Box 1848 > > University, MS 38677 > > claus at olemiss.edu > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 12 08:41:10 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 Oct 2016 08:41:10 -0500 Subject: [petsc-users] Element to local dof map using dmplex In-Reply-To: <6B03D347796DED499A2696FC095CE81A05B5E20A@ait-pex02mbx04.win.dtu.dk> References: <6B03D347796DED499A2696FC095CE81A05B5E20A@ait-pex02mbx04.win.dtu.dk> Message-ID: On Wed, Oct 12, 2016 at 6:40 AM, Morten Nobel-J?rgensen wrote: > Dear PETSc developers / Matt > > Thanks for your suggestions regarding our use of dmplex in a FEM context. > However, Matt's advise on using the PetscFE is not sufficient for our > needs (our end goal is a topology optimization framework - not just FEM) > and we must honestly admit that we do not see how we can use the MATIS and > the MatSetValuesClosure or DMPlexMatSetClosure to solve our current issues > as Stefano has suggested. > > We have therefore created a more representative, yet heavily > oversimplified, code example that demonstrates our problem. That is, the > dof handling is only correct on a single process and goes wrong on np>1. > > We hope very much that you can help us to overcome our problem. > Okay, I will look at it and try to rework it to fix your problem. I am in London this week, so it might take me until next week. Thanks, Matt > Thank you for an excellent toolkit > Morten and Niels > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From overholt at capesim.com Wed Oct 12 10:48:18 2016 From: overholt at capesim.com (Matthew Overholt) Date: Wed, 12 Oct 2016 11:48:18 -0400 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <87zima8tyu.fsf@jedbrown.org> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <002b01d22403$aee809f0$0cb81dd0$@capesim.com> <87zima8tyu.fsf@jedbrown.org> Message-ID: <002901d224a0$0e18cf80$2a4a6e80$@capesim.com> Jed, I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be only indirectly related to the problem size, etc., and I wouldn't be surprised if it was an artifact of some sort related to my specific algorithm. So you may not want to pursue this much further. However, I did make three runs using the same Edison environment and code but different input geometry files. Earlier I found a strong dependence on the number of processes, so for this test I ran all of the tests on 1 node with 8 processes (N=1, n=8). What I found was that the amount of PCD overhead was geometry dependent, not size dependent. 
A moderately-sized simple geometry (with relatively few ghosted vertices at the simple-planar interfaces) had no PCD overhead, whereas both small and large complex geometries (with relatively more ghosted vertices at the more-complex interfaces) had 5 - 6% PCD overhead. The log files follow. Thanks, Matt Overholt //////////////////////////////////////////////////////////////////////////// ///// // pllSs20 : Simple Pencil Shape with Small Planar Interfaces // No PetscCommDuplicate() overhead in CrayPat sampling. //////////////////////////////////////////////////////////////////////////// ///// ...713097 vertices...646400 elements (moderate size) Total run time was 47.853 seconds. ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Using Petsc Release Version 3.7.2, Jun, 05, 2016 Max Max/Min Avg Total Time (sec): 4.822e+01 1.01022 4.780e+01 Objects: 1.160e+02 1.00000 1.160e+02 Flops: 1.425e+06 1.00000 1.425e+06 1.140e+07 Flops/sec: 2.986e+04 1.01022 2.982e+04 2.385e+05 MPI Messages: 1.140e+02 1.46154 9.500e+01 7.600e+02 MPI Message Lengths: 7.202e+07 3.55575 2.888e+05 2.195e+08 MPI Reductions: 2.600e+02 1.00000 Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.7802e+01 100.0% 1.1402e+07 100.0% 7.600e+02 100.0% 2.888e+05 100.0% 2.590e+02 99.6% ---------------------------------------------------------------------------- -------------------------------------------- Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ---------------------------------------------------------------------------- -------------------------------------------- --- Event Stage 0: Main Stage VecMax 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 3 0 0 0 0 3 0 VecNorm 8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 50 0 0 3 0 50 0 0 3 3129 VecCopy 8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 5638 VecAssemblyBegin 26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 7.8e+01 1 0 28 55 30 1 0 28 55 30 0 VecAssemblyEnd 26 1.0 5.2852e-0211.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+01 0 0 0 29 15 0 0 0 29 15 0 VecScatterEnd 19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatSolve 8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0 MatLUFactorSym 1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 23 0 0 0 2 23 0 0 0 2 0 MatLUFactorNum 8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 66 0 0 0 0 66 0 0 0 0 0 MatAssemblyBegin 8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 1.6e+01 0 0 22 5 6 0 0 22 5 6 0 MatAssemblyEnd 8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 2.3e+01 0 0 4 0 9 0 0 4 0 9 0 MatGetRowIJ 1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 4.3e+01 91 0 44 31 17 91 0 44 31 17 0 PCSetUp 8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 89 0 0 0 4 89 0 0 0 4 0 PCApply 8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0 ---------------------------------------------------------------------------- -------------------------------------------- Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 46 43 49344760 0. Vector Scatter 23 22 18176 0. Index Set 37 37 3279764 0. IS L to G Mapping 1 0 0 0. Matrix 6 6 43498680 0. Krylov Solver 1 1 1160 0. Preconditioner 1 1 992 0. Viewer 1 0 0 0. ============================================================================ ============================================ Average time to get PetscTime(): 5.00679e-07 Average time for MPI_Barrier(): 1.81198e-06 Average time for zero size MPI_Send(): 3.75509e-06 //////////////////////////////////////////////////////////////////////////// ///// // HBT-HP5 : Small Complex Shape with Complex Interfaces // 5.3% PetscCommDuplicate() overhead in CrayPat sampling. //////////////////////////////////////////////////////////////////////////// ///// ...50564 vertices...45420 elements (small size) Total run time was 4.863 seconds. ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Using Petsc Release Version 3.7.2, Jun, 05, 2016 Max Max/Min Avg Total Time (sec): 4.930e+00 1.00841 4.894e+00 Objects: 1.080e+02 1.00000 1.080e+02 Flops: 7.539e+04 1.06240 7.267e+04 5.813e+05 Flops/sec: 1.542e+04 1.06244 1.485e+04 1.188e+05 MPI Messages: 2.070e+02 2.43529 1.628e+02 1.302e+03 MPI Message Lengths: 7.824e+06 2.40897 2.965e+04 3.861e+07 MPI Reductions: 2.320e+02 1.00000 Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.8945e+00 100.0% 5.8133e+05 100.0% 1.302e+03 100.0% 2.965e+04 100.0% 2.310e+02 99.6% ---------------------------------------------------------------------------- -------------------------------------------- Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ---------------------------------------------------------------------------- -------------------------------------------- --- Event Stage 0: Main Stage VecMax 6 1.0 1.1725e-0321.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 3 0 0 0 0 3 0 VecNorm 6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 6.0e+00 0 50 0 0 3 0 50 0 0 3 13 VecCopy 6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 4549 VecAssemblyBegin 24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 7.2e+01 1 0 24 30 31 1 0 24 30 31 0 VecAssemblyEnd 24 1.0 4.1070e-0322.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.3e+01 0 0 0 13 14 0 0 0 13 14 0 VecScatterEnd 15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatSolve 6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 
3 0 45 9 10 3 0 45 9 10 0 MatLUFactorSym 1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13 0 0 0 2 13 0 0 0 2 0 MatLUFactorNum 6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72 0 0 0 0 72 0 0 0 0 0 MatAssemblyBegin 6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 1.2e+01 0 0 24 54 5 0 0 24 54 5 0 MatAssemblyEnd 6 1.0 2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 1.9e+01 0 0 5 0 8 0 0 5 0 8 0 MatGetRowIJ 1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 3.5e+01 88 0 45 9 15 88 0 45 9 15 0 PCSetUp 6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 85 0 0 0 5 85 0 0 0 5 0 PCApply 6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 3 0 45 9 10 3 0 45 9 10 0 ---------------------------------------------------------------------------- -------------------------------------------- Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 44 41 3635912 0. Vector Scatter 21 20 90672 0. Index Set 33 33 286872 0. IS L to G Mapping 1 0 0 0. Matrix 6 6 2988180 0. Krylov Solver 1 1 1160 0. Preconditioner 1 1 992 0. Viewer 1 0 0 0. ============================================================================ ============================================ Average time to get PetscTime(): 5.00679e-07 Average time for MPI_Barrier(): 2.00272e-06 Average time for zero size MPI_Send(): 3.12924e-06 //////////////////////////////////////////////////////////////////////////// ///// // GaNSi13 : Large Complex Shape with Complex Interfaces // 6.4% PetscCommDuplicate() overhead in CrayPat sampling. //////////////////////////////////////////////////////////////////////////// ///// ...1642311 vertices...1497368 elements (large size) Total run time was 260.958 seconds. 
---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- Using Petsc Release Version 3.7.2, Jun, 05, 2016 Max Max/Min Avg Total Time (sec): 2.619e+02 1.00319 2.611e+02 Objects: 1.040e+02 1.00000 1.040e+02 Flops: 2.050e+06 1.07341 1.969e+06 1.575e+07 Flops/sec: 7.853e+03 1.07354 7.541e+03 6.032e+04 MPI Messages: 1.835e+02 1.47390 1.448e+02 1.158e+03 MPI Message Lengths: 1.761e+08 3.47614 4.801e+05 5.560e+08 MPI Reductions: 2.180e+02 1.00000 Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.6114e+02 100.0% 1.5754e+07 100.0% 1.158e+03 100.0% 4.801e+05 100.0% 2.170e+02 99.5% ---------------------------------------------------------------------------- -------------------------------------------- Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ---------------------------------------------------------------------------- -------------------------------------------- --- Event Stage 0: Main Stage VecMax 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 2 0 0 0 0 2 0 VecNorm 5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 5.0e+00 0 50 0 0 2 0 50 0 0 2 4163 VecCopy 5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 25 1.0 2.4175e-0210.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 6473 VecAssemblyBegin 23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 6.9e+01 1 0 28 54 32 1 0 28 54 32 0 VecAssemblyEnd 23 1.0 1.2512e-0111.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 19 14 0 0 0 19 14 0 VecScatterEnd 13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatSolve 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0 MatLUFactorSym 1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 11 0 0 0 2 11 0 0 0 2 0 MatLUFactorNum 5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 76 0 0 0 0 76 0 0 0 0 0 MatAssemblyBegin 5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 1.0e+01 0 0 28 22 5 0 0 28 22 5 0 MatAssemblyEnd 5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 1.7e+01 2 0 8 0 8 2 0 8 0 8 0 MatGetRowIJ 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 3.1e+01 88 0 32 15 14 88 0 32 15 14 0 PCSetUp 5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 87 0 0 0 5 87 0 0 0 5 0 PCApply 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0 ---------------------------------------------------------------------------- -------------------------------------------- Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 43 40 114480760 0. Vector Scatter 20 19 291760 0. Index Set 31 31 5440808 0. 
IS L to G Mapping 1 0 0 0. Matrix 6 6 99689528 0. Krylov Solver 1 1 1160 0. Preconditioner 1 1 992 0. Viewer 1 0 0 0. ============================================================================ ============================================ Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 1.81198e-06 Average time for zero size MPI_Send(): 3.75509e-06 -----Original Message----- From: Jed Brown [mailto:jed at jedbrown.org] Sent: Tuesday, October 11, 2016 5:19 PM To: overholt at capesim.com; 'Barry Smith' Cc: 'PETSc' Subject: Re: [petsc-users] large PetscCommDuplicate overhead Matthew Overholt writes: > Barry, > > Subsequent tests with the same code and a problem (input) having a > much smaller vertex (equation) count (i.e. a much smaller matrix to > invert for the solution) have NOT had PetscCommDuplicate() account for > any significant time, so I'm not surprised that your test didn't find any problem. Can you re-run the large and small configurations with the same code/environment and resend those logs? PetscCommDuplicate has nothing to do with the problem size, so any difference in cost must be indirect, though attribute access should be simple and independent. --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus From juan at tf.uni-kiel.de Wed Oct 12 16:07:14 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Wed, 12 Oct 2016 23:07:14 +0200 Subject: [petsc-users] Autoconf tests In-Reply-To: References: Message-ID: libMesh provides a m4 test for finding the PETSc library. You can find it here https://github.com/libMesh/libmesh/blob/master/m4/petsc.m4 On Wed, Oct 12, 2016 at 3:18 PM, jeremy theler wrote: > I once made a quick hack, maybe you can start your dirty work from here > > https://bitbucket.org/wasora/wasora/src/5a88abbac1a846f2a6ed0b4e585f6f3c0fedf2e7/m4/petsc.m4?at=default&fileviewer=file-view-default > > -- > jeremy > > On Tue, Oct 11, 2016 at 5:31 PM Barry Smith wrote: >> >> >> You don't want to get the debug mode from PETSC_ARCH since there may >> not be a PETSC_ARCH (for PETSc --prefix installs) or because the user did >> not put the string in it. You can check for the PETSC_USE_DEBUG symbol in >> the petscconf.h file by linking a C program against and #if >> defined(PETSC_USE_DEBUG). >> >> >> Barry >> >> > On Oct 11, 2016, at 3:12 PM, CLAUS HELLMUTH WARNER HETZER >> > wrote: >> > >> > Hi everybody- >> > >> > Figured I?d ask this here before I go reinventing the wheel. >> > >> > I?m writing an autoconf installer (the standard Linux configure/make >> > package) for an acoustic wave propagation modeling package that builds PETSc >> > and SLEPc as part of the installation process. I?d like to be able to test >> > for instances of PETSc already being installed on the user?s machine and, if >> > possible, whether they?re the debug version. I know I can check for the >> > existence of the PETSC_DIR environmental variable, and parse the PETSC_ARCH >> > variable for ?debug?, and I?ll do that as a first pass, but has anybody >> > written any M4 tests that are more reliable than those (i.e. actually >> > attempting to link to the libraries)? I had one user who had the libraries >> > installed in /usr/local/bin but didn?t have the environmental variables set >> > in their profile, so the linker was confused and it took a while to figure >> > out what was going weird with the install. >> > >> > If not, I guess I?ll be putting on my Autoconf gloves and getting my >> > hands dirty. 
>> > >> > Thanks >> > -Claus Hetzer >> > >> > ------------------ >> > Claus Hetzer >> > Senior Research and Development Engineer >> > National Center for Physical Acoustics >> > The University of Mississippi >> > 145 Hill Drive >> > PO Box 1848 >> > University, MS 38677 >> > claus at olemiss.edu >> > >> > >> > >> > >> > >> > From mail2amneet at gmail.com Wed Oct 12 22:00:04 2016 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Wed, 12 Oct 2016 20:00:04 -0700 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: On Monday, October 10, 2016, Barry Smith wrote: > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande > wrote: > > > > Hi All, > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > I was really wondering what is the philosophy behind this? The exact > algorithms we are using in PETSc right now? Where we are dealing with > this, preconditioner, linear solver, or nonlinear solver? > > It is in the Krylov solver. > > The idea is very simple. Say you have a singular A with null space N > (that all values Ny are in the null space of A. So N is tall and skinny) > and you want to solve A x = b where b is in the range of A. This problem > has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + > Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and > x* has the smallest norm of all solutions. > > With left preconditioning B A x = B b GMRES, for example, normally > computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 > BABABAb + .... but the B operator will likely introduce some component > into the direction of the null space so as GMRES continues the "solution" > computed will grow larger and larger with a large component in the null > space of A. Hence we simply modify GMRES a tiny bit by building the > solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 (I-N)BABABAb > + .... that is we remove from each new direction anything in the direction > of the null space. Hence the null space doesn't directly appear in the > preconditioner, just in the KSP method. If you attach a null space to the > matrix, the KSP just automatically uses it to do the removal above. Barry, if identity matrix I is of size M x M (which is also the size of A) then are you augmenting N (size M x R; R < M) by zero colums to make I - N possible? If so it means that only first R values of vector Bb are used for scaling zero Eigenvectors of A. Does this choice affect iteration count, meaning one can arbitrarily choose any R values of the vector Bb to scale zero eigenvectors of A? > > With right preconditioning the solution is built from alpha_1 b + > alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to > remove any part that is in the null space of A. > > Now consider the case A y = b where b is NOT in the range of A. So > the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. 
> > Note that for symmetric matrices the two null spaces are the same. > > Barry > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > > > Fande Kong, > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Wed Oct 12 22:12:08 2016 From: fdkong.jd at gmail.com (Fande Kong) Date: Wed, 12 Oct 2016 21:12:08 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: On Wed, Oct 12, 2016 at 9:00 PM, Amneet Bhalla wrote: > > > On Monday, October 10, 2016, Barry Smith wrote: > >> >> > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: >> > >> > Hi All, >> > >> > I know how to remove the null spaces from a singular system using >> creating a MatNullSpace and attaching it to Mat. >> > >> > I was really wondering what is the philosophy behind this? The exact >> algorithms we are using in PETSc right now? Where we are dealing with >> this, preconditioner, linear solver, or nonlinear solver? >> >> It is in the Krylov solver. >> >> The idea is very simple. Say you have a singular A with null space N >> (that all values Ny are in the null space of A. So N is tall and skinny) >> and you want to solve A x = b where b is in the range of A. This problem >> has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + >> Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and >> x* has the smallest norm of all solutions. >> >> With left preconditioning B A x = B b GMRES, for example, >> normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + >> alpha_3 BABABAb + .... but the B operator will likely introduce some >> component into the direction of the null space so as GMRES continues the >> "solution" computed will grow larger and larger with a large component in >> the null space of A. Hence we simply modify GMRES a tiny bit by building >> the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 >> (I-N)BABABAb + .... that is we remove from each new direction anything in >> the direction of the null space. Hence the null space doesn't directly >> appear in the preconditioner, just in the KSP method. If you attach a >> null space to the matrix, the KSP just automatically uses it to do the >> removal above. > > > Barry, if identity matrix I is of size M x M (which is also the size of A) > then are you augmenting N (size M x R; R < M) by zero colums to make I - N > possible? If so it means that only first R values of vector Bb are used for > scaling zero Eigenvectors of A. Does this choice affect iteration count, > meaning one can arbitrarily choose any R values of the vector Bb to scale > zero eigenvectors of A? > I think it is just a notation. Let us denote Bb as y, that is, y = Bb. N = {x_0, x_1, ..., x_r}, where x_i is a vector. Applying I-N to y means y = y - \sum (y, x_i) x_i. (y, x_i) is the inner product of vectors y and x_i. 
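As a concrete sketch of the projection just described (this is not the PETSc source, only an illustration against the public Vec API; the routine name and arguments are invented for the example, and the basis vectors x_i are assumed to already be orthonormal):

#include <petscvec.h>

/* Remove from y its component along each of nvecs orthonormal null-space
   basis vectors X[i], i.e. y <- y - sum_i (y, x_i) x_i.                   */
PetscErrorCode RemoveNullSpaceComponent(PetscInt nvecs, const Vec X[], Vec y)
{
  PetscErrorCode ierr;
  PetscInt       i;
  PetscScalar    dot;

  PetscFunctionBegin;
  for (i = 0; i < nvecs; i++) {
    ierr = VecDot(y, X[i], &dot);CHKERRQ(ierr);   /* dot = (y, x_i) = x_i^H y */
    ierr = VecAXPY(y, -dot, X[i]);CHKERRQ(ierr);  /* y   = y - dot * x_i      */
  }
  PetscFunctionReturn(0);
}

PETSc itself applies the same update to all basis vectors at once with VecMDot and VecMAXPY, which is the form shown in the MatNullSpaceRemove() source referenced just below.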
Look at this code for more details, http://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matnull.c.html#MatNullSpaceRemove Fande, > >> With right preconditioning the solution is built from alpha_1 b + >> alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to >> remove any part that is in the null space of A. >> >> Now consider the case A y = b where b is NOT in the range of A. So >> the problem has no "true" solution, but one can find a least squares >> solution by rewriting b = b_par + b_perp where b_par is in the range of A >> and b_perp is orthogonal to the range of A and solve instead A x = >> b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically >> uses it to remove b_perp from the right hand side before starting the KSP >> iterations. >> >> The manual pages for MatNullSpaceAttach() and >> MatTranposeNullSpaceAttach() discuss this an explain how it relates to the >> fundamental theorem of linear algebra. >> >> Note that for symmetric matrices the two null spaces are the same. >> >> Barry >> >> >> A different note: This "trick" is not a "cure all" for a totally >> inappropriate preconditioner. For example if one uses for a preconditioner >> a direct (sparse or dense) solver or an ILU(k) one can end up with a very >> bad solver because the direct solver will likely produce a very small pivot >> at some point thus the triangular solver applied in the precondition can >> produce HUGE changes in the solution (that are not physical) and so the >> preconditioner basically produces garbage. On the other hand sometimes it >> works out ok. >> >> >> > >> > >> > Fande Kong, >> >> > > -- > --Amneet > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 12 22:12:12 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 12 Oct 2016 22:12:12 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: Message-ID: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> > On Oct 12, 2016, at 10:00 PM, Amneet Bhalla wrote: > > > > On Monday, October 10, 2016, Barry Smith wrote: > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > Hi All, > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > It is in the Krylov solver. > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 (I-N)BABABAb + .... 
that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > Barry, if identity matrix I is of size M x M (which is also the size of A) then are you augmenting N (size M x R; R < M) by zero colums to make I - N possible? If so it means that only first R values of vector Bb are used for scaling zero Eigenvectors of A. Does this choice affect iteration count, meaning one can arbitrarily choose any R values of the vector Bb to scale zero eigenvectors of A? Yes I wasn't very precise here. I should have written it as as the projection of the vector onto the complement of the null space which I think can be written as I - N(N'N)^-1N' This is done with the routine MatNullSpaceRemove(). The basis of the null space you provide to MatNullSpaceCreate() should not effect the convergence of the Krylov method at all since the same null space is removed regardless of the basis. Barry > > With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. > > Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. > > The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. > > Note that for symmetric matrices the two null spaces are the same. > > Barry > > > A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. > > > > > > > > Fande Kong, > > > > -- > --Amneet > > > > From fdkong.jd at gmail.com Wed Oct 12 22:18:26 2016 From: fdkong.jd at gmail.com (Fande Kong) Date: Wed, 12 Oct 2016 21:18:26 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> Message-ID: On Wed, Oct 12, 2016 at 9:12 PM, Barry Smith wrote: > > > On Oct 12, 2016, at 10:00 PM, Amneet Bhalla > wrote: > > > > > > > > On Monday, October 10, 2016, Barry Smith wrote: > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > Hi All, > > > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > > > I was really wondering what is the philosophy behind this? The exact > algorithms we are using in PETSc right now? Where we are dealing with > this, preconditioner, linear solver, or nonlinear solver? > > > > It is in the Krylov solver. 
> > > > The idea is very simple. Say you have a singular A with null space > N (that all values Ny are in the null space of A. So N is tall and skinny) > and you want to solve A x = b where b is in the range of A. This problem > has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + > Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and > x* has the smallest norm of all solutions. > > > > With left preconditioning B A x = B b GMRES, for example, > normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + > alpha_3 BABABAb + .... but the B operator will likely introduce some > component into the direction of the null space so as GMRES continues the > "solution" computed will grow larger and larger with a large component in > the null space of A. Hence we simply modify GMRES a tiny bit by building > the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > (I-N)BABABAb + .... that is we remove from each new direction anything in > the direction of the null space. Hence the null space doesn't directly > appear in the preconditioner, just in the KSP method. If you attach a > null space to the matrix, the KSP just automatically uses it to do the > removal above. > > > > Barry, if identity matrix I is of size M x M (which is also the size of > A) then are you augmenting N (size M x R; R < M) by zero colums to make I > - N possible? If so it means that only first R values of vector Bb are used > for scaling zero Eigenvectors of A. Does this choice affect iteration > count, meaning one can arbitrarily choose any R values of the vector Bb to > scale zero eigenvectors of A? > > Yes I wasn't very precise here. I should have written it as as the > projection of the vector onto the complement of the null space which I > think can be written as I - N(N'N)^-1N' > This is done with the routine MatNullSpaceRemove(). The basis of the null > space you provide to MatNullSpaceCreate() should not effect the convergence > of the Krylov method at all since the same null space is removed regardless > of the basis. > I think we need to make sure that the basis vectors are orthogonal to each other and they are normalized. Right? Fande, > > Barry > > > > > > With right preconditioning the solution is built from alpha_1 b + > alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to > remove any part that is in the null space of A. > > > > Now consider the case A y = b where b is NOT in the range of A. So > the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > Note that for symmetric matrices the two null spaces are the same. > > > > Barry > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. 
For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > > > > > > > > Fande Kong, > > > > > > > > -- > > --Amneet > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 12 22:21:56 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 12 Oct 2016 22:21:56 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> Message-ID: <6A117E98-16DC-4610-9DF6-A50E62E91B64@mcs.anl.gov> > On Oct 12, 2016, at 10:18 PM, Fande Kong wrote: > > > > On Wed, Oct 12, 2016 at 9:12 PM, Barry Smith wrote: > > > On Oct 12, 2016, at 10:00 PM, Amneet Bhalla wrote: > > > > > > > > On Monday, October 10, 2016, Barry Smith wrote: > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > Hi All, > > > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > > > It is in the Krylov solver. > > > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > > > Barry, if identity matrix I is of size M x M (which is also the size of A) then are you augmenting N (size M x R; R < M) by zero colums to make I - N possible? If so it means that only first R values of vector Bb are used for scaling zero Eigenvectors of A. Does this choice affect iteration count, meaning one can arbitrarily choose any R values of the vector Bb to scale zero eigenvectors of A? > > Yes I wasn't very precise here. I should have written it as as the projection of the vector onto the complement of the null space which I think can be written as I - N(N'N)^-1N' > This is done with the routine MatNullSpaceRemove(). 
The basis of the null space you provide to MatNullSpaceCreate() should not effect the convergence of the Krylov method at all since the same null space is removed regardless of the basis. > > I think we need to make sure that the basis vectors are orthogonal to each other and they are normalized. Right? Yes, MatNullSpaceCreate() should report an error if all the vectors are not orthonormal.

#if defined(PETSC_USE_DEBUG)
  if (n) {
    PetscScalar *dots;
    for (i=0; i<n; i++) {
      PetscReal norm;
      ierr = VecNorm(vecs[i],NORM_2,&norm);CHKERRQ(ierr);
      if (PetscAbsReal(norm-1) > PETSC_SQRT_MACHINE_EPSILON) SETERRQ2(PetscObjectComm((PetscObject)vecs[i]),PETSC_ERR_ARG_WRONG,"Vector %D must have 2-norm of 1.0, it is %g",i,(double)norm);
    }
    if (has_cnst) {
      for (i=0; i<n; i++) {
        PetscScalar sum;
        ierr = VecSum(vecs[i],&sum);CHKERRQ(ierr);
        if (PetscAbsScalar(sum) > PETSC_SQRT_MACHINE_EPSILON) SETERRQ2(PetscObjectComm((PetscObject)vecs[i]),PETSC_ERR_ARG_WRONG,"Vector %D must be orthogonal to constant vector, inner product is %g",i,(double)PetscAbsScalar(sum));
      }
    }
    ierr = PetscMalloc1(n-1,&dots);CHKERRQ(ierr);
    for (i=0; i<n-1; i++) {
      PetscInt j;
      ierr = VecMDot(vecs[i],n-i-1,vecs+i+1,dots);CHKERRQ(ierr);
      for (j=0; j<n-i-1; j++) {
        if (PetscAbsScalar(dots[j]) > PETSC_SQRT_MACHINE_EPSILON) SETERRQ3(PetscObjectComm((PetscObject)vecs[i]),PETSC_ERR_ARG_WRONG,"Vector %D must be orthogonal to vector %D, inner product is %g",i,i+j+1,(double)PetscAbsScalar(dots[j]));
      }
    }
    PetscFree(dots);CHKERRQ(ierr);
  }
#endif

> > Fande, > > > Barry > > > > > > With right preconditioning the solution is built from alpha_1 b + > alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to > remove any part that is in the null space of A. > > > > Now consider the case A y = b where b is NOT in the range of A. So > the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > Note that for symmetric matrices the two null spaces are the same. > > > > Barry > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > > > > > > > > Fande Kong, > > > > > > > > -- > > --Amneet > > > > > > > > > > From jed at jedbrown.org Wed Oct 12 22:24:36 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 12 Oct 2016 21:24:36 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> Message-ID: <87h98h7wy3.fsf@jedbrown.org> Fande Kong writes: > I think we need to make sure that the basis vectors are orthogonal to each > other and they are normalized. Right? Yes, that is clearly stated in the man page and checked for in debug mode. The relevant code to remove the null space is

  if (sp->n) {
    ierr = VecMDot(vec,sp->n,sp->vecs,sp->alpha);CHKERRQ(ierr);
    for (i=0; i<sp->n; i++) sp->alpha[i] = -sp->alpha[i];
    ierr = VecMAXPY(vec,sp->n,sp->alpha,sp->vecs);CHKERRQ(ierr);
  }

-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bikash at umich.edu Wed Oct 12 22:26:10 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Wed, 12 Oct 2016 23:26:10 -0400 Subject: [petsc-users] BVNormColumn Message-ID: Hi, I facing the following issue. I'm trying to use orthogonalize a set of vectors (all complex) with a non-standard inner product (.i.e. with BVSetMatrix). Let's call the basis vector to be BV and the matrix to be B. After certain number of iterations, I'm getting an error "The inner product is not well defined: nonzero imaginary part". I investigated this further. What I did was obtain the vec (column) which was throwing the error. Let's call the vec to be x and its column ID in BV to be j. I obtained x^H*B*x in two different ways: (1). by first getting y=B*x and then performing VecDot(x,y, dotXY), and (2) by using BVNormColumn(BV, j, NORM_2, normj). I'm doing this check even before calling the BVOrthogonalize routine. In principle, the value from (1) should be the square of the value from (2). For the iterations where I'm successful to perform the orthogonalization this check is satisfied. However, for the iteration where it fails with the above error, the value from (2) is zero. I'm unable to understand why this is the case. Thanks, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Wed Oct 12 22:52:10 2016 From: fdkong.jd at gmail.com (Fande Kong) Date: Wed, 12 Oct 2016 21:52:10 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <87h98h7wy3.fsf@jedbrown.org> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> Message-ID: On Wed, Oct 12, 2016 at 9:24 PM, Jed Brown wrote: > Fande Kong writes: > > > I think we need to make sure that the basis vectors are orthogonal to > each > > other and they are normalized. Right? > > Yes, that is clearly stated in the man page and checked for in debug > mode. The relevant code to remove the null space is > > if (sp->n) { > ierr = VecMDot(vec,sp->n,sp->vecs,sp->alpha);CHKERRQ(ierr); > for (i=0; in; i++) sp->alpha[i] = -sp->alpha[i]; > ierr = VecMAXPY(vec,sp->n,sp->alpha,sp->vecs);CHKERRQ(ierr); > } > Right now, we are forcing users to provide orthogonal basis vectors. Is there any issue if we orthogonalize the arbibitry basis vectors provided by users in PETSc? And then users could pass arbitrary basis vectors without doing any preprocessing. Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 12 23:06:30 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 12 Oct 2016 23:06:30 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> Message-ID: <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> > On Oct 12, 2016, at 10:52 PM, Fande Kong wrote: > > > > On Wed, Oct 12, 2016 at 9:24 PM, Jed Brown wrote: > Fande Kong writes: > > > I think we need to make sure that the basis vectors are orthogonal to each > > other and they are normalized. Right? > > Yes, that is clearly stated in the man page and checked for in debug > mode. 
The relevant code to remove the null space is > > if (sp->n) { > ierr = VecMDot(vec,sp->n,sp->vecs,sp->alpha);CHKERRQ(ierr); > for (i=0; in; i++) sp->alpha[i] = -sp->alpha[i]; > ierr = VecMAXPY(vec,sp->n,sp->alpha,sp->vecs);CHKERRQ(ierr); > } > > > Right now, we are forcing users to provide orthogonal basis vectors. Is there any issue if we orthogonalize the arbibitry basis vectors provided by users in PETSc? And then users could pass arbitrary basis vectors without doing any preprocessing. I would make that a separate routine that the users would call first. Barry > > Fande, > > > From jed at jedbrown.org Wed Oct 12 23:13:29 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 12 Oct 2016 22:13:29 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> Message-ID: <87bmyo9992.fsf@jedbrown.org> Fande Kong writes: > Right now, we are forcing users to provide orthogonal basis vectors. Is > there any issue if we orthogonalize the arbibitry basis vectors provided > by users in PETSc? And then users could pass arbitrary basis vectors > without doing any preprocessing. The function currently does not copy the vectors (to save storage). If you want to orthogonalize the vectors, you will need to copy them. If doing that, it may be better to use Householder/TSQR -- Gram-Schmidt doesn't produce particularly orthogonal matrices. (There are a few places in PETSc where Gram-Schmidt is naively assumed to produce a good orthogonal basis even when there is no compelling cost reason for doing so. I know I'm guilty.) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From jed at jedbrown.org Wed Oct 12 23:21:50 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 12 Oct 2016 22:21:50 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> Message-ID: <878tts98v5.fsf@jedbrown.org> Barry Smith writes: > I would make that a separate routine that the users would call first. We have VecMDot and VecMAXPY. I would propose adding VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); (where R can be NULL). Does anyone use the "Vecs" type? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Oct 12 23:45:06 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 12 Oct 2016 23:45:06 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> Message-ID: <7CBF6F98-96E2-4A63-A75A-6629CA1AFB81@mcs.anl.gov> I also think it is best whenever possible to compute the orthonormal basis analytically rather than numerically. As Jed points out numerical orthogonalization generally does not provide full precision and that could matter. 
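To make the suggestion about an analytic basis concrete: for the common case of a constant (for example pressure) null space, the single basis vector can be written down with unit 2-norm directly instead of being orthogonalized numerically. The sketch below is only an illustration under that assumption; the function and variable names are invented and it is not code from PETSc.

#include <petscmat.h>

/* Attach the constant null space to A using an analytically orthonormal
   basis vector whose entries are all 1/sqrt(N).                          */
PetscErrorCode AttachConstantNullSpace(Mat A)
{
  PetscErrorCode ierr;
  Vec            nsvec;
  PetscInt       N;
  MatNullSpace   nullsp;

  PetscFunctionBegin;
  ierr = MatCreateVecs(A, &nsvec, NULL);CHKERRQ(ierr);
  ierr = VecGetSize(nsvec, &N);CHKERRQ(ierr);
  ierr = VecSet(nsvec, 1.0/PetscSqrtReal((PetscReal)N));CHKERRQ(ierr); /* 2-norm is exactly 1 */
  ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_FALSE, 1, &nsvec, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  ierr = VecDestroy(&nsvec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

For this particular null space, MatNullSpaceCreate(comm, PETSC_TRUE, 0, NULL, &nullsp) expresses the same thing without an explicit vector; the explicit form above is only meant to show a basis that is orthonormal by construction.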
> On Oct 12, 2016, at 11:06 PM, Barry Smith wrote: > > >> On Oct 12, 2016, at 10:52 PM, Fande Kong wrote: >> >> >> >> On Wed, Oct 12, 2016 at 9:24 PM, Jed Brown wrote: >> Fande Kong writes: >> >>> I think we need to make sure that the basis vectors are orthogonal to each >>> other and they are normalized. Right? >> >> Yes, that is clearly stated in the man page and checked for in debug >> mode. The relevant code to remove the null space is >> >> if (sp->n) { >> ierr = VecMDot(vec,sp->n,sp->vecs,sp->alpha);CHKERRQ(ierr); >> for (i=0; in; i++) sp->alpha[i] = -sp->alpha[i]; >> ierr = VecMAXPY(vec,sp->n,sp->alpha,sp->vecs);CHKERRQ(ierr); >> } >> >> >> Right now, we are forcing users to provide orthogonal basis vectors. Is there any issue if we orthogonalize the arbibitry basis vectors provided by users in PETSc? And then users could pass arbitrary basis vectors without doing any preprocessing. > > I would make that a separate routine that the users would call first. > > Barry > >> >> Fande, >> >> >> > From jroman at dsic.upv.es Thu Oct 13 03:48:27 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 13 Oct 2016 10:48:27 +0200 Subject: [petsc-users] BVNormColumn In-Reply-To: References: Message-ID: > El 13 oct 2016, a las 5:26, Bikash Kanungo escribi?: > > Hi, > > I facing the following issue. I'm trying to use orthogonalize a set of vectors (all complex) with a non-standard inner product (.i.e. with BVSetMatrix). Let's call the basis vector to be BV and the matrix to be B. After certain number of iterations, I'm getting an error "The inner product is not well defined: nonzero imaginary part". I investigated this further. What I did was obtain the vec (column) which was throwing the error. Let's call the vec to be x and its column ID in BV to be j. I obtained x^H*B*x in two different ways: (1). by first getting y=B*x and then performing VecDot(x,y, dotXY), and (2) by using BVNormColumn(BV, j, NORM_2, normj). I'm doing this check even before calling the BVOrthogonalize routine. > > In principle, the value from (1) should be the square of the value from (2). For the iterations where I'm successful to perform the orthogonalization this check is satisfied. However, for the iteration where it fails with the above error, the value from (2) is zero. I'm unable to understand why this is the case. > > Thanks, > Bikash Please note that to compute x^H*y you have to call VecDot(y,x,dot), with y first. Anyway, this does not matter for what you are reporting. Probably the call for (2) is aborting due to an error, so it does not return a value. Add CHKERRQ(ierr) after it. In general, it is always recommended to add this to every PETSc/SLEPc call, also in Fortran code (although SLEPc Fortran examples do not have it). One possible explanation for the error "The inner product is not well defined" is that the matrix is not exactly Hermitian, that is B^H-B is tiny but not zero. If this is the case, I would suggest explicitly making it Hermitian. Also, things could go bad if matrix B is ill-conditioned. 
Jose From fande.kong at inl.gov Thu Oct 13 09:06:07 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Thu, 13 Oct 2016 08:06:07 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <878tts98v5.fsf@jedbrown.org> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> Message-ID: On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > Barry Smith writes: > > I would make that a separate routine that the users would call first. > > We have VecMDot and VecMAXPY. I would propose adding > > VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); > > (where R can be NULL). > What does R mean here? If nobody working on this, I will be going to take a try. Fande, > > Does anyone use the "Vecs" type? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Oct 13 09:17:29 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Oct 2016 08:17:29 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> Message-ID: <8737k08hae.fsf@jedbrown.org> "Kong, Fande" writes: > On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > >> Barry Smith writes: >> > I would make that a separate routine that the users would call first. >> >> We have VecMDot and VecMAXPY. I would propose adding >> >> VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); >> >> (where R can be NULL). >> > > What does R mean here? It's a QR factorization, where the (right triangular) R is optional. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From knepley at gmail.com Thu Oct 13 09:23:52 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Oct 2016 09:23:52 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> Message-ID: On Thu, Oct 13, 2016 at 9:06 AM, Kong, Fande wrote: > > > On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > >> Barry Smith writes: >> > I would make that a separate routine that the users would call first. >> >> We have VecMDot and VecMAXPY. I would propose adding >> >> VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); >> >> (where R can be NULL). >> > > What does R mean here? > It means the coefficients of the old basis vectors in the new basis. Matt > If nobody working on this, I will be going to take a try. > > Fande, > > >> >> Does anyone use the "Vecs" type? >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fande.kong at inl.gov Thu Oct 13 17:20:11 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Thu, 13 Oct 2016 16:20:11 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> Message-ID: One more question. Suppose that we are solving the singular linear system Ax = b. N(A) is the null space of A, and N(A^T) is the null space of the transpose of A. The linear system is solved using SNES, that is, F(x) = Ax-b = Ax -b_r - b_n. Here b_n in N(A^T), and b_r in R(A). During each nonlinear iteration, a linear system A \delta x = F(x) is solved. N(A) is applied to Krylov space during the linear iterating. Before the actual solve "(*ksp->ops->solve)(ksp)" for \delta x, a temporary copy of F(x) is made, F_tmp. N(A^T) is applied to F_tmp. We will get a \delta x. F(x+\delta x ) = A(x+\delta x)-b_r - b_n. F(x+\delta x ) always contain the vector b_n, and then the algorithm never converges because the normal of F is at least 1. Should we apply N(A^T) to F instead of F_tmp so that b_n can be removed from F? MatGetTransposeNullSpace(pmat,&nullsp); if (nullsp) { VecDuplicate(ksp->vec_rhs,&btmp); VecCopy(ksp->vec_rhs,btmp); MatNullSpaceRemove(nullsp,btmp); vec_rhs = ksp->vec_rhs; ksp->vec_rhs = btmp; } should be changed to MatGetTransposeNullSpace(pmat,&nullsp); if (nullsp) { MatNullSpaceRemove(nullsp,ksp->vec_rhs); } ??? Or other solutions to this issue? Fande Kong, On Thu, Oct 13, 2016 at 8:23 AM, Matthew Knepley wrote: > On Thu, Oct 13, 2016 at 9:06 AM, Kong, Fande wrote: > >> >> >> On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: >> >>> Barry Smith writes: >>> > I would make that a separate routine that the users would call first. >>> >>> We have VecMDot and VecMAXPY. I would propose adding >>> >>> VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); >>> >>> (where R can be NULL). >>> >> >> What does R mean here? >> > > It means the coefficients of the old basis vectors in the new basis. > > Matt > > >> If nobody working on this, I will be going to take a try. >> >> Fande, >> >> >>> >>> Does anyone use the "Vecs" type? >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peetz2 at illinois.edu Thu Oct 13 17:32:32 2016 From: peetz2 at illinois.edu (Peetz, Darin T) Date: Thu, 13 Oct 2016 22:32:32 +0000 Subject: [petsc-users] Slepc eigenvectors not orthonormalized Message-ID: I've come across an irregularity when extracting the eigenvectors when using the CISS method to solve the eigenvalue problem. I'm solving a generalized hermitian problem, and it looks like the resulting eigenvectors are M-orthogonalized with each other (the M-inner products of different eigenvectors are approximately 0, as expected), but are normalized using the L2-inner product, not the M-inner product. Basically, the matrix V'*M*V (V being a matrix composed of the extracted eigenvectors) is diagonal, but the diagonals are much larger than 1, and the matrix V'*V has non-zero diagonals, but the diagonal elements are exactly equal to 1. This only happens if I use the CISS method. If I use the Arnoldi method for example, the eigenvectors are normalized as expected. 
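(For reference, the normalization being compared here can be checked or enforced after extraction. The sketch below is not SLEPc code, only an illustration with invented names; it assumes M is the matrix defining the inner product and x is one extracted eigenvector, and it rescales x so that x^H M x = 1.)

#include <petscmat.h>

/* Normalize x in the M-inner product rather than the 2-norm. */
PetscErrorCode NormalizeInMInnerProduct(Mat M, Vec x)
{
  PetscErrorCode ierr;
  Vec            Mx;
  PetscScalar    xMx;

  PetscFunctionBegin;
  ierr = VecDuplicate(x, &Mx);CHKERRQ(ierr);
  ierr = MatMult(M, x, Mx);CHKERRQ(ierr);       /* Mx  = M*x          */
  ierr = VecDot(Mx, x, &xMx);CHKERRQ(ierr);     /* xMx = x^H * M * x  */
  ierr = VecScale(x, 1.0/PetscSqrtReal(PetscRealPart(xMx)));CHKERRQ(ierr);
  ierr = VecDestroy(&Mx);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}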
Is there any particular reason for this, or is this an error in the implementation? Thanks, Darin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Oct 13 17:41:03 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Oct 2016 17:41:03 -0500 Subject: [petsc-users] Slepc eigenvectors not orthonormalized In-Reply-To: References: Message-ID: <8048EACC-1501-4414-A787-177C78CE3B64@mcs.anl.gov> Forwarding to slepc-maint > On Oct 13, 2016, at 5:32 PM, Peetz, Darin T wrote: > > I've come across an irregularity when extracting the eigenvectors when using the CISS method to solve the eigenvalue problem. I'm solving a generalized hermitian problem, and it looks like the resulting eigenvectors are M-orthogonalized with each other (the M-inner products of different eigenvectors are approximately 0, as expected), but are normalized using the L2-inner product, not the M-inner product. Basically, the matrix V'*M*V (V being a matrix composed of the extracted eigenvectors) is diagonal, but the diagonals are much larger than 1, and the matrix V'*V has non-zero diagonals, but the diagonal elements are exactly equal to 1. > > This only happens if I use the CISS method. If I use the Arnoldi method for example, the eigenvectors are normalized as expected. Is there any particular reason for this, or is this an error in the implementation? > > Thanks, > Darin From cmpierce at WPI.EDU Thu Oct 13 17:48:47 2016 From: cmpierce at WPI.EDU (Christopher Pierce) Date: Thu, 13 Oct 2016 18:48:47 -0400 Subject: [petsc-users] SLEPc: Convergence Problems Message-ID: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> Hello All, As there isn't a SLEPc specific list, it was recommended that I bring my question here. I am using SLEPc to solve a generalized eigenvalue problem generated as part of the Finite Element Method, but am having difficulty getting the diagonalizer to converge. I am worried that the method used to set boundary conditions in the matrix is creating the problem and am looking for other people's input. In order to set the boundary conditions, I find the list of IDs that should be zero in the resulting eigenvectors and then use MatZeroRowsColumns to zero the rows and columns and in the matrix A insert a large value such as 1E10 on each diagonal element that was zeroed and likewise for the B matrix except with the value 1.0. That way the eigenvalues resulting from those solutions are on the order of 1E10 and are outside of the region of interest for my problem. When I tried to diagonal the matrices I could only get converged solutions from the rqcg method which I have found to not scale well with my problem. When using any other method, the approximate error of the eigenpairs hovers around 1E00 and 1E01 until it reaches the max number of iterations. Could having so many identical eigenvalues (~1,000) in the spectrum be causing this to happen even if they are far outside of the range of interest? Thank, Chris Pierce WPI Center for Computation Nano-Science -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From bsmith at mcs.anl.gov Thu Oct 13 19:01:07 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Oct 2016 19:01:07 -0500 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> Message-ID: <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. Barry > On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: > > Hello All, > > As there isn't a SLEPc specific list, it was recommended that I bring my > question here. I am using SLEPc to solve a generalized eigenvalue > problem generated as part of the Finite Element Method, but am having > difficulty getting the diagonalizer to converge. I am worried that the > method used to set boundary conditions in the matrix is creating the > problem and am looking for other people's input. > > In order to set the boundary conditions, I find the list of IDs that > should be zero in the resulting eigenvectors and then use > MatZeroRowsColumns to zero the rows and columns and in the matrix A > insert a large value such as 1E10 on each diagonal element that was > zeroed and likewise for the B matrix except with the value 1.0. That > way the eigenvalues resulting from those solutions are on the order of > 1E10 and are outside of the region of interest for my problem. > > When I tried to diagonal the matrices I could only get converged > solutions from the rqcg method which I have found to not scale well with > my problem. When using any other method, the approximate error of the > eigenpairs hovers around 1E00 and 1E01 until it reaches the max number > of iterations. Could having so many identical eigenvalues (~1,000) in > the spectrum be causing this to happen even if they are far outside of > the range of interest? > > Thank, > > Chris Pierce > WPI Center for Computation Nano-Science > > From bsmith at mcs.anl.gov Thu Oct 13 21:10:32 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Oct 2016 21:10:32 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> Message-ID: <2A7986F8-B8C5-4E84-A530-9F8E48506D7D@mcs.anl.gov> Fande, I have done some work, mostly understanding and documentation, on handling singular systems with KSP in the branch barry/improve-matnullspace-usage. This also includes a new example that solves both a symmetric example and an example where nullspace(A) != nullspace(A') src/ksp/ksp/examples/tutorials/ex67.c My understanding is now documented in the manual page for KSPSolve(), part of this is quoted below: ------- If you provide a matrix that has a MatSetNullSpace() and MatSetTransposeNullSpace() this will use that information to solve singular systems in the least squares sense with a norm minimizing solution. 
$ $ A x = b where b = b_p + b_t where b_t is not in the range of A (and hence by the fundamental theorem of linear algebra is in the nullspace(A') see MatSetNullSpace() $ $ KSP first removes b_t producing the linear system A x = b_p (which has multiple solutions) and solves this to find the ||x|| minimizing solution (and hence $ it finds the solution x orthogonal to the nullspace(A). The algorithm is simply in each iteration of the Krylov method we remove the nullspace(A) from the search $ direction thus the solution which is a linear combination of the search directions has no component in the nullspace(A). $ $ We recommend always using GMRES for such singular systems. $ If nullspace(A) = nullspace(A') (note symmetric matrices always satisfy this property) then both left and right preconditioning will work $ If nullspace(A) != nullspace(A') then left preconditioning will work but right preconditioning may not work (or it may). Developer Note: The reason we cannot always solve nullspace(A) != nullspace(A') systems with right preconditioning is because we need to remove at each iteration the nullspace(AB) from the search direction. While we know the nullspace(A) the nullspace(AB) equals B^-1 times the nullspace(A) but except for trivial preconditioners such as diagonal scaling we cannot apply the inverse of the preconditioner to a vector and thus cannot compute the nullspace(AB). ------ Any feed back on the correctness or clarity of the material is appreciated. The punch line is that right preconditioning cannot be trusted with nullspace(A) != nullspace(A') I don't see any fix for this. Barry > On Oct 11, 2016, at 3:04 PM, Kong, Fande wrote: > > > > On Tue, Oct 11, 2016 at 12:18 PM, Barry Smith wrote: > > > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith wrote: > > > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith wrote: > > > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande wrote: > > > > > > > > Hi All, > > > > > > > > I know how to remove the null spaces from a singular system using creating a MatNullSpace and attaching it to Mat. > > > > > > > > I was really wondering what is the philosophy behind this? The exact algorithms we are using in PETSc right now? Where we are dealing with this, preconditioner, linear solver, or nonlinear solver? > > > > > > It is in the Krylov solver. > > > > > > The idea is very simple. Say you have a singular A with null space N (that all values Ny are in the null space of A. So N is tall and skinny) and you want to solve A x = b where b is in the range of A. This problem has an infinite number of solutions Ny + x* since A (Ny + x*) = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = b and x* has the smallest norm of all solutions. > > > > > > With left preconditioning B A x = B b GMRES, for example, normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + alpha_3 BABABAb + .... but the B operator will likely introduce some component into the direction of the null space so as GMRES continues the "solution" computed will grow larger and larger with a large component in the null space of A. Hence we simply modify GMRES a tiny bit by building the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > > > Does "I" mean an identity matrix? 
Could you possibly send me a link for this GMRES implementation, that is, how PETSc does this in the actual code? > > > > Yes. > > > > It is in the helper routine KSP_PCApplyBAorAB() > > #undef __FUNCT__ > > #define __FUNCT__ "KSP_PCApplyBAorAB" > > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec y,Vec w) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (!ksp->transpose_solve) { > > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > > } else { > > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > } > > PetscFunctionReturn(0); > > } > > > > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > > { > > PetscErrorCode ierr; > > PetscFunctionBegin; > > if (ksp->pc_side == PC_LEFT) { > > Mat A; > > MatNullSpace nullsp; > > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > > if (nullsp) { > > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > > } > > } > > PetscFunctionReturn(0); > > } > > > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov methods only? How about the right preconditioning ones? Are they just magically right for the right preconditioning Krylov methods? > > This is a good question. I am working on a branch now where I will add some more comprehensive testing of the various cases and fix anything that comes up. > > Were you having trouble with ASM and bjacobi only for right preconditioning? > > > Yes. ASM and bjacobi works fine for left preconditioning NOT for RIGHT preconditioning. bjacobi converges, but produces a wrong solution. ASM needs more iterations, however the solution is right. > > > > Note that when A is symmetric the range of A is orthogonal to null space of A so yes I think in that case it is just "magically right" but if A is not symmetric then I don't think it is "magically right". I'll work on it. > > > Barry > > > > > Fande Kong, > > > > > > There is no code directly in the GMRES or other methods. > > > > > > > > (I-N)BABABAb + .... that is we remove from each new direction anything in the direction of the null space. Hence the null space doesn't directly appear in the preconditioner, just in the KSP method. If you attach a null space to the matrix, the KSP just automatically uses it to do the removal above. > > > > > > With right preconditioning the solution is built from alpha_1 b + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term to remove any part that is in the null space of A. > > > > > > Now consider the case A y = b where b is NOT in the range of A. So the problem has no "true" solution, but one can find a least squares solution by rewriting b = b_par + b_perp where b_par is in the range of A and b_perp is orthogonal to the range of A and solve instead A x = b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically uses it to remove b_perp from the right hand side before starting the KSP iterations. > > > > > > The manual pages for MatNullSpaceAttach() and MatTranposeNullSpaceAttach() discuss this an explain how it relates to the fundamental theorem of linear algebra. > > > > > > Note that for symmetric matrices the two null spaces are the same. > > > > > > Barry > > > > > > > > > A different note: This "trick" is not a "cure all" for a totally inappropriate preconditioner. 
For example if one uses for a preconditioner a direct (sparse or dense) solver or an ILU(k) one can end up with a very bad solver because the direct solver will likely produce a very small pivot at some point thus the triangular solver applied in the precondition can produce HUGE changes in the solution (that are not physical) and so the preconditioner basically produces garbage. On the other hand sometimes it works out ok. > > > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an example which shows lu and ilu indeed work, but asm and bjacobi do not at all. That is why I am asking questions about algorithms. I am trying to figure out a default preconditioner for several singular systems. > > > > Hmm, normally asm and bjacobi would be fine with this unless one or more of the subblocks are themselves singular (which normally won't happen). AMG can also work find sometimes. > > > > Can you send a sample code? > > > > Barry > > > > > > > > Thanks again. > > > > > > > > > Fande Kong, > > > > > > > > > > > > > > > > > > > > > Fande Kong, > > From bsmith at mcs.anl.gov Thu Oct 13 21:21:37 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Oct 2016 21:21:37 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> Message-ID: <11333ED6-170F-4FE3-9727-6ACAE36E9669@mcs.anl.gov> Fande, What SNES method are you using? If you use SNESKSPONLY I think it is ok, it will solve for the norm minimizing least square solution during the one KSPSolve() and then return. Yes, if you use SNESNEWTONLS or others though the SNES solver will, as you say, think that progress has not been made. I do not like what you propose to do, changing the right hand side of the system the user provides is a nasty and surprising side effect. What is your goal? To make it look like the SNES system has had a residual norm reduction? We could generalize you question and ask what about solving for nonlinear problems: find the minimal norm solution of min_x || F(x) - b||. This may or may not belong in Tao, currently SNES doesn't do any kind of nonlinear least squares. Barry > On Oct 13, 2016, at 5:20 PM, Kong, Fande wrote: > > One more question. > > Suppose that we are solving the singular linear system Ax = b. N(A) is the null space of A, and N(A^T) is the null space of the transpose of A. > > The linear system is solved using SNES, that is, F(x) = Ax-b = Ax -b_r - b_n. Here b_n in N(A^T), and b_r in R(A). During each nonlinear iteration, a linear system A \delta x = F(x) is solved. N(A) is applied to Krylov space during the linear iterating. Before the actual solve "(*ksp->ops->solve)(ksp)" for \delta x, a temporary copy of F(x) is made, F_tmp. N(A^T) is applied to F_tmp. We will get a \delta x. F(x+\delta x ) = A(x+\delta x)-b_r - b_n. > > F(x+\delta x ) always contain the vector b_n, and then the algorithm never converges because the normal of F is at least 1. > > Should we apply N(A^T) to F instead of F_tmp so that b_n can be removed from F? > > MatGetTransposeNullSpace(pmat,&nullsp); > if (nullsp) { > VecDuplicate(ksp->vec_rhs,&btmp); > VecCopy(ksp->vec_rhs,btmp); > MatNullSpaceRemove(nullsp,btmp); > vec_rhs = ksp->vec_rhs; > ksp->vec_rhs = btmp; > } > > should be changed to > > MatGetTransposeNullSpace(pmat,&nullsp); > if (nullsp) { > MatNullSpaceRemove(nullsp,ksp->vec_rhs); > } > ??? 
> > Or other solutions to this issue? > > > Fande Kong, > > > > > > On Thu, Oct 13, 2016 at 8:23 AM, Matthew Knepley wrote: > On Thu, Oct 13, 2016 at 9:06 AM, Kong, Fande wrote: > > > On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > Barry Smith writes: > > I would make that a separate routine that the users would call first. > > We have VecMDot and VecMAXPY. I would propose adding > > VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); > > (where R can be NULL). > > What does R mean here? > > It means the coefficients of the old basis vectors in the new basis. > > Matt > > If nobody working on this, I will be going to take a try. > > Fande, > > > Does anyone use the "Vecs" type? > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From bsmith at mcs.anl.gov Thu Oct 13 22:45:56 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Oct 2016 22:45:56 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <002901d224a0$0e18cf80$2a4a6e80$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <002b01d22403$aee809f0$0cb81dd0$@capesim.com> <87zima8tyu.fsf@jedbrown.org> <002901d224a0$0e18cf80$2a4a6e80$@capesim.com> Message-ID: <6FB8A549-F30C-478B-ACB9-49BC2840AB39@mcs.anl.gov> Mathew, Thanks for the additional information. This is all very weird since the same number of calls made to PetscCommDuplicate() are the same regardless of geometry and the time of the call shouldn't depend on the geometry. Would you be able to do another set of tests where you track the time in MPI_Get_attr() and MPI_Barrier() instead of PetscCommDuplicate()? It could be Cray did something "funny" in their implementation of PETSc. You could also try using the module petsc/3.7.3 instead of the cray-petsc module Thanks Barry > On Oct 12, 2016, at 10:48 AM, Matthew Overholt wrote: > > Jed, > > I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be > only indirectly related to the problem size, etc., and I wouldn't be > surprised if it was an artifact of some sort related to my specific > algorithm. So you may not want to pursue this much further. However, I did > make three runs using the same Edison environment and code but different > input geometry files. Earlier I found a strong dependence on the number of > processes, so for this test I ran all of the tests on 1 node with 8 > processes (N=1, n=8). What I found was that the amount of PCD overhead was > geometry dependent, not size dependent. A moderately-sized simple geometry > (with relatively few ghosted vertices at the simple-planar interfaces) had > no PCD overhead, whereas both small and large complex geometries (with > relatively more ghosted vertices at the more-complex interfaces) had 5 - 6% > PCD overhead. The log files follow. > > Thanks, > Matt Overholt > > //////////////////////////////////////////////////////////////////////////// > ///// > // pllSs20 : Simple Pencil Shape with Small Planar Interfaces > // No PetscCommDuplicate() overhead in CrayPat sampling. > //////////////////////////////////////////////////////////////////////////// > ///// > ...713097 vertices...646400 elements (moderate size) > > Total run time was 47.853 seconds. 
> > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > Using Petsc Release Version 3.7.2, Jun, 05, 2016 > > Max Max/Min Avg Total > Time (sec): 4.822e+01 1.01022 4.780e+01 > Objects: 1.160e+02 1.00000 1.160e+02 > Flops: 1.425e+06 1.00000 1.425e+06 1.140e+07 > Flops/sec: 2.986e+04 1.01022 2.982e+04 2.385e+05 > MPI Messages: 1.140e+02 1.46154 9.500e+01 7.600e+02 > MPI Message Lengths: 7.202e+07 3.55575 2.888e+05 2.195e+08 > MPI Reductions: 2.600e+02 1.00000 > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 4.7802e+01 100.0% 1.1402e+07 100.0% 7.600e+02 100.0% > 2.888e+05 100.0% 2.590e+02 99.6% > > ---------------------------------------------------------------------------- > -------------------------------------------- > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ---------------------------------------------------------------------------- > -------------------------------------------- > > --- Event Stage 0: Main Stage > > VecMax 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 8.0e+00 0 0 0 0 3 0 0 0 0 3 0 > VecNorm 8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 > 8.0e+00 0 50 0 0 3 0 50 0 0 3 3129 > VecCopy 8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAYPX 8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 50 0 0 0 0 50 0 0 0 5638 > VecAssemblyBegin 26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 > 7.8e+01 1 0 28 55 30 1 0 28 55 30 0 > VecAssemblyEnd 26 1.0 5.2852e-0211.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.9e+01 0 0 0 29 15 0 0 0 29 15 0 > VecScatterEnd 19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatSolve 8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 > 3.2e+01 3 0 44 31 12 3 0 44 31 12 0 > MatLUFactorSym 1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 23 0 0 0 2 23 0 0 0 2 0 > MatLUFactorNum 8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 66 0 0 0 0 66 0 0 0 0 0 > MatAssemblyBegin 8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 > 1.6e+01 0 0 22 5 6 0 0 22 5 6 0 > MatAssemblyEnd 8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 > 2.3e+01 0 0 4 0 9 0 0 4 0 9 0 > MatGetRowIJ 1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 > 4.3e+01 91 0 44 31 17 91 0 44 31 17 0 > PCSetUp 8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.1e+01 89 0 0 0 4 89 0 0 0 4 0 > PCApply 8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 > 3.2e+01 3 0 44 31 12 3 0 44 31 12 0 > ---------------------------------------------------------------------------- > -------------------------------------------- > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory 
Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 46 43 49344760 0. > Vector Scatter 23 22 18176 0. > Index Set 37 37 3279764 0. > IS L to G Mapping 1 0 0 0. > Matrix 6 6 43498680 0. > Krylov Solver 1 1 1160 0. > Preconditioner 1 1 992 0. > Viewer 1 0 0 0. > ============================================================================ > ============================================ > Average time to get PetscTime(): 5.00679e-07 > Average time for MPI_Barrier(): 1.81198e-06 > Average time for zero size MPI_Send(): 3.75509e-06 > > //////////////////////////////////////////////////////////////////////////// > ///// > // HBT-HP5 : Small Complex Shape with Complex Interfaces > // 5.3% PetscCommDuplicate() overhead in CrayPat sampling. > //////////////////////////////////////////////////////////////////////////// > ///// > ...50564 vertices...45420 elements (small size) > > Total run time was 4.863 seconds. > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > Using Petsc Release Version 3.7.2, Jun, 05, 2016 > > Max Max/Min Avg Total > Time (sec): 4.930e+00 1.00841 4.894e+00 > Objects: 1.080e+02 1.00000 1.080e+02 > Flops: 7.539e+04 1.06240 7.267e+04 5.813e+05 > Flops/sec: 1.542e+04 1.06244 1.485e+04 1.188e+05 > MPI Messages: 2.070e+02 2.43529 1.628e+02 1.302e+03 > MPI Message Lengths: 7.824e+06 2.40897 2.965e+04 3.861e+07 > MPI Reductions: 2.320e+02 1.00000 > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 4.8945e+00 100.0% 5.8133e+05 100.0% 1.302e+03 100.0% > 2.965e+04 100.0% 2.310e+02 99.6% > > ---------------------------------------------------------------------------- > -------------------------------------------- > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ---------------------------------------------------------------------------- > -------------------------------------------- > > --- Event Stage 0: Main Stage > > VecMax 6 1.0 1.1725e-0321.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 3 0 0 0 0 3 0 > VecNorm 6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 > 6.0e+00 0 50 0 0 3 0 50 0 0 3 13 > VecCopy 6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAYPX 6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 50 0 0 0 0 50 0 0 0 4549 > VecAssemblyBegin 24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 > 7.2e+01 1 0 24 30 31 1 0 24 30 31 0 > VecAssemblyEnd 24 1.0 4.1070e-0322.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.3e+01 0 0 0 13 14 0 0 0 13 14 0 > VecScatterEnd 15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatSolve 6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 > 2.4e+01 3 0 45 9 10 3 0 45 9 10 0 > MatLUFactorSym 1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 13 0 0 0 2 13 0 0 0 2 0 > MatLUFactorNum 6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 72 0 0 0 0 72 0 0 0 0 0 > MatAssemblyBegin 6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 > 1.2e+01 0 0 24 54 5 0 0 24 54 5 0 > MatAssemblyEnd 6 1.0 
2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 > 1.9e+01 0 0 5 0 8 0 0 5 0 8 0 > MatGetRowIJ 1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 > 3.5e+01 88 0 45 9 15 88 0 45 9 15 0 > PCSetUp 6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.1e+01 85 0 0 0 5 85 0 0 0 5 0 > PCApply 6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 > 2.4e+01 3 0 45 9 10 3 0 45 9 10 0 > ---------------------------------------------------------------------------- > -------------------------------------------- > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 44 41 3635912 0. > Vector Scatter 21 20 90672 0. > Index Set 33 33 286872 0. > IS L to G Mapping 1 0 0 0. > Matrix 6 6 2988180 0. > Krylov Solver 1 1 1160 0. > Preconditioner 1 1 992 0. > Viewer 1 0 0 0. > ============================================================================ > ============================================ > Average time to get PetscTime(): 5.00679e-07 > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 3.12924e-06 > > //////////////////////////////////////////////////////////////////////////// > ///// > // GaNSi13 : Large Complex Shape with Complex Interfaces > // 6.4% PetscCommDuplicate() overhead in CrayPat sampling. > //////////////////////////////////////////////////////////////////////////// > ///// > ...1642311 vertices...1497368 elements (large size) > > Total run time was 260.958 seconds. 
> > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > Using Petsc Release Version 3.7.2, Jun, 05, 2016 > > Max Max/Min Avg Total > Time (sec): 2.619e+02 1.00319 2.611e+02 > Objects: 1.040e+02 1.00000 1.040e+02 > Flops: 2.050e+06 1.07341 1.969e+06 1.575e+07 > Flops/sec: 7.853e+03 1.07354 7.541e+03 6.032e+04 > MPI Messages: 1.835e+02 1.47390 1.448e+02 1.158e+03 > MPI Message Lengths: 1.761e+08 3.47614 4.801e+05 5.560e+08 > MPI Reductions: 2.180e+02 1.00000 > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 2.6114e+02 100.0% 1.5754e+07 100.0% 1.158e+03 100.0% > 4.801e+05 100.0% 2.170e+02 99.5% > > ---------------------------------------------------------------------------- > -------------------------------------------- > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ---------------------------------------------------------------------------- > -------------------------------------------- > > --- Event Stage 0: Main Stage > > VecMax 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 0 0 0 0 2 0 0 0 0 2 0 > VecNorm 5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 > 5.0e+00 0 50 0 0 2 0 50 0 0 2 4163 > VecCopy 5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 25 1.0 2.4175e-0210.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAYPX 5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 50 0 0 0 0 50 0 0 0 6473 > VecAssemblyBegin 23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 > 6.9e+01 1 0 28 54 32 1 0 28 54 32 0 > VecAssemblyEnd 23 1.0 1.2512e-0111.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+01 0 0 0 19 14 0 0 0 19 14 0 > VecScatterEnd 13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatSolve 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 > 2.0e+01 1 0 32 15 9 1 0 32 15 9 0 > MatLUFactorSym 1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 5.0e+00 11 0 0 0 2 11 0 0 0 2 0 > MatLUFactorNum 5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 76 0 0 0 0 76 0 0 0 0 0 > MatAssemblyBegin 5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 > 1.0e+01 0 0 28 22 5 0 0 28 22 5 0 > MatAssemblyEnd 5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 > 1.7e+01 2 0 8 0 8 2 0 8 0 8 0 > MatGetRowIJ 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 > 3.1e+01 88 0 32 15 14 88 0 32 15 14 0 > PCSetUp 5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.1e+01 87 0 0 0 5 87 0 0 0 5 0 > PCApply 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 > 2.0e+01 1 0 32 15 9 1 0 32 15 9 0 > ---------------------------------------------------------------------------- > -------------------------------------------- > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory 
Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 43 40 114480760 0. > Vector Scatter 20 19 291760 0. > Index Set 31 31 5440808 0. > IS L to G Mapping 1 0 0 0. > Matrix 6 6 99689528 0. > Krylov Solver 1 1 1160 0. > Preconditioner 1 1 992 0. > Viewer 1 0 0 0. > ============================================================================ > ============================================ > Average time to get PetscTime(): 1.90735e-07 > Average time for MPI_Barrier(): 1.81198e-06 > Average time for zero size MPI_Send(): 3.75509e-06 > > -----Original Message----- > From: Jed Brown [mailto:jed at jedbrown.org] > Sent: Tuesday, October 11, 2016 5:19 PM > To: overholt at capesim.com; 'Barry Smith' > Cc: 'PETSc' > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > Matthew Overholt writes: > >> Barry, >> >> Subsequent tests with the same code and a problem (input) having a >> much smaller vertex (equation) count (i.e. a much smaller matrix to >> invert for the solution) have NOT had PetscCommDuplicate() account for >> any significant time, so I'm not surprised that your test didn't find any > problem. > > Can you re-run the large and small configurations with the same > code/environment and resend those logs? PetscCommDuplicate has nothing to > do with the problem size, so any difference in cost must be indirect, though > attribute access should be simple and independent. > > > --- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > From juan at tf.uni-kiel.de Thu Oct 13 23:13:46 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Fri, 14 Oct 2016 06:13:46 +0200 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> Message-ID: See this description from Jed http://scicomp.stackexchange.com/questions/3298/appropriate-space-for-weak-solutions-to-an-elliptical-pde-with-mixed-inhomogeneo/3300#3300. In a simpler way you could just scale your diagonal entries which are 1 at the moment with a value that is out of your interest range, such that the values do not appear in the solution. On Fri, Oct 14, 2016 at 2:01 AM, Barry Smith wrote: > > I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. > > Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. > > The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. > > > Barry > >> On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: >> >> Hello All, >> >> As there isn't a SLEPc specific list, it was recommended that I bring my >> question here. I am using SLEPc to solve a generalized eigenvalue >> problem generated as part of the Finite Element Method, but am having >> difficulty getting the diagonalizer to converge. I am worried that the >> method used to set boundary conditions in the matrix is creating the >> problem and am looking for other people's input. 
>> >> In order to set the boundary conditions, I find the list of IDs that >> should be zero in the resulting eigenvectors and then use >> MatZeroRowsColumns to zero the rows and columns and in the matrix A >> insert a large value such as 1E10 on each diagonal element that was >> zeroed and likewise for the B matrix except with the value 1.0. That >> way the eigenvalues resulting from those solutions are on the order of >> 1E10 and are outside of the region of interest for my problem. >> >> When I tried to diagonal the matrices I could only get converged >> solutions from the rqcg method which I have found to not scale well with >> my problem. When using any other method, the approximate error of the >> eigenpairs hovers around 1E00 and 1E01 until it reaches the max number >> of iterations. Could having so many identical eigenvalues (~1,000) in >> the spectrum be causing this to happen even if they are far outside of >> the range of interest? >> >> Thank, >> >> Chris Pierce >> WPI Center for Computation Nano-Science >> >> > From cmpierce at WPI.EDU Fri Oct 14 00:43:35 2016 From: cmpierce at WPI.EDU (Christopher Pierce) Date: Fri, 14 Oct 2016 01:43:35 -0400 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> Message-ID: Thank You, That looks like what I need to do if the highly degenerate eigenpairs are my problem. I'll try that out this week and see if that helps. Chris On 10/13/16 20:01, Barry Smith wrote: > I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. > > Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. > > The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. > > > Barry > >> On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: >> >> Hello All, >> >> As there isn't a SLEPc specific list, it was recommended that I bring my >> question here. I am using SLEPc to solve a generalized eigenvalue >> problem generated as part of the Finite Element Method, but am having >> difficulty getting the diagonalizer to converge. I am worried that the >> method used to set boundary conditions in the matrix is creating the >> problem and am looking for other people's input. >> >> In order to set the boundary conditions, I find the list of IDs that >> should be zero in the resulting eigenvectors and then use >> MatZeroRowsColumns to zero the rows and columns and in the matrix A >> insert a large value such as 1E10 on each diagonal element that was >> zeroed and likewise for the B matrix except with the value 1.0. That >> way the eigenvalues resulting from those solutions are on the order of >> 1E10 and are outside of the region of interest for my problem. >> >> When I tried to diagonal the matrices I could only get converged >> solutions from the rqcg method which I have found to not scale well with >> my problem. When using any other method, the approximate error of the >> eigenpairs hovers around 1E00 and 1E01 until it reaches the max number >> of iterations. Could having so many identical eigenvalues (~1,000) in >> the spectrum be causing this to happen even if they are far outside of >> the range of interest? 
>> >> Thank, >> >> Chris Pierce >> WPI Center for Computation Nano-Science >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jroman at dsic.upv.es Fri Oct 14 06:18:00 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 14 Oct 2016 13:18:00 +0200 Subject: [petsc-users] Slepc eigenvectors not orthonormalized In-Reply-To: References: Message-ID: <6604E8A3-99D6-4785-82B2-91BFDC4F2DBA@dsic.upv.es> > El 14 oct 2016, a las 0:32, Peetz, Darin T escribi?: > > I've come across an irregularity when extracting the eigenvectors when using the CISS method to solve the eigenvalue problem. I'm solving a generalized hermitian problem, and it looks like the resulting eigenvectors are M-orthogonalized with each other (the M-inner products of different eigenvectors are approximately 0, as expected), but are normalized using the L2-inner product, not the M-inner product. Basically, the matrix V'*M*V (V being a matrix composed of the extracted eigenvectors) is diagonal, but the diagonals are much larger than 1, and the matrix V'*V has non-zero diagonals, but the diagonal elements are exactly equal to 1. > > This only happens if I use the CISS method. If I use the Arnoldi method for example, the eigenvectors are normalized as expected. Is there any particular reason for this, or is this an error in the implementation? > > Thanks, > Darin Thanks for reporting this. The fix is to add this line: eps->purify = PETSC_FALSE; anywhere in function EPSSetUp_CISS() (in file src/eps/impls/ciss/ciss.c). I will include the fix for future releases. Jose From hoeltgen at b-tu.de Fri Oct 14 09:20:18 2016 From: hoeltgen at b-tu.de (Laurent Hoeltgen) Date: Fri, 14 Oct 2016 16:20:18 +0200 Subject: [petsc-users] petsc and reading/writing image files Message-ID: <85C2604E-3C9F-4D0E-B852-F0FCD7265714@b-tu.de> Hi all, does petsc provide some functionality to read images files (png, pgm, jpg, ...) and also to write such files? I?d like to do some low level image processing tasks where I need direct access to the pixel values. So anything that gives me a vector or matrix with this data would be perfect. Best regards, Laurent % ------------------------------------------------------------- % Dr. Laurent Hoeltgen Chair for Applied Mathematics Brandenburg University of Technology Platz der Deutschen Einheit 1, HG 3.26 03046 Cottbus, Germany Email: hoeltgen at b-tu.de Web. http://www-user.tu-cottbus.de/~hoeltgen/ Tel. +49 (0) 355 69 20 77 Fax. +49 (0) 355 69 27 10 % ------------------------------------------------------------- % -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Oct 14 18:23:58 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Oct 2016 18:23:58 -0500 Subject: [petsc-users] petsc and reading/writing image files In-Reply-To: <85C2604E-3C9F-4D0E-B852-F0FCD7265714@b-tu.de> References: <85C2604E-3C9F-4D0E-B852-F0FCD7265714@b-tu.de> Message-ID: <4AF886D5-5B15-46C5-912A-C73E1B220DAA@mcs.anl.gov> Lisandro has written some code in PETSc that writes such files. He has written it as a way to save images from PETSc graphics to files but it may be possible to extend and use for your purposes without too much difficulty. The code is in src/sys/classes/draw/utils/image.c If you add, for example, reading such files and other processing we would be very happy to accept it in a pull request. 
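PETSc 3.7 has no built-in image reader, so as an illustration of the kind of low-level pixel access asked about above, here is a hedged sketch, not an existing PETSc routine, that reads a binary PGM (P5) file into a sequential Vec. It assumes a plain "P5 <width> <height> <maxval>" header with no '#' comment lines and an 8-bit maxval:

/* Sketch only: load a binary PGM (P5) image into a sequential Vec, one entry per pixel.
   Assumes a plain header with no comment lines and maxval <= 255. Not an existing PETSc routine. */
#include <petscvec.h>
#include <stdio.h>

PetscErrorCode VecLoadPGM(const char filename[], Vec *pixels, PetscInt *width, PetscInt *height)
{
  FILE          *fp;
  int            w, h, maxval;
  PetscScalar   *a;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  fp = fopen(filename, "rb");
  if (!fp) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_FILE_OPEN, "Cannot open %s", filename);
  if (fscanf(fp, "P5 %d %d %d", &w, &h, &maxval) != 3) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_FILE_READ, "Bad PGM header");
  fgetc(fp);                                       /* consume the single whitespace after maxval */
  ierr = VecCreateSeq(PETSC_COMM_SELF, (PetscInt)(w*h), pixels);CHKERRQ(ierr);
  ierr = VecGetArray(*pixels, &a);CHKERRQ(ierr);
  for (i = 0; i < w*h; i++) a[i] = (PetscScalar)fgetc(fp);   /* one byte per pixel */
  ierr = VecRestoreArray(*pixels, &a);CHKERRQ(ierr);
  fclose(fp);
  *width = w; *height = h;
  PetscFunctionReturn(0);
}

The resulting Vec can be manipulated with the usual Vec operations and written back out, for example via the image-writing code in src/sys/classes/draw/utils/image.c mentioned above.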
Barry > On Oct 14, 2016, at 9:20 AM, Laurent Hoeltgen wrote: > > Hi all, > > does petsc provide some functionality to read images files (png, pgm, jpg, ...) and also to write such files? I?d like to do some low level image processing tasks where I need direct access to the pixel values. So anything that gives me a vector or matrix with this data would be perfect. > > Best regards, > Laurent > > % ------------------------------------------------------------- % > Dr. Laurent Hoeltgen > > Chair for Applied Mathematics > Brandenburg University of Technology > Platz der Deutschen Einheit 1, HG 3.26 > 03046 Cottbus, Germany > > Email: hoeltgen at b-tu.de > Web. http://www-user.tu-cottbus.de/~hoeltgen/ > Tel. +49 (0) 355 69 20 77 > Fax. +49 (0) 355 69 27 10 > % ------------------------------------------------------------- % > > From pvsang002 at gmail.com Fri Oct 14 20:50:33 2016 From: pvsang002 at gmail.com (Sang pham van) Date: Sat, 15 Oct 2016 08:50:33 +0700 Subject: [petsc-users] partition of DM Vec entries Message-ID: Hi, I am using DM Vec for a FV code, for some reasons, I want to know partition of all ghost cells of a specific partition. is there a way do that? Many thanks. Best, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Oct 14 20:59:58 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Oct 2016 20:59:58 -0500 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: Message-ID: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> > On Oct 14, 2016, at 8:50 PM, Sang pham van wrote: > > Hi, > > I am using DM Vec for a FV code, for some reasons, I want to know partition of all ghost cells of a specific partition. is there a way do that? Could you please explain in more detail what you want, I don't understand? Perhaps give a specific example with 2 processes? Barry > > Many thanks. > > Best, > From pvsang002 at gmail.com Fri Oct 14 21:23:36 2016 From: pvsang002 at gmail.com (Sang pham van) Date: Sat, 15 Oct 2016 09:23:36 +0700 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> Message-ID: Hi Barry, In 2 processes case, the problem is simple, as I know all ghost cells of partition 0 are updated from partition 1. However, in the case of many processes, how do I know from which partitions ghost cells of partition 0 are updated? In other words, How can I know neighboring partitions of the partition 0? and can I get a list of ghost cells managing by a neighboring partition? Please let me know if my question is still not clear. Many thanks. On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith wrote: > > > On Oct 14, 2016, at 8:50 PM, Sang pham van wrote: > > > > Hi, > > > > I am using DM Vec for a FV code, for some reasons, I want to know > partition of all ghost cells of a specific partition. is there a way do > that? > > Could you please explain in more detail what you want, I don't > understand? Perhaps give a specific example with 2 processes? > > Barry > > > > > > > Many thanks. > > > > Best, > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Fri Oct 14 21:40:20 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Oct 2016 21:40:20 -0500 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> Message-ID: <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Thanks, the question is very clear now. For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up to) 9 neighbors of a processor. (Sadly this routine does not have a manual page but the arguments are obvious). For other DM I don't think there is any simple way to get this information. For none of the DM is there a way to get information about what process is providing a specific ghost cell. It is the "hope" of PETSc (and I would think most parallel computing models) that the details of exactly what process is computing neighbor values should not matter for your own computation. Maybe if you provide more details on how you wish to use this information we may have suggestions on how to proceed. Barry > On Oct 14, 2016, at 9:23 PM, Sang pham van wrote: > > Hi Barry, > > In 2 processes case, the problem is simple, as I know all ghost cells of partition 0 are updated from partition 1. However, in the case of many processes, how do I know from which partitions ghost cells of partition 0 are updated? In other words, How can I know neighboring partitions of the partition 0? and can I get a list of ghost cells managing by a neighboring partition? > Please let me know if my question is still not clear. > > Many thanks. > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith wrote: > > > On Oct 14, 2016, at 8:50 PM, Sang pham van wrote: > > > > Hi, > > > > I am using DM Vec for a FV code, for some reasons, I want to know partition of all ghost cells of a specific partition. is there a way do that? > > Could you please explain in more detail what you want, I don't understand? Perhaps give a specific example with 2 processes? > > Barry > > > > > > > Many thanks. > > > > Best, > > > > From pvsang002 at gmail.com Fri Oct 14 21:54:38 2016 From: pvsang002 at gmail.com (Sang pham van) Date: Sat, 15 Oct 2016 09:54:38 +0700 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: Hi Barry, Thank your for your answer. I am writing a parallel code for smoothed-particle hydrodynamic, in this code I used a DMDA background mesh for management of particles. Each DMDA cell manages a number of particles, the number can change in both time and cell. In each time step, I need to update position and velocity of particles in border cells to neighbor partition. I think I can not use DMDA Vec to do this be cause the number of particles is not the same in all ghost cells. I think I am able to write a routine do this work, but the code may be quite complicated and not so "formal", I would be very appreciated if you can suggest a method to solve my problem. Many thanks. On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith wrote: > > Thanks, the question is very clear now. > > For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up > to) 9 neighbors of a processor. (Sadly this routine does not have a manual > page but the arguments are obvious). For other DM I don't think there is > any simple way to get this information. 
For none of the DM is there a way > to get information about what process is providing a specific ghost cell. > > It is the "hope" of PETSc (and I would think most parallel computing > models) that the details of exactly what process is computing neighbor > values should not matter for your own computation. Maybe if you provide > more details on how you wish to use this information we may have > suggestions on how to proceed. > > Barry > > > > > On Oct 14, 2016, at 9:23 PM, Sang pham van wrote: > > > > Hi Barry, > > > > In 2 processes case, the problem is simple, as I know all ghost cells of > partition 0 are updated from partition 1. However, in the case of many > processes, how do I know from which partitions ghost cells of partition 0 > are updated? In other words, How can I know neighboring partitions of the > partition 0? and can I get a list of ghost cells managing by a neighboring > partition? > > Please let me know if my question is still not clear. > > > > Many thanks. > > > > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith wrote: > > > > > On Oct 14, 2016, at 8:50 PM, Sang pham van > wrote: > > > > > > Hi, > > > > > > I am using DM Vec for a FV code, for some reasons, I want to know > partition of all ghost cells of a specific partition. is there a way do > that? > > > > Could you please explain in more detail what you want, I don't > understand? Perhaps give a specific example with 2 processes? > > > > Barry > > > > > > > > > > > > Many thanks. > > > > > > Best, > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Oct 14 22:13:20 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 14 Oct 2016 22:13:20 -0500 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: Unless the particles are more or less equally distributed over the the entire domain any kind of "domain decomposition" approach is questionably for managing the particles. Otherwise certain processes that have domains that contain most of the particles will have a great deal of work, for all of its particles, while domains with few particles will have little work. I can see two approaches to alleviate this problem. 1) constantly adjust the sizes/locations of the domains to load balance the particles per domain or 2) parallelize the particles (some how) instead of just the geometry. Anyways, there is a preliminary DMSWARM class in the development version of PETSc for helping to work with particles provided by Dave May. You might look at it. I don't know if it would useful for you or not. IMHO software library support for particle methods is still very primitive compared to finite difference/element support, in other words we still have a lot to do. Barry > On Oct 14, 2016, at 9:54 PM, Sang pham van wrote: > > Hi Barry, > > Thank your for your answer. I am writing a parallel code for smoothed-particle hydrodynamic, in this code I used a DMDA background mesh for management of particles. Each DMDA cell manages a number of particles, the number can change in both time and cell. In each time step, I need to update position and velocity of particles in border cells to neighbor partition. I think I can not use DMDA Vec to do this be cause the number of particles is not the same in all ghost cells. 
> > I think I am able to write a routine do this work, but the code may be quite complicated and not so "formal", I would be very appreciated if you can suggest a method to solve my problem. > > Many thanks. > > > > > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith wrote: > > Thanks, the question is very clear now. > > For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up to) 9 neighbors of a processor. (Sadly this routine does not have a manual page but the arguments are obvious). For other DM I don't think there is any simple way to get this information. For none of the DM is there a way to get information about what process is providing a specific ghost cell. > > It is the "hope" of PETSc (and I would think most parallel computing models) that the details of exactly what process is computing neighbor values should not matter for your own computation. Maybe if you provide more details on how you wish to use this information we may have suggestions on how to proceed. > > Barry > > > > > On Oct 14, 2016, at 9:23 PM, Sang pham van wrote: > > > > Hi Barry, > > > > In 2 processes case, the problem is simple, as I know all ghost cells of partition 0 are updated from partition 1. However, in the case of many processes, how do I know from which partitions ghost cells of partition 0 are updated? In other words, How can I know neighboring partitions of the partition 0? and can I get a list of ghost cells managing by a neighboring partition? > > Please let me know if my question is still not clear. > > > > Many thanks. > > > > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith wrote: > > > > > On Oct 14, 2016, at 8:50 PM, Sang pham van wrote: > > > > > > Hi, > > > > > > I am using DM Vec for a FV code, for some reasons, I want to know partition of all ghost cells of a specific partition. is there a way do that? > > > > Could you please explain in more detail what you want, I don't understand? Perhaps give a specific example with 2 processes? > > > > Barry > > > > > > > > > > > > Many thanks. > > > > > > Best, > > > > > > > > > From pvsang002 at gmail.com Fri Oct 14 22:20:37 2016 From: pvsang002 at gmail.com (Sang pham van) Date: Sat, 15 Oct 2016 10:20:37 +0700 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: Hi Barry, Thank you very much for your suggestions and comments. I am very appreciated that! WIth my best regards, On Sat, Oct 15, 2016 at 10:13 AM, Barry Smith wrote: > > Unless the particles are more or less equally distributed over the the > entire domain any kind of "domain decomposition" approach is questionably > for managing the particles. Otherwise certain processes that have domains > that contain most of the particles will have a great deal of work, for all > of its particles, while domains with few particles will have little work. I > can see two approaches to alleviate this problem. > > 1) constantly adjust the sizes/locations of the domains to load balance > the particles per domain or > > 2) parallelize the particles (some how) instead of just the geometry. > > Anyways, there is a preliminary DMSWARM class in the development version > of PETSc for helping to work with particles provided by Dave May. You might > look at it. I don't know if it would useful for you or not. 
IMHO software > library support for particle methods is still very primitive compared to > finite difference/element support, in other words we still have a lot to do. > > > Barry > > > > > > > On Oct 14, 2016, at 9:54 PM, Sang pham van wrote: > > > > Hi Barry, > > > > Thank your for your answer. I am writing a parallel code for > smoothed-particle hydrodynamic, in this code I used a DMDA background mesh > for management of particles. Each DMDA cell manages a number of particles, > the number can change in both time and cell. In each time step, I need to > update position and velocity of particles in border cells to neighbor > partition. I think I can not use DMDA Vec to do this be cause the number of > particles is not the same in all ghost cells. > > > > I think I am able to write a routine do this work, but the code may be > quite complicated and not so "formal", I would be very appreciated if you > can suggest a method to solve my problem. > > > > Many thanks. > > > > > > > > > > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith wrote: > > > > Thanks, the question is very clear now. > > > > For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up > to) 9 neighbors of a processor. (Sadly this routine does not have a manual > page but the arguments are obvious). For other DM I don't think there is > any simple way to get this information. For none of the DM is there a way > to get information about what process is providing a specific ghost cell. > > > > It is the "hope" of PETSc (and I would think most parallel computing > models) that the details of exactly what process is computing neighbor > values should not matter for your own computation. Maybe if you provide > more details on how you wish to use this information we may have > suggestions on how to proceed. > > > > Barry > > > > > > > > > On Oct 14, 2016, at 9:23 PM, Sang pham van > wrote: > > > > > > Hi Barry, > > > > > > In 2 processes case, the problem is simple, as I know all ghost cells > of partition 0 are updated from partition 1. However, in the case of many > processes, how do I know from which partitions ghost cells of partition 0 > are updated? In other words, How can I know neighboring partitions of the > partition 0? and can I get a list of ghost cells managing by a neighboring > partition? > > > Please let me know if my question is still not clear. > > > > > > Many thanks. > > > > > > > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith > wrote: > > > > > > > On Oct 14, 2016, at 8:50 PM, Sang pham van > wrote: > > > > > > > > Hi, > > > > > > > > I am using DM Vec for a FV code, for some reasons, I want to know > partition of all ghost cells of a specific partition. is there a way do > that? > > > > > > Could you please explain in more detail what you want, I don't > understand? Perhaps give a specific example with 2 processes? > > > > > > Barry > > > > > > > > > > > > > > > > > Many thanks. > > > > > > > > Best, > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
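The neighbour-rank query Barry suggests above can be tried with a short sketch like the one below; in the PETSc source the routine appears as DMDAGetNeighbors() (9 entries in 2D, 27 in 3D), so the exact name and the handling of missing neighbours at non-periodic boundaries should be checked against the installed version:

/* Sketch only: print the MPI ranks of the (up to) 9 neighbouring sub-domains of a 2D DMDA.
   Assumes the DMDAGetNeighbors() interface; ordering and sentinel values may vary by PETSc version. */
#include <petscdmda.h>

PetscErrorCode PrintDANeighbors(DM da)
{
  const PetscMPIInt *ranks;
  PetscMPIInt        rank;
  PetscInt           i;
  MPI_Comm           comm;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)da, &comm);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
  ierr = DMDAGetNeighbors(da, &ranks);CHKERRQ(ierr);
  for (i = 0; i < 9; i++) {
    ierr = PetscSynchronizedPrintf(comm, "[%d] neighbor %D -> rank %d\n", rank, i, ranks[i]);CHKERRQ(ierr);
  }
  ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}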
URL: From dave.mayhem23 at gmail.com Sat Oct 15 00:17:29 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Sat, 15 Oct 2016 06:17:29 +0100 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: On Saturday, 15 October 2016, Barry Smith wrote: > > Unless the particles are more or less equally distributed over the the > entire domain any kind of "domain decomposition" approach is questionably > for managing the particles. Otherwise certain processes that have domains > that contain most of the particles will have a great deal of work, for all > of its particles, while domains with few particles will have little work. I > can see two approaches to alleviate this problem. > > 1) constantly adjust the sizes/locations of the domains to load balance > the particles per domain or > > 2) parallelize the particles (some how) instead of just the geometry. > > Anyways, there is a preliminary DMSWARM class in the development version > of PETSc for helping to work with particles provided by Dave May. You might > look at it. I don't know if it would useful for you or not. IMHO software > library support for particle methods is still very primitive compared to > finite difference/element support, in other words we still have a lot to do. If you are using an SPH formulation with a constant smoothing length (such as for incompressible media), then DMSWARM will be extremely useful. It manages the assignment of fields on point clouds and managed data exchanges required for particle advection and gather operations from neighbor cells required for evaluating the SPH basis functions. DMSWARM is in the master branch. We would be happy if you want to be beta tester. The API is in its infancy and thus having a user play with what's there would be the best way to refine the design as required. Take a look at the examples and let us know if you need help. Thanks, Dave > > > Barry > > > > > > > On Oct 14, 2016, at 9:54 PM, Sang pham van > wrote: > > > > Hi Barry, > > > > Thank your for your answer. I am writing a parallel code for > smoothed-particle hydrodynamic, in this code I used a DMDA background mesh > for management of particles. Each DMDA cell manages a number of particles, > the number can change in both time and cell. In each time step, I need to > update position and velocity of particles in border cells to neighbor > partition. I think I can not use DMDA Vec to do this be cause the number of > particles is not the same in all ghost cells. > > > > I think I am able to write a routine do this work, but the code may be > quite complicated and not so "formal", I would be very appreciated if you > can suggest a method to solve my problem. > > > > Many thanks. > > > > > > > > > > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith > wrote: > > > > Thanks, the question is very clear now. > > > > For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up > to) 9 neighbors of a processor. (Sadly this routine does not have a manual > page but the arguments are obvious). For other DM I don't think there is > any simple way to get this information. For none of the DM is there a way > to get information about what process is providing a specific ghost cell. > > > > It is the "hope" of PETSc (and I would think most parallel computing > models) that the details of exactly what process is computing neighbor > values should not matter for your own computation. 
Maybe if you provide > more details on how you wish to use this information we may have > suggestions on how to proceed. > > > > Barry > > > > > > > > > On Oct 14, 2016, at 9:23 PM, Sang pham van > wrote: > > > > > > Hi Barry, > > > > > > In 2 processes case, the problem is simple, as I know all ghost cells > of partition 0 are updated from partition 1. However, in the case of many > processes, how do I know from which partitions ghost cells of partition 0 > are updated? In other words, How can I know neighboring partitions of the > partition 0? and can I get a list of ghost cells managing by a neighboring > partition? > > > Please let me know if my question is still not clear. > > > > > > Many thanks. > > > > > > > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith > wrote: > > > > > > > On Oct 14, 2016, at 8:50 PM, Sang pham van > wrote: > > > > > > > > Hi, > > > > > > > > I am using DM Vec for a FV code, for some reasons, I want to know > partition of all ghost cells of a specific partition. is there a way do > that? > > > > > > Could you please explain in more detail what you want, I don't > understand? Perhaps give a specific example with 2 processes? > > > > > > Barry > > > > > > > > > > > > > > > > > Many thanks. > > > > > > > > Best, > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Oct 15 00:19:54 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 15 Oct 2016 00:19:54 -0500 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: <0A65C380-5676-4BFC-8051-B033517A7271@mcs.anl.gov> This sounds great. > On Oct 15, 2016, at 12:17 AM, Dave May wrote: > > > > On Saturday, 15 October 2016, Barry Smith wrote: > > Unless the particles are more or less equally distributed over the the entire domain any kind of "domain decomposition" approach is questionably for managing the particles. Otherwise certain processes that have domains that contain most of the particles will have a great deal of work, for all of its particles, while domains with few particles will have little work. I can see two approaches to alleviate this problem. > > 1) constantly adjust the sizes/locations of the domains to load balance the particles per domain or > > 2) parallelize the particles (some how) instead of just the geometry. > > Anyways, there is a preliminary DMSWARM class in the development version of PETSc for helping to work with particles provided by Dave May. You might look at it. I don't know if it would useful for you or not. IMHO software library support for particle methods is still very primitive compared to finite difference/element support, in other words we still have a lot to do. > > If you are using an SPH formulation with a constant smoothing length (such as for incompressible media), then DMSWARM will be extremely useful. It manages the assignment of fields on point clouds and managed data exchanges required for particle advection and gather operations from neighbor cells required for evaluating the SPH basis functions. > > DMSWARM is in the master branch. We would be happy if you want to be beta tester. The API is in its infancy and thus having a user play with what's there would be the best way to refine the design as required. > > Take a look at the examples and let us know if you need help. 
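For anyone who wants to try the beta DMSWARM interface described above, a minimal registration sketch, modelled on the swarm tutorials and using hypothetical choices for the field name "velocity", the local size nlocal, and the migration buffer of 4, looks like the following; exact function names may differ between development snapshots:

/* Sketch only: create a DMSWARM, register a per-particle field, expose it as a Vec.
   Mirrors the pattern of src/dm/examples/tutorials/swarm_ex2.c; names may differ by version. */
#include <petscdmswarm.h>

PetscErrorCode CreateParticleSwarm(MPI_Comm comm, PetscInt nlocal, DM *swarm)
{
  Vec            vel;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMCreate(comm, swarm);CHKERRQ(ierr);
  ierr = DMSetType(*swarm, DMSWARM);CHKERRQ(ierr);
  ierr = DMSwarmInitializeFieldRegister(*swarm);CHKERRQ(ierr);
  ierr = DMSwarmRegisterPetscDatatypeField(*swarm, "velocity", 3, PETSC_REAL);CHKERRQ(ierr); /* 3 components per particle */
  ierr = DMSwarmFinalizeFieldRegister(*swarm);CHKERRQ(ierr);
  ierr = DMSwarmSetLocalSizes(*swarm, nlocal, 4);CHKERRQ(ierr); /* nlocal particles, room for a few migrants */
  /* a registered field can be viewed as a global Vec for I/O, norms, etc. */
  ierr = DMSwarmCreateGlobalVectorFromField(*swarm, "velocity", &vel);CHKERRQ(ierr);
  ierr = DMSwarmDestroyGlobalVectorFromField(*swarm, "velocity", &vel);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Pushing particles between sub-domains is then handled with the migration support demonstrated in swarm_ex3.c.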
> > Thanks, > Dave > > > > Barry > > > > > > > On Oct 14, 2016, at 9:54 PM, Sang pham van wrote: > > > > Hi Barry, > > > > Thank your for your answer. I am writing a parallel code for smoothed-particle hydrodynamic, in this code I used a DMDA background mesh for management of particles. Each DMDA cell manages a number of particles, the number can change in both time and cell. In each time step, I need to update position and velocity of particles in border cells to neighbor partition. I think I can not use DMDA Vec to do this be cause the number of particles is not the same in all ghost cells. > > > > I think I am able to write a routine do this work, but the code may be quite complicated and not so "formal", I would be very appreciated if you can suggest a method to solve my problem. > > > > Many thanks. > > > > > > > > > > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith wrote: > > > > Thanks, the question is very clear now. > > > > For DMDA you can use DMDAGetNeighborsRank() to get the list of the (up to) 9 neighbors of a processor. (Sadly this routine does not have a manual page but the arguments are obvious). For other DM I don't think there is any simple way to get this information. For none of the DM is there a way to get information about what process is providing a specific ghost cell. > > > > It is the "hope" of PETSc (and I would think most parallel computing models) that the details of exactly what process is computing neighbor values should not matter for your own computation. Maybe if you provide more details on how you wish to use this information we may have suggestions on how to proceed. > > > > Barry > > > > > > > > > On Oct 14, 2016, at 9:23 PM, Sang pham van wrote: > > > > > > Hi Barry, > > > > > > In 2 processes case, the problem is simple, as I know all ghost cells of partition 0 are updated from partition 1. However, in the case of many processes, how do I know from which partitions ghost cells of partition 0 are updated? In other words, How can I know neighboring partitions of the partition 0? and can I get a list of ghost cells managing by a neighboring partition? > > > Please let me know if my question is still not clear. > > > > > > Many thanks. > > > > > > > > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith wrote: > > > > > > > On Oct 14, 2016, at 8:50 PM, Sang pham van wrote: > > > > > > > > Hi, > > > > > > > > I am using DM Vec for a FV code, for some reasons, I want to know partition of all ghost cells of a specific partition. is there a way do that? > > > > > > Could you please explain in more detail what you want, I don't understand? Perhaps give a specific example with 2 processes? > > > > > > Barry > > > > > > > > > > > > > > > > > Many thanks. > > > > > > > > Best, > > > > > > > > > > > > > > From dave.mayhem23 at gmail.com Sat Oct 15 00:29:07 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Sat, 15 Oct 2016 06:29:07 +0100 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: On 15 October 2016 at 06:17, Dave May wrote: > > > On Saturday, 15 October 2016, Barry Smith wrote: > >> >> Unless the particles are more or less equally distributed over the the >> entire domain any kind of "domain decomposition" approach is questionably >> for managing the particles. 
Otherwise certain processes that have domains >> that contain most of the particles will have a great deal of work, for all >> of its particles, while domains with few particles will have little work. I >> can see two approaches to alleviate this problem. >> >> 1) constantly adjust the sizes/locations of the domains to load balance >> the particles per domain or >> >> 2) parallelize the particles (some how) instead of just the geometry. >> >> Anyways, there is a preliminary DMSWARM class in the development version >> of PETSc for helping to work with particles provided by Dave May. You might >> look at it. I don't know if it would useful for you or not. IMHO software >> library support for particle methods is still very primitive compared to >> finite difference/element support, in other words we still have a lot to do. > > > If you are using an SPH formulation with a constant smoothing length (such > as for incompressible media), then DMSWARM will be extremely useful. It > manages the assignment of fields on point clouds and managed data exchanges > required for particle advection and gather operations from neighbor > cells required for evaluating the SPH basis functions. > > DMSWARM is in the master branch. We would be happy if you want to be beta > tester. The API is in its infancy and thus having a user play with what's > there would be the best way to refine the design as required. > > Take a look at the examples and let us know if you need help. > Specifically look at these examples (in the order I've listed) * src/dm/examples/tutorials/swarm_ex2.c Demonstrates how to create the swarm, register fields within the swarm and how to represent these fields as PETSc Vec objects. * src/dm/examples/tutorials/swarm_ex3.c This demonstrates how you push particles from one sub-domain to another. * src/dm/examples/tutorials/swarm_ex1.c This demonstrates how to define a collection operation to gather particles from neighbour cells (cells being defined via DMDA) There isn't a single complete example using a DMSWARM and DMDA for everything required by SPH, but all the plumbing is in place. Thanks, Dave > > Thanks, > Dave > > >> >> >> Barry >> >> >> >> >> >> > On Oct 14, 2016, at 9:54 PM, Sang pham van wrote: >> > >> > Hi Barry, >> > >> > Thank your for your answer. I am writing a parallel code for >> smoothed-particle hydrodynamic, in this code I used a DMDA background mesh >> for management of particles. Each DMDA cell manages a number of particles, >> the number can change in both time and cell. In each time step, I need to >> update position and velocity of particles in border cells to neighbor >> partition. I think I can not use DMDA Vec to do this be cause the number of >> particles is not the same in all ghost cells. >> > >> > I think I am able to write a routine do this work, but the code may be >> quite complicated and not so "formal", I would be very appreciated if you >> can suggest a method to solve my problem. >> > >> > Many thanks. >> > >> > >> > >> > >> > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith >> wrote: >> > >> > Thanks, the question is very clear now. >> > >> > For DMDA you can use DMDAGetNeighborsRank() to get the list of the >> (up to) 9 neighbors of a processor. (Sadly this routine does not have a >> manual page but the arguments are obvious). For other DM I don't think >> there is any simple way to get this information. For none of the DM is >> there a way to get information about what process is providing a specific >> ghost cell. 
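For reference, querying those neighbour ranks is only a couple of calls. A rough sketch (in the source tree the routine appears to be spelled DMDAGetNeighbors(); the helper name below is purely illustrative):

#include <petscdmda.h>

/* Sketch: print the MPI ranks of the (up to) 9 neighbouring subdomains of a 2D DMDA.
   The returned array is owned by the DM and must not be freed by the caller. */
static PetscErrorCode PrintNeighborRanks(DM da)
{
  PetscErrorCode    ierr;
  const PetscMPIInt *ranks;
  MPI_Comm          comm;
  PetscInt          i;

  ierr = PetscObjectGetComm((PetscObject)da,&comm);CHKERRQ(ierr);
  ierr = DMDAGetNeighbors(da,&ranks);CHKERRQ(ierr);
  for (i = 0; i < 9; i++) { /* a 3D DMDA returns 27 entries */
    ierr = PetscSynchronizedPrintf(comm,"neighbor %D is rank %d\n",i,(int)ranks[i]);CHKERRQ(ierr);
  }
  ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
  return 0;
}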
>> > >> > It is the "hope" of PETSc (and I would think most parallel computing >> models) that the details of exactly what process is computing neighbor >> values should not matter for your own computation. Maybe if you provide >> more details on how you wish to use this information we may have >> suggestions on how to proceed. >> > >> > Barry >> > >> > >> > >> > > On Oct 14, 2016, at 9:23 PM, Sang pham van >> wrote: >> > > >> > > Hi Barry, >> > > >> > > In 2 processes case, the problem is simple, as I know all ghost cells >> of partition 0 are updated from partition 1. However, in the case of many >> processes, how do I know from which partitions ghost cells of partition 0 >> are updated? In other words, How can I know neighboring partitions of the >> partition 0? and can I get a list of ghost cells managing by a neighboring >> partition? >> > > Please let me know if my question is still not clear. >> > > >> > > Many thanks. >> > > >> > > >> > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith >> wrote: >> > > >> > > > On Oct 14, 2016, at 8:50 PM, Sang pham van >> wrote: >> > > > >> > > > Hi, >> > > > >> > > > I am using DM Vec for a FV code, for some reasons, I want to know >> partition of all ghost cells of a specific partition. is there a way do >> that? >> > > >> > > Could you please explain in more detail what you want, I don't >> understand? Perhaps give a specific example with 2 processes? >> > > >> > > Barry >> > > >> > > >> > > >> > > > >> > > > Many thanks. >> > > > >> > > > Best, >> > > > >> > > >> > > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From pvsang002 at gmail.com Sat Oct 15 03:25:20 2016 From: pvsang002 at gmail.com (Sang pham van) Date: Sat, 15 Oct 2016 15:25:20 +0700 Subject: [petsc-users] partition of DM Vec entries In-Reply-To: References: <672021DF-A0A5-4ECE-9E50-345A7C928BBE@mcs.anl.gov> <5093D38E-3B70-493A-AE64-91006ABA5AF1@mcs.anl.gov> Message-ID: Hi Dave, Thank you very much for the useful examples! I have constant smoothing length in my problem, so it fits DMSWARM very well, I am happy to be a beta tester! Any question while using DSWARM, I will drop into this thread. Best, On Sat, Oct 15, 2016 at 12:29 PM, Dave May wrote: > > > On 15 October 2016 at 06:17, Dave May wrote: > >> >> >> On Saturday, 15 October 2016, Barry Smith wrote: >> >>> >>> Unless the particles are more or less equally distributed over the the >>> entire domain any kind of "domain decomposition" approach is questionably >>> for managing the particles. Otherwise certain processes that have domains >>> that contain most of the particles will have a great deal of work, for all >>> of its particles, while domains with few particles will have little work. I >>> can see two approaches to alleviate this problem. >>> >>> 1) constantly adjust the sizes/locations of the domains to load balance >>> the particles per domain or >>> >>> 2) parallelize the particles (some how) instead of just the geometry. >>> >>> Anyways, there is a preliminary DMSWARM class in the development version >>> of PETSc for helping to work with particles provided by Dave May. You might >>> look at it. I don't know if it would useful for you or not. IMHO software >>> library support for particle methods is still very primitive compared to >>> finite difference/element support, in other words we still have a lot to do. >> >> >> If you are using an SPH formulation with a constant smoothing length >> (such as for incompressible media), then DMSWARM will be extremely useful. 
>> It manages the assignment of fields on point clouds and managed data >> exchanges required for particle advection and gather operations from >> neighbor cells required for evaluating the SPH basis functions. >> >> DMSWARM is in the master branch. We would be happy if you want to be beta >> tester. The API is in its infancy and thus having a user play with what's >> there would be the best way to refine the design as required. >> >> Take a look at the examples and let us know if you need help. >> > > > Specifically look at these examples (in the order I've listed) > > * src/dm/examples/tutorials/swarm_ex2.c > Demonstrates how to create the swarm, register fields within the swarm and > how to represent these fields as PETSc Vec objects. > > * src/dm/examples/tutorials/swarm_ex3.c > This demonstrates how you push particles from one sub-domain to another. > > * src/dm/examples/tutorials/swarm_ex1.c > This demonstrates how to define a collection operation to gather particles > from neighbour cells (cells being defined via DMDA) > > There isn't a single complete example using a DMSWARM and DMDA for > everything required by SPH, but all the plumbing is in place. > > Thanks, > Dave > > >> >> Thanks, >> Dave >> >> >>> >>> >>> Barry >>> >>> >>> >>> >>> >>> > On Oct 14, 2016, at 9:54 PM, Sang pham van >>> wrote: >>> > >>> > Hi Barry, >>> > >>> > Thank your for your answer. I am writing a parallel code for >>> smoothed-particle hydrodynamic, in this code I used a DMDA background mesh >>> for management of particles. Each DMDA cell manages a number of particles, >>> the number can change in both time and cell. In each time step, I need to >>> update position and velocity of particles in border cells to neighbor >>> partition. I think I can not use DMDA Vec to do this be cause the number of >>> particles is not the same in all ghost cells. >>> > >>> > I think I am able to write a routine do this work, but the code may be >>> quite complicated and not so "formal", I would be very appreciated if you >>> can suggest a method to solve my problem. >>> > >>> > Many thanks. >>> > >>> > >>> > >>> > >>> > On Sat, Oct 15, 2016 at 9:40 AM, Barry Smith >>> wrote: >>> > >>> > Thanks, the question is very clear now. >>> > >>> > For DMDA you can use DMDAGetNeighborsRank() to get the list of the >>> (up to) 9 neighbors of a processor. (Sadly this routine does not have a >>> manual page but the arguments are obvious). For other DM I don't think >>> there is any simple way to get this information. For none of the DM is >>> there a way to get information about what process is providing a specific >>> ghost cell. >>> > >>> > It is the "hope" of PETSc (and I would think most parallel computing >>> models) that the details of exactly what process is computing neighbor >>> values should not matter for your own computation. Maybe if you provide >>> more details on how you wish to use this information we may have >>> suggestions on how to proceed. >>> > >>> > Barry >>> > >>> > >>> > >>> > > On Oct 14, 2016, at 9:23 PM, Sang pham van >>> wrote: >>> > > >>> > > Hi Barry, >>> > > >>> > > In 2 processes case, the problem is simple, as I know all ghost >>> cells of partition 0 are updated from partition 1. However, in the case of >>> many processes, how do I know from which partitions ghost cells of >>> partition 0 are updated? In other words, How can I know neighboring >>> partitions of the partition 0? and can I get a list of ghost cells managing >>> by a neighboring partition? 
>>> > > Please let me know if my question is still not clear. >>> > > >>> > > Many thanks. >>> > > >>> > > >>> > > On Sat, Oct 15, 2016 at 8:59 AM, Barry Smith >>> wrote: >>> > > >>> > > > On Oct 14, 2016, at 8:50 PM, Sang pham van >>> wrote: >>> > > > >>> > > > Hi, >>> > > > >>> > > > I am using DM Vec for a FV code, for some reasons, I want to know >>> partition of all ghost cells of a specific partition. is there a way do >>> that? >>> > > >>> > > Could you please explain in more detail what you want, I don't >>> understand? Perhaps give a specific example with 2 processes? >>> > > >>> > > Barry >>> > > >>> > > >>> > > >>> > > > >>> > > > Many thanks. >>> > > > >>> > > > Best, >>> > > > >>> > > >>> > > >>> > >>> > >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ztdepyahoo at 163.com Sun Oct 16 01:53:10 2016 From: ztdepyahoo at 163.com (=?GBK?B?tqHAz8qm?=) Date: Sun, 16 Oct 2016 14:53:10 +0800 (CST) Subject: [petsc-users] =?gbk?q?cannot_convert_=A1=AEint*=A1=AF_to_=A1=AEPe?= =?gbk?q?tscInt*?= Message-ID: <1fcdc895.2e68.157cc43fe0c.Coremail.ztdepyahoo@163.com> Dear professor: I met the following error for Petsc 3.7.3. I delcare LocalSize as int, but it doesn't work anymore. it works for 3.6.3. error: cannot convert ?int*? to ?PetscInt* {aka long int*}? for argument ?2? to ?PetscErrorCode VecGetLocalSize(Vec, PetscInt*)? VecGetLocalSize (Petsc_b, &LocalSize); Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sun Oct 16 02:13:14 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 16 Oct 2016 02:13:14 -0500 Subject: [petsc-users] =?iso-8859-7?q?cannot_convert_=A1int*=A2_to_=A1Pets?= =?iso-8859-7?q?cInt*?= In-Reply-To: <1fcdc895.2e68.157cc43fe0c.Coremail.ztdepyahoo@163.com> References: <1fcdc895.2e68.157cc43fe0c.Coremail.ztdepyahoo@163.com> Message-ID: On Sun, 16 Oct 2016, ??? wrote: > Dear professor: > I met the following error for Petsc 3.7.3. > I delcare LocalSize as int, but it doesn't work anymore. it works for 3.6.3. > > error: cannot convert ?int*? to ?PetscInt* {aka long int*}? for argument ?2? to ?PetscErrorCode VecGetLocalSize(Vec, PetscInt*)? > VecGetLocalSize (Petsc_b, &LocalSize); 1. you should be using 'PetscInt' in your code - not 'int' [check the examples] 2. you must have built petsc-3.7.3 with --with-64-bit-indices=1 - hence this message. Satish From dave.mayhem23 at gmail.com Sun Oct 16 02:13:18 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Sun, 16 Oct 2016 08:13:18 +0100 Subject: [petsc-users] =?utf-8?b?Y2Fubm90IGNvbnZlcnQg4oCYaW50KuKAmSB0byA=?= =?utf-8?b?4oCYUGV0c2NJbnQq?= In-Reply-To: <1fcdc895.2e68.157cc43fe0c.Coremail.ztdepyahoo@163.com> References: <1fcdc895.2e68.157cc43fe0c.Coremail.ztdepyahoo@163.com> Message-ID: On Sunday, 16 October 2016, ??? wrote: > Dear professor: > I met the following error for Petsc 3.7.3. > I delcare LocalSize as int, but it doesn't work anymore. it works for > 3.6.3. > This error has nothing to do with the version of petsc. Whether it "worked" is dependent on the size of PetscInt which is configure/architecture dependent > > > error: cannot convert ?int*? to ?PetscInt* {aka long int*}? for > argument ?2? to ?PetscErrorCode VecGetLocalSize(Vec, PetscInt*)? > VecGetLocalSize (Petsc_b, &LocalSize); > > Regards > So just fix your code and declare LocalSize as a PetscInt. 
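Concretely, the fix inside your existing routine is just something like this (sketch only, reusing the Petsc_b from your snippet):

PetscErrorCode ierr;
PetscInt       LocalSize; /* PetscInt matches whatever index width the library was configured with */

ierr = VecGetLocalSize(Petsc_b,&LocalSize);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_SELF,"local size = %D\n",LocalSize);CHKERRQ(ierr);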
If you insist on representing it as an int (which in general is unsafe as PetscInt might be a 32-bit or 64-bit int), define a new variable and cast LocalInt to int > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Mon Oct 17 10:29:55 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 17 Oct 2016 09:29:55 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <2A7986F8-B8C5-4E84-A530-9F8E48506D7D@mcs.anl.gov> References: <6C480615-29D4-4C1A-8FE1-2B42BC96A69C@mcs.anl.gov> <2A7986F8-B8C5-4E84-A530-9F8E48506D7D@mcs.anl.gov> Message-ID: Hi Barry, Thanks so much for this work. I will checkout your branch, and take a look. Thanks again! Fande Kong, On Thu, Oct 13, 2016 at 8:10 PM, Barry Smith wrote: > > Fande, > > I have done some work, mostly understanding and documentation, on > handling singular systems with KSP in the branch barry/improve-matnullspace-usage. > This also includes a new example that solves both a symmetric example and > an example where nullspace(A) != nullspace(A') src/ksp/ksp/examples/ > tutorials/ex67.c > > My understanding is now documented in the manual page for KSPSolve(), > part of this is quoted below: > > ------- > If you provide a matrix that has a MatSetNullSpace() and > MatSetTransposeNullSpace() this will use that information to solve singular > systems > in the least squares sense with a norm minimizing solution. > $ > $ A x = b where b = b_p + b_t where b_t is not in the > range of A (and hence by the fundamental theorem of linear algebra is in > the nullspace(A') see MatSetNullSpace() > $ > $ KSP first removes b_t producing the linear system A x = b_p (which > has multiple solutions) and solves this to find the ||x|| minimizing > solution (and hence > $ it finds the solution x orthogonal to the nullspace(A). The algorithm > is simply in each iteration of the Krylov method we remove the nullspace(A) > from the search > $ direction thus the solution which is a linear combination of the > search directions has no component in the nullspace(A). > $ > $ We recommend always using GMRES for such singular systems. > $ If nullspace(A) = nullspace(A') (note symmetric matrices always > satisfy this property) then both left and right preconditioning will work > $ If nullspace(A) != nullspace(A') then left preconditioning will work > but right preconditioning may not work (or it may). > > Developer Note: The reason we cannot always solve nullspace(A) != > nullspace(A') systems with right preconditioning is because we need to > remove at each iteration > the nullspace(AB) from the search direction. While we know the > nullspace(A) the nullspace(AB) equals B^-1 times the nullspace(A) but > except for trivial preconditioners > such as diagonal scaling we cannot apply the inverse of the > preconditioner to a vector and thus cannot compute the nullspace(AB). > ------ > > Any feed back on the correctness or clarity of the material is > appreciated. The punch line is that right preconditioning cannot be trusted > with nullspace(A) != nullspace(A') I don't see any fix for this. 
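As a concrete sketch of the usage described above (assuming a singular A whose null space is spanned by the constant vector; for a nonsymmetric A the two null spaces differ and would have to be built separately):

#include <petscksp.h>

/* Sketch: let KSP solve the singular system A x = b in the least-squares sense. */
static PetscErrorCode SolveSingular(KSP ksp,Mat A,Vec b,Vec x)
{
  PetscErrorCode ierr;
  MatNullSpace   nullsp;

  ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A),PETSC_TRUE,0,NULL,&nullsp);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A,nullsp);CHKERRQ(ierr);          /* removed from the Krylov search directions */
  ierr = MatSetTransposeNullSpace(A,nullsp);CHKERRQ(ierr); /* used to remove the inconsistent part of b */
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);       /* the matrix keeps its own reference */
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  return 0;
}

With GMRES and left preconditioning this is the setting the text above recommends.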
> > Barry > > > > > On Oct 11, 2016, at 3:04 PM, Kong, Fande wrote: > > > > > > > > On Tue, Oct 11, 2016 at 12:18 PM, Barry Smith > wrote: > > > > > On Oct 11, 2016, at 12:01 PM, Kong, Fande wrote: > > > > > > > > > > > > On Tue, Oct 11, 2016 at 10:39 AM, Barry Smith > wrote: > > > > > > > On Oct 11, 2016, at 9:33 AM, Kong, Fande wrote: > > > > > > > > Barry, Thanks so much for your explanation. It helps me a lot. > > > > > > > > On Mon, Oct 10, 2016 at 4:00 PM, Barry Smith > wrote: > > > > > > > > > On Oct 10, 2016, at 4:01 PM, Kong, Fande > wrote: > > > > > > > > > > Hi All, > > > > > > > > > > I know how to remove the null spaces from a singular system using > creating a MatNullSpace and attaching it to Mat. > > > > > > > > > > I was really wondering what is the philosophy behind this? The > exact algorithms we are using in PETSc right now? Where we are dealing > with this, preconditioner, linear solver, or nonlinear solver? > > > > > > > > It is in the Krylov solver. > > > > > > > > The idea is very simple. Say you have a singular A with null > space N (that all values Ny are in the null space of A. So N is tall and > skinny) and you want to solve A x = b where b is in the range of A. This > problem has an infinite number of solutions Ny + x* since A (Ny + x*) > = ANy + Ax* = Ax* = b where x* is the "minimum norm solution; that is Ax* = > b and x* has the smallest norm of all solutions. > > > > > > > > With left preconditioning B A x = B b GMRES, for example, > normally computes the solution in the as alpha_1 Bb + alpha_2 BABb + > alpha_3 BABABAb + .... but the B operator will likely introduce some > component into the direction of the null space so as GMRES continues the > "solution" computed will grow larger and larger with a large component in > the null space of A. Hence we simply modify GMRES a tiny bit by building > the solution from alpha_1 (I-N)Bb + alpha_2 (I-N)BABb + alpha_3 > > > > > > > > Does "I" mean an identity matrix? Could you possibly send me a link > for this GMRES implementation, that is, how PETSc does this in the actual > code? > > > > > > Yes. > > > > > > It is in the helper routine KSP_PCApplyBAorAB() > > > #undef __FUNCT__ > > > #define __FUNCT__ "KSP_PCApplyBAorAB" > > > PETSC_STATIC_INLINE PetscErrorCode KSP_PCApplyBAorAB(KSP ksp,Vec x,Vec > y,Vec w) > > > { > > > PetscErrorCode ierr; > > > PetscFunctionBegin; > > > if (!ksp->transpose_solve) { > > > ierr = PCApplyBAorAB(ksp->pc,ksp->pc_side,x,y,w);CHKERRQ(ierr); > > > ierr = KSP_RemoveNullSpace(ksp,y);CHKERRQ(ierr); > > > } else { > > > ierr = PCApplyBAorABTranspose(ksp->pc,ksp->pc_side,x,y,w); > CHKERRQ(ierr); > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > > > > PETSC_STATIC_INLINE PetscErrorCode KSP_RemoveNullSpace(KSP ksp,Vec y) > > > { > > > PetscErrorCode ierr; > > > PetscFunctionBegin; > > > if (ksp->pc_side == PC_LEFT) { > > > Mat A; > > > MatNullSpace nullsp; > > > ierr = PCGetOperators(ksp->pc,&A,NULL);CHKERRQ(ierr); > > > ierr = MatGetNullSpace(A,&nullsp);CHKERRQ(ierr); > > > if (nullsp) { > > > ierr = MatNullSpaceRemove(nullsp,y);CHKERRQ(ierr); > > > } > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > "ksp->pc_side == PC_LEFT" deals with the left preconditioning Krylov > methods only? How about the right preconditioning ones? Are they just > magically right for the right preconditioning Krylov methods? > > > > This is a good question. I am working on a branch now where I will > add some more comprehensive testing of the various cases and fix anything > that comes up. 
> > > > Were you having trouble with ASM and bjacobi only for right > preconditioning? > > > > > > Yes. ASM and bjacobi works fine for left preconditioning NOT for RIGHT > preconditioning. bjacobi converges, but produces a wrong solution. ASM > needs more iterations, however the solution is right. > > > > > > > > Note that when A is symmetric the range of A is orthogonal to null > space of A so yes I think in that case it is just "magically right" but if > A is not symmetric then I don't think it is "magically right". I'll work on > it. > > > > > > Barry > > > > > > > > Fande Kong, > > > > > > > > > There is no code directly in the GMRES or other methods. > > > > > > > > > > > (I-N)BABABAb + .... that is we remove from each new direction > anything in the direction of the null space. Hence the null space doesn't > directly appear in the preconditioner, just in the KSP method. If you > attach a null space to the matrix, the KSP just automatically uses it to do > the removal above. > > > > > > > > With right preconditioning the solution is built from alpha_1 b > + alpha_2 ABb + alpha_3 ABABb + .... and again we apply (I-N) to each term > to remove any part that is in the null space of A. > > > > > > > > Now consider the case A y = b where b is NOT in the range of A. > So the problem has no "true" solution, but one can find a least squares > solution by rewriting b = b_par + b_perp where b_par is in the range of A > and b_perp is orthogonal to the range of A and solve instead A x = > b_perp. If you provide a MatSetTransposeNullSpace() then KSP automatically > uses it to remove b_perp from the right hand side before starting the KSP > iterations. > > > > > > > > The manual pages for MatNullSpaceAttach() and > MatTranposeNullSpaceAttach() discuss this an explain how it relates to the > fundamental theorem of linear algebra. > > > > > > > > Note that for symmetric matrices the two null spaces are the same. > > > > > > > > Barry > > > > > > > > > > > > A different note: This "trick" is not a "cure all" for a totally > inappropriate preconditioner. For example if one uses for a preconditioner > a direct (sparse or dense) solver or an ILU(k) one can end up with a very > bad solver because the direct solver will likely produce a very small pivot > at some point thus the triangular solver applied in the precondition can > produce HUGE changes in the solution (that are not physical) and so the > preconditioner basically produces garbage. On the other hand sometimes it > works out ok. > > > > > > > > What preconditioners are appropriate? asm, bjacobi, amg? I have an > example which shows lu and ilu indeed work, but asm and bjacobi do not at > all. That is why I am asking questions about algorithms. I am trying to > figure out a default preconditioner for several singular systems. > > > > > > Hmm, normally asm and bjacobi would be fine with this unless one or > more of the subblocks are themselves singular (which normally won't > happen). AMG can also work find sometimes. > > > > > > Can you send a sample code? > > > > > > Barry > > > > > > > > > > > Thanks again. > > > > > > > > > > > > Fande Kong, > > > > > > > > > > > > > > > > > > > > > > > > > > > Fande Kong, > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cmpierce at WPI.EDU Mon Oct 17 10:30:16 2016 From: cmpierce at WPI.EDU (Christopher Pierce) Date: Mon, 17 Oct 2016 11:30:16 -0400 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: <10558_1476423923_u9E5jFtZ026640_ea40cc78-d38b-a32c-d7d8-db83baba0e3e@wpi.edu> References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> <10558_1476423923_u9E5jFtZ026640_ea40cc78-d38b-a32c-d7d8-db83baba0e3e@wpi.edu> Message-ID: I've implemented my application using MatGetSubMatrix and the solvers appear to be converging correctly now, just slowly. I assume that this is due to the clustering of eigenvalues inherent to the problem that I'm using, however. I think that this should be enough to get me on track to solving problems with it. Thanks, Chris On 10/14/16 01:43, Christopher Pierce wrote: > Thank You, > > That looks like what I need to do if the highly degenerate eigenpairs > are my problem. I'll try that out this week and see if that helps. > > Chris > > > > > On 10/13/16 20:01, Barry Smith wrote: >> I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. >> >> Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. >> >> The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. >> >> >> Barry >> >>> On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: >>> >>> Hello All, >>> >>> As there isn't a SLEPc specific list, it was recommended that I bring my >>> question here. I am using SLEPc to solve a generalized eigenvalue >>> problem generated as part of the Finite Element Method, but am having >>> difficulty getting the diagonalizer to converge. I am worried that the >>> method used to set boundary conditions in the matrix is creating the >>> problem and am looking for other people's input. >>> >>> In order to set the boundary conditions, I find the list of IDs that >>> should be zero in the resulting eigenvectors and then use >>> MatZeroRowsColumns to zero the rows and columns and in the matrix A >>> insert a large value such as 1E10 on each diagonal element that was >>> zeroed and likewise for the B matrix except with the value 1.0. That >>> way the eigenvalues resulting from those solutions are on the order of >>> 1E10 and are outside of the region of interest for my problem. >>> >>> When I tried to diagonal the matrices I could only get converged >>> solutions from the rqcg method which I have found to not scale well with >>> my problem. When using any other method, the approximate error of the >>> eigenpairs hovers around 1E00 and 1E01 until it reaches the max number >>> of iterations. Could having so many identical eigenvalues (~1,000) in >>> the spectrum be causing this to happen even if they are far outside of >>> the range of interest? >>> >>> Thank, >>> >>> Chris Pierce >>> WPI Center for Computation Nano-Science >>> >>> > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From fande.kong at inl.gov Mon Oct 17 10:47:32 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 17 Oct 2016 09:47:32 -0600 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: <11333ED6-170F-4FE3-9727-6ACAE36E9669@mcs.anl.gov> References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> <11333ED6-170F-4FE3-9727-6ACAE36E9669@mcs.anl.gov> Message-ID: On Thu, Oct 13, 2016 at 8:21 PM, Barry Smith wrote: > > Fande, > > What SNES method are you using? If you use SNESKSPONLY I think it is > ok, it will solve for the norm minimizing least square solution during the > one KSPSolve() and then return. > The problem we are currently working on is a linear problem, but it could be extended to be nonlinear. Yes, you are right. "ksponly" indeed works, and returns the right solution. But the norm of residual still could confuse users because it is not close to zero. > > Yes, if you use SNESNEWTONLS or others though the SNES solver will, as > you say, think that progress has not been made. > > I do not like what you propose to do, changing the right hand side of > the system the user provides is a nasty and surprising side effect. > I do not like this way either. The reason I posted this code here is that I want to let you know what are inconsistent between the nonlinear solvers and the linear solvers. > > What is your goal? To make it look like the SNES system has had a > residual norm reduction? > Yes, I would like to make SNES have a residual reduction. Possibly, we could add something in the converged_test function? For example, the residual vector is temporarily subtracted when evaluating the residual norm if the system has a null space? > > We could generalize you question and ask what about solving for > nonlinear problems: find the minimal norm solution of min_x || F(x) - b||. > This may or may not belong in Tao, currently SNES doesn't do any kind of > nonlinear least squares. > It would be great, if we could add this kind of solvers. Tao does have one, I think. I would like to contribute something like this latter (of course, if you are ok with this algorithm), when we are moving to nonlinear problems in our applications. Fande Kong, > > Barry > > > > On Oct 13, 2016, at 5:20 PM, Kong, Fande wrote: > > > > One more question. > > > > Suppose that we are solving the singular linear system Ax = b. N(A) is > the null space of A, and N(A^T) is the null space of the transpose of A. > > > > The linear system is solved using SNES, that is, F(x) = Ax-b = Ax -b_r - > b_n. Here b_n in N(A^T), and b_r in R(A). During each nonlinear > iteration, a linear system A \delta x = F(x) is solved. N(A) is applied to > Krylov space during the linear iterating. Before the actual solve > "(*ksp->ops->solve)(ksp)" for \delta x, a temporary copy of F(x) is made, > F_tmp. N(A^T) is applied to F_tmp. We will get a \delta x. F(x+\delta x ) > = A(x+\delta x)-b_r - b_n. > > > > F(x+\delta x ) always contain the vector b_n, and then the algorithm > never converges because the normal of F is at least 1. > > > > Should we apply N(A^T) to F instead of F_tmp so that b_n can be removed > from F? 
> > > > MatGetTransposeNullSpace(pmat,&nullsp); > > if (nullsp) { > > VecDuplicate(ksp->vec_rhs,&btmp); > > VecCopy(ksp->vec_rhs,btmp); > > MatNullSpaceRemove(nullsp,btmp); > > vec_rhs = ksp->vec_rhs; > > ksp->vec_rhs = btmp; > > } > > > > should be changed to > > > > MatGetTransposeNullSpace(pmat,&nullsp); > > if (nullsp) { > > MatNullSpaceRemove(nullsp,ksp->vec_rhs); > > } > > ??? > > > > Or other solutions to this issue? > > > > > > Fande Kong, > > > > > > > > > > > > On Thu, Oct 13, 2016 at 8:23 AM, Matthew Knepley > wrote: > > On Thu, Oct 13, 2016 at 9:06 AM, Kong, Fande wrote: > > > > > > On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > > Barry Smith writes: > > > I would make that a separate routine that the users would call first. > > > > We have VecMDot and VecMAXPY. I would propose adding > > > > VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); > > > > (where R can be NULL). > > > > What does R mean here? > > > > It means the coefficients of the old basis vectors in the new basis. > > > > Matt > > > > If nobody working on this, I will be going to take a try. > > > > Fande, > > > > > > Does anyone use the "Vecs" type? > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juan at tf.uni-kiel.de Mon Oct 17 11:39:18 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Mon, 17 Oct 2016 18:39:18 +0200 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> <10558_1476423923_u9E5jFtZ026640_ea40cc78-d38b-a32c-d7d8-db83baba0e3e@wpi.edu> Message-ID: Do you precondition your eigenvalue problem? If not, you should. Let us know what structure your matrix has and which blocks (if there are any) include which physics. Regards Julian On Mon, Oct 17, 2016 at 5:30 PM, Christopher Pierce wrote: > I've implemented my application using MatGetSubMatrix and the solvers > appear to be converging correctly now, just slowly. I assume that this > is due to the clustering of eigenvalues inherent to the problem that I'm > using, however. I think that this should be enough to get me on track > to solving problems with it. > > Thanks, > > Chris > > > On 10/14/16 01:43, Christopher Pierce wrote: >> Thank You, >> >> That looks like what I need to do if the highly degenerate eigenpairs >> are my problem. I'll try that out this week and see if that helps. >> >> Chris >> >> >> >> >> On 10/13/16 20:01, Barry Smith wrote: >>> I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. >>> >>> Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. >>> >>> The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. >>> >>> >>> Barry >>> >>>> On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: >>>> >>>> Hello All, >>>> >>>> As there isn't a SLEPc specific list, it was recommended that I bring my >>>> question here. I am using SLEPc to solve a generalized eigenvalue >>>> problem generated as part of the Finite Element Method, but am having >>>> difficulty getting the diagonalizer to converge. 
I am worried that the >>>> method used to set boundary conditions in the matrix is creating the >>>> problem and am looking for other people's input. >>>> >>>> In order to set the boundary conditions, I find the list of IDs that >>>> should be zero in the resulting eigenvectors and then use >>>> MatZeroRowsColumns to zero the rows and columns and in the matrix A >>>> insert a large value such as 1E10 on each diagonal element that was >>>> zeroed and likewise for the B matrix except with the value 1.0. That >>>> way the eigenvalues resulting from those solutions are on the order of >>>> 1E10 and are outside of the region of interest for my problem. >>>> >>>> When I tried to diagonal the matrices I could only get converged >>>> solutions from the rqcg method which I have found to not scale well with >>>> my problem. When using any other method, the approximate error of the >>>> eigenpairs hovers around 1E00 and 1E01 until it reaches the max number >>>> of iterations. Could having so many identical eigenvalues (~1,000) in >>>> the spectrum be causing this to happen even if they are far outside of >>>> the range of interest? >>>> >>>> Thank, >>>> >>>> Chris Pierce >>>> WPI Center for Computation Nano-Science >>>> >>>> >> > > From hgbk2008 at gmail.com Mon Oct 17 11:57:38 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 17 Oct 2016 18:57:38 +0200 Subject: [petsc-users] preconditioner for contact / mesh tying problem Message-ID: Dear PETSc folks, While searching literature on the preconditioner for contact/mesh tying problem, I saw the paper by Dr. Adams "Algebraic multigrid methods for constrained linear systems with applications to contact problems in solid mechanics, NLAA, 2004". Given the promising aspects the paper has shown for constrained linear system, I wonder if some code's also available in PETSc for testing/further extension? Thanks Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Oct 17 14:05:55 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 17 Oct 2016 14:05:55 -0500 Subject: [petsc-users] Algorithms to remove null spaces in a singular system In-Reply-To: References: <5246D517-DAD5-4BEF-B262-644C4A1B18F7@mcs.anl.gov> <87h98h7wy3.fsf@jedbrown.org> <7BF60FB6-E2A6-45E7-9340-1F664F4F3B8E@mcs.anl.gov> <878tts98v5.fsf@jedbrown.org> <11333ED6-170F-4FE3-9727-6ACAE36E9669@mcs.anl.gov> Message-ID: <9B230ED3-B798-42D7-AD91-5FA23D379BE8@mcs.anl.gov> > On Oct 17, 2016, at 10:47 AM, Kong, Fande wrote: > > > > On Thu, Oct 13, 2016 at 8:21 PM, Barry Smith wrote: > > Fande, > > What SNES method are you using? If you use SNESKSPONLY I think it is ok, it will solve for the norm minimizing least square solution during the one KSPSolve() and then return. > > The problem we are currently working on is a linear problem, but it could be extended to be nonlinear. Yes, you are right. "ksponly" indeed works, and returns the right solution. But the norm of residual still could confuse users because it is not close to zero. > > > > Yes, if you use SNESNEWTONLS or others though the SNES solver will, as you say, think that progress has not been made. > > I do not like what you propose to do, changing the right hand side of the system the user provides is a nasty and surprising side effect. > > I do not like this way either. The reason I posted this code here is that I want to let you know what are inconsistent between the nonlinear solvers and the linear solvers. 
You could have SNESSolve_KSPONLY subtract off the null space in the right hand side initially just like KSPSolve() does with code like ierr = MatGetTransposeNullSpace(pmat,&nullsp);CHKERRQ(ierr); if (nullsp) { ierr = VecDuplicate(ksp->vec_rhs,&btmp);CHKERRQ(ierr); ierr = VecCopy(ksp->vec_rhs,btmp);CHKERRQ(ierr); ierr = MatNullSpaceRemove(nullsp,btmp);CHKERRQ(ierr); vec_rhs = ksp->vec_rhs; ksp->vec_rhs = btmp; } It is not perfect, see my comment below, but it gets what you want and "kind of" makes the residuals decrease as in the KSPSolve directly case. > > > > What is your goal? To make it look like the SNES system has had a residual norm reduction? > > Yes, I would like to make SNES have a residual reduction. Possibly, we could add something in the converged_test function? For example, the residual vector is temporarily subtracted when evaluating the residual norm if the system has a null space? There is likely to always be confusion (in the linear case) or with any least squares type solver. The true residual is not really decreasing past a certain point but if the solver only sees the consistent part then it looks like the residual is decreasing. The problem is that none of this stuff (the PETSc model and API) was designed for the generality of inconsistent least squares problems and we have just been bolting on more support over time without enhancing the model. For example we could introduce the concept of a consistent residual and an inconsistent residual and have the default monitors display both when they are different; instead we just display "the residual norm" without clarity of "what" residual norm. We should think about this and wait for Jed to come up with the ideal design :-) Barry > > > > We could generalize you question and ask what about solving for nonlinear problems: find the minimal norm solution of min_x || F(x) - b||. This may or may not belong in Tao, currently SNES doesn't do any kind of nonlinear least squares. > > > It would be great, if we could add this kind of solvers. Tao does have one, I think. I would like to contribute something like this latter (of course, if you are ok with this algorithm), when we are moving to nonlinear problems in our applications. > > Fande Kong, > > > Barry > > > > On Oct 13, 2016, at 5:20 PM, Kong, Fande wrote: > > > > One more question. > > > > Suppose that we are solving the singular linear system Ax = b. N(A) is the null space of A, and N(A^T) is the null space of the transpose of A. > > > > The linear system is solved using SNES, that is, F(x) = Ax-b = Ax -b_r - b_n. Here b_n in N(A^T), and b_r in R(A). During each nonlinear iteration, a linear system A \delta x = F(x) is solved. N(A) is applied to Krylov space during the linear iterating. Before the actual solve "(*ksp->ops->solve)(ksp)" for \delta x, a temporary copy of F(x) is made, F_tmp. N(A^T) is applied to F_tmp. We will get a \delta x. F(x+\delta x ) = A(x+\delta x)-b_r - b_n. > > > > F(x+\delta x ) always contain the vector b_n, and then the algorithm never converges because the normal of F is at least 1. > > > > Should we apply N(A^T) to F instead of F_tmp so that b_n can be removed from F? 
> > > > MatGetTransposeNullSpace(pmat,&nullsp); > > if (nullsp) { > > VecDuplicate(ksp->vec_rhs,&btmp); > > VecCopy(ksp->vec_rhs,btmp); > > MatNullSpaceRemove(nullsp,btmp); > > vec_rhs = ksp->vec_rhs; > > ksp->vec_rhs = btmp; > > } > > > > should be changed to > > > > MatGetTransposeNullSpace(pmat,&nullsp); > > if (nullsp) { > > MatNullSpaceRemove(nullsp,ksp->vec_rhs); > > } > > ??? > > > > Or other solutions to this issue? > > > > > > Fande Kong, > > > > > > > > > > > > On Thu, Oct 13, 2016 at 8:23 AM, Matthew Knepley wrote: > > On Thu, Oct 13, 2016 at 9:06 AM, Kong, Fande wrote: > > > > > > On Wed, Oct 12, 2016 at 10:21 PM, Jed Brown wrote: > > Barry Smith writes: > > > I would make that a separate routine that the users would call first. > > > > We have VecMDot and VecMAXPY. I would propose adding > > > > VecQR(PetscInt nvecs,Vec *vecs,PetscScalar *R); > > > > (where R can be NULL). > > > > What does R mean here? > > > > It means the coefficients of the old basis vectors in the new basis. > > > > Matt > > > > If nobody working on this, I will be going to take a try. > > > > Fande, > > > > > > Does anyone use the "Vecs" type? > > > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > From overholt at capesim.com Mon Oct 17 15:41:25 2016 From: overholt at capesim.com (Matthew Overholt) Date: Mon, 17 Oct 2016 16:41:25 -0400 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <6FB8A549-F30C-478B-ACB9-49BC2840AB39@mcs.anl.gov> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <002b01d22403$aee809f0$0cb81dd0$@capesim.com> <87zima8tyu.fsf@jedbrown.org> <002901d224a0$0e18cf80$2a4a6e80$@capesim.com> <6FB8A549-F30C-478B-ACB9-49BC2840AB39@mcs.anl.gov> Message-ID: <006101d228b6$d47741b0$7d65c510$@capesim.com> Barry, If I look at the symbols available to trace I find the following. > nm xSYMMIC | grep " T MPI" | grep "attr" <#> T MPIR_Call_attr_copy <#> T MPIR_Call_attr_delete <#> T MPIR_Comm_delete_attr_impl <#> T MPIR_Comm_set_attr_impl <#> T MPIU_nem_gni_smsg_mbox_attr_init => Are the two _Comm_ symbols the ones of interest? > nm xSYMMIC | grep " T MPI" | grep "arrier" <#> T MPIDI_CRAY_dmapp_barrier_join <#> T MPIDI_Cray_shared_mem_coll_barrier <#> T MPIDI_Cray_shared_mem_coll_barrier_gather <#> T MPID_Sched_barrier <#> T MPID_nem_barrier <#> T MPID_nem_barrier_init <#> T MPID_nem_barrier_vars_init <#> T MPIR_Barrier <#> T MPIR_Barrier_impl <#> T MPIR_Barrier_inter <#> T MPIR_Barrier_intra <#> T MPIR_CRAY_Barrier <#> T MPIR_Ibarrier_impl <#> T MPIR_Ibarrier_inter <#> T MPIR_Ibarrier_intra => Which of these barriers should I trace? Finally, the current version of PETSc seems to be 3.7.2; I am not able to load 3.7.3. Thanks, Matt Overholt -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Thursday, October 13, 2016 11:46 PM To: overholt at capesim.com Cc: Jed Brown; PETSc Subject: Re: [petsc-users] large PetscCommDuplicate overhead Mathew, Thanks for the additional information. This is all very weird since the same number of calls made to PetscCommDuplicate() are the same regardless of geometry and the time of the call shouldn't depend on the geometry. Would you be able to do another set of tests where you track the time in MPI_Get_attr() and MPI_Barrier() instead of PetscCommDuplicate()? 
It could be Cray did something "funny" in their implementation of PETSc. You could also try using the module petsc/3.7.3 instead of the cray-petsc module Thanks Barry > On Oct 12, 2016, at 10:48 AM, Matthew Overholt wrote: > > Jed, > > I realize that the PetscCommDuplicate (PCD) overhead I am seeing must > be only indirectly related to the problem size, etc., and I wouldn't > be surprised if it was an artifact of some sort related to my specific > algorithm. So you may not want to pursue this much further. However, > I did make three runs using the same Edison environment and code but > different input geometry files. Earlier I found a strong dependence > on the number of processes, so for this test I ran all of the tests on > 1 node with 8 processes (N=1, n=8). What I found was that the amount > of PCD overhead was geometry dependent, not size dependent. A > moderately-sized simple geometry (with relatively few ghosted vertices > at the simple-planar interfaces) had no PCD overhead, whereas both > small and large complex geometries (with relatively more ghosted > vertices at the more-complex interfaces) had 5 - 6% PCD overhead. The log files follow. > > Thanks, > Matt Overholt > --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus From bsmith at mcs.anl.gov Mon Oct 17 16:48:08 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 17 Oct 2016 16:48:08 -0500 Subject: [petsc-users] large PetscCommDuplicate overhead In-Reply-To: <006101d228b6$d47741b0$7d65c510$@capesim.com> References: <004201d21f3e$ed31c120$c7954360$@capesim.com> <1EF15B5B-168C-4FFD-98BB-4C49678C02FC@mcs.anl.gov> <001801d21fe8$a3e67970$ebb36c50$@capesim.com> <002b01d22403$aee809f0$0cb81dd0$@capesim.com> <87zima8tyu.fsf@jedbrown.org> <002901d224a0$0e18cf80$2a4a6e80$@capesim.com> <6FB8A549-F30C-478B-ACB9-49BC2840AB39@mcs.anl.gov> <006101d228b6$d47741b0$7d65c510$@capesim.com> Message-ID: <5182C309-BDAF-4126-A206-35B794A7475D@mcs.anl.gov> Hmm, None of these sadly. There should be no barriers in the calls (optimized versions) just MPI_Attr_get() and MPI_Attr_set(). Maybe someone else has a better idea. Barry > On Oct 17, 2016, at 3:41 PM, Matthew Overholt wrote: > > Barry, > > If I look at the symbols available to trace I find the following. >> nm xSYMMIC | grep " T MPI" | grep "attr" > <#> T MPIR_Call_attr_copy > <#> T MPIR_Call_attr_delete > <#> T MPIR_Comm_delete_attr_impl > <#> T MPIR_Comm_set_attr_impl > <#> T MPIU_nem_gni_smsg_mbox_attr_init > > => Are the two _Comm_ symbols the ones of interest? > >> nm xSYMMIC | grep " T MPI" | grep "arrier" > <#> T MPIDI_CRAY_dmapp_barrier_join > <#> T MPIDI_Cray_shared_mem_coll_barrier > <#> T MPIDI_Cray_shared_mem_coll_barrier_gather > <#> T MPID_Sched_barrier > <#> T MPID_nem_barrier > <#> T MPID_nem_barrier_init > <#> T MPID_nem_barrier_vars_init > <#> T MPIR_Barrier > <#> T MPIR_Barrier_impl > <#> T MPIR_Barrier_inter > <#> T MPIR_Barrier_intra > <#> T MPIR_CRAY_Barrier > <#> T MPIR_Ibarrier_impl > <#> T MPIR_Ibarrier_inter > <#> T MPIR_Ibarrier_intra > > => Which of these barriers should I trace? > > Finally, the current version of PETSc seems to be 3.7.2; I am not able to > load 3.7.3. > > Thanks, > Matt Overholt > > > -----Original Message----- > From: Barry Smith [mailto:bsmith at mcs.anl.gov] > Sent: Thursday, October 13, 2016 11:46 PM > To: overholt at capesim.com > Cc: Jed Brown; PETSc > Subject: Re: [petsc-users] large PetscCommDuplicate overhead > > > Mathew, > > Thanks for the additional information. 
This is all very weird since the > same number of calls made to PetscCommDuplicate() are the same regardless > of geometry and the time of the call shouldn't depend on the geometry. > > Would you be able to do another set of tests where you track the time in > MPI_Get_attr() and MPI_Barrier() instead of PetscCommDuplicate()? It could > be Cray did something "funny" in their implementation of PETSc. > > You could also try using the module petsc/3.7.3 instead of the cray-petsc > module > > Thanks > > Barry > > > > >> On Oct 12, 2016, at 10:48 AM, Matthew Overholt > wrote: >> >> Jed, >> >> I realize that the PetscCommDuplicate (PCD) overhead I am seeing must >> be only indirectly related to the problem size, etc., and I wouldn't >> be surprised if it was an artifact of some sort related to my specific >> algorithm. So you may not want to pursue this much further. However, >> I did make three runs using the same Edison environment and code but >> different input geometry files. Earlier I found a strong dependence >> on the number of processes, so for this test I ran all of the tests on >> 1 node with 8 processes (N=1, n=8). What I found was that the amount >> of PCD overhead was geometry dependent, not size dependent. A >> moderately-sized simple geometry (with relatively few ghosted vertices >> at the simple-planar interfaces) had no PCD overhead, whereas both >> small and large complex geometries (with relatively more ghosted >> vertices at the more-complex interfaces) had 5 - 6% PCD overhead. The log > files follow. >> >> Thanks, >> Matt Overholt >> > > > --- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > From jed at jedbrown.org Mon Oct 17 20:49:20 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 17 Oct 2016 19:49:20 -0600 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: References: Message-ID: <87eg3e1l5r.fsf@jedbrown.org> Hoang Giang Bui writes: > Dear PETSc folks, > > While searching literature on the preconditioner for contact/mesh tying > problem, I saw the paper by Dr. Adams "Algebraic multigrid methods for > constrained linear systems with applications to contact problems in solid > mechanics, NLAA, 2004". Given the promising aspects the paper has shown for > constrained linear system, I wonder if some code's also available in PETSc > for testing/further extension? This particular algorithm is not available within the PETSc library. Mark might have some code around, but I doubt it's maintained or easy to tinker with. You should be able to implement the algorithm fairly quickly using PETSc. It would make a great example if you're willing to contribute. And we can advise if you decide to write such an example. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From hgbk2008 at gmail.com Tue Oct 18 03:13:11 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 18 Oct 2016 10:13:11 +0200 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: <87eg3e1l5r.fsf@jedbrown.org> References: <87eg3e1l5r.fsf@jedbrown.org> Message-ID: Hi Jed That's a great idea. Do you have any ex* to start with? Or suggesting me a starting point. 
Thanks Giang On Tue, Oct 18, 2016 at 3:49 AM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Dear PETSc folks, > > > > While searching literature on the preconditioner for contact/mesh tying > > problem, I saw the paper by Dr. Adams "Algebraic multigrid methods for > > constrained linear systems with applications to contact problems in solid > > mechanics, NLAA, 2004". Given the promising aspects the paper has shown > for > > constrained linear system, I wonder if some code's also available in > PETSc > > for testing/further extension? > > This particular algorithm is not available within the PETSc library. > Mark might have some code around, but I doubt it's maintained or easy to > tinker with. You should be able to implement the algorithm fairly > quickly using PETSc. It would make a great example if you're willing to > contribute. And we can advise if you decide to write such an example. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juan at tf.uni-kiel.de Tue Oct 18 07:38:19 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Tue, 18 Oct 2016 14:38:19 +0200 Subject: [petsc-users] PetscFE questions Message-ID: Hi, i have general question about PetscFE. When i want to assemble certain parts of physics separately, how can i do that? I basically want to assemble matrices/vectors from the weak forms on the same DM (and avoid copying the DM) and use them afterwards. Is there a convenient way for doing that? The "workflow" i'm approaching is something like: - Setup the DM - Setup discretization (spaces and quadrature) for each weak form i want to compute - Compute just the weak form i want right now for a specific discretization and field. The reason is i need certain parts of the "complete" Jacobian for computations of eigenproblems and like to avoid computing those more often than needed. Regards Julian From jed at jedbrown.org Tue Oct 18 09:36:39 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 18 Oct 2016 08:36:39 -0600 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: References: <87eg3e1l5r.fsf@jedbrown.org> Message-ID: <87pomxzpu0.fsf@jedbrown.org> Hoang Giang Bui writes: > Hi Jed > That's a great idea. Do you have any ex* to start with? Or suggesting me a > starting point. If you don't have an elasticity solver already, you could sart with src/snes/examples/tutorials/ex77.c which solves hyperelasticity using a potentially unstructured mesh. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From fande.kong at inl.gov Tue Oct 18 10:11:42 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 18 Oct 2016 09:11:42 -0600 Subject: [petsc-users] Matrix is missing diagonal entry Message-ID: Hi Developers, Any reason to force users provide a matrix which does not miss any diagonal entries when using a LU-type solver? Sometime, it is impossible to have all diagonal entries in a matrix, that is, the matrix has to miss some diagonal entries. For example, there is a saddle-point matrix from the discretization of incomprehensible equations, and the lower part of the matrix is a zero block. The matrix usually looks like: | A B^T | | B 0 | [56]PETSC ERROR: Object is in wrong state [56]PETSC ERROR: Matrix is missing diagonal entry 33 [56]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[56]PETSC ERROR: Petsc Release Version 3.6.2, unknown [56]PETSC ERROR: ./fluid on a arch-linux2-cxx-opt named ys0755 by fandek Mon Oct 17 17:06:08 2016 [56]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5=1 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 --download-hypre=1 --download-superlu_dist=1 [56]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1729 in /petsc_installed/petsc/src/mat/impls/aij/seq/aijfact.c [56]PETSC ERROR: #2 MatILUFactorSymbolic() line 6457 in /petsc_installed/petsc/src/mat/interface/matrix.c [56]PETSC ERROR: #3 PCSetUp_ILU() line 204 in /petsc_installed/petsc/src/ksp/pc/impls/factor/ilu/ilu.c [56]PETSC ERROR: #4 PCSetUp() line 983 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c [56]PETSC ERROR: #5 KSPSetUp() line 332 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c [56]PETSC ERROR: #6 PCSetUpOnBlocks_ASM() line 405 in /petsc_installed/petsc/src/ksp/pc/impls/asm/asm.c [56]PETSC ERROR: #7 PCSetUpOnBlocks() line 1016 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c [56]PETSC ERROR: #8 KSPSetUpOnBlocks() line 167 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c [56]PETSC ERROR: #9 KSPSolve() line 552 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c [56]PETSC ERROR: #10 PCApply_LSC() line 83 in /petsc_installed/petsc/src/ksp/pc/impls/lsc/lsc.c [56]PETSC ERROR: #11 PCApply() line 483 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c [56]PETSC ERROR: #12 KSP_PCApply() line 242 in /petsc_installed/petsc/include/petsc/private/kspimpl.h [56]PETSC ERROR: #13 KSPSolve_PREONLY() line 26 in /petsc_installed/petsc/src/ksp/ksp/impls/preonly/preonly.c [56]PETSC ERROR: #14 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c [56]PETSC ERROR: #15 PCApply_FieldSplit_Schur() line 904 in /petsc_installed/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c [56]PETSC ERROR: #16 PCApply() line 483 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c [56]PETSC ERROR: #17 KSP_PCApply() line 242 in /petsc_installed/petsc/include/petsc/private/kspimpl.h [56]PETSC ERROR: #18 KSPInitialResidual() line 63 in /petsc_installed/petsc/src/ksp/ksp/interface/itres.c [56]PETSC ERROR: #19 KSPSolve_GMRES() line 235 in /petsc_installed/petsc/src/ksp/ksp/impls/gmres/gmres.c [56]PETSC ERROR: #20 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c [56]PETSC ERROR: #21 SNESSolve_NEWTONLS() line 233 in /petsc_installed/petsc/src/snes/impls/ls/ls.c [56]PETSC ERROR: #22 SNESSolve() line 3906 in /petsc_installed/petsc/src/snes/interface/snes.c Thanks, Fande Kong, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 18 10:45:41 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 18 Oct 2016 09:45:41 -0600 Subject: [petsc-users] Matrix is missing diagonal entry In-Reply-To: References: Message-ID: <87k2d5zmmy.fsf@jedbrown.org> "Kong, Fande" writes: > Hi Developers, > > Any reason to force users provide a matrix which does not miss any diagonal > entries when using a LU-type solver? Automatically adding the entries for incomplete factorization is a slight increase in complexity and means that "ILU(0)" actually has some fill. Note that a literal ILU(0) cannot work if there are missing diagonal entries. It's probably not a significant code complexity difference for LU. 
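A minimal sketch of the fix suggested below (storing explicit zeros on the diagonal), assuming K is the assembled MPIAIJ saddle-point matrix and pStart/pEnd bound the locally owned pressure rows — both names are placeholders, not from the thread:

/* Ensure every locally owned pressure row of
 *   [ A  B^T ]
 *   [ B  0   ]
 * has a stored (possibly zero) diagonal entry, so ILU/LU does not stop
 * with "Matrix is missing diagonal entry".  Ideally the location is set
 * during the regular assembly loop so it is preallocated; doing it as a
 * separate pass afterwards also works, at the cost of new nonzeros. */
PetscErrorCode InsertZeroDiagonal(Mat K, PetscInt pStart, PetscInt pEnd)
{
  PetscErrorCode ierr;
  PetscInt       row;

  PetscFunctionBeginUser;
  for (row = pStart; row < pEnd; row++) {
    /* ADD_VALUES adds 0.0, so an existing value is left untouched and a
       missing (row,row) location is created in the nonzero pattern */
    ierr = MatSetValue(K, row, row, 0.0, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}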
> Sometime, it is impossible to have all diagonal entries in a matrix, that > is, the matrix has to miss some diagonal entries. For example, there is a > saddle-point matrix from the discretization of incomprehensible equations, > and the lower part of the matrix is a zero block. The matrix usually looks > like: > > | A B^T | > | B 0 | Just insert explicit zeros on the diagonal. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From hzhang at mcs.anl.gov Tue Oct 18 10:46:45 2016 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 18 Oct 2016 10:46:45 -0500 Subject: [petsc-users] Matrix is missing diagonal entry In-Reply-To: References: Message-ID: You need set 0.0 to the diagonals. Diagonal storage is used in PETSc library. Hong On Tue, Oct 18, 2016 at 10:11 AM, Kong, Fande wrote: > Hi Developers, > > Any reason to force users provide a matrix which does not miss any > diagonal entries when using a LU-type solver? > > Sometime, it is impossible to have all diagonal entries in a matrix, that > is, the matrix has to miss some diagonal entries. For example, there is a > saddle-point matrix from the discretization of incomprehensible equations, > and the lower part of the matrix is a zero block. The matrix usually looks > like: > > | A B^T | > | B 0 | > > > > > > [56]PETSC ERROR: Object is in wrong state > [56]PETSC ERROR: Matrix is missing diagonal entry 33 > [56]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [56]PETSC ERROR: Petsc Release Version 3.6.2, unknown > [56]PETSC ERROR: ./fluid on a arch-linux2-cxx-opt named ys0755 by fandek > Mon Oct 17 17:06:08 2016 > [56]PETSC ERROR: Configure options --with-clanguage=cxx > --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 > --download-parmetis=1 --download-metis=1 --with-netcdf=1 > --download-exodusii=1 --with-hdf5=1 --with-debugging=no --with-c2html=0 > --with-64-bit-indices=1 --download-hypre=1 --download-superlu_dist=1 > [56]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1729 in > /petsc_installed/petsc/src/mat/impls/aij/seq/aijfact.c > [56]PETSC ERROR: #2 MatILUFactorSymbolic() line 6457 in > /petsc_installed/petsc/src/mat/interface/matrix.c > [56]PETSC ERROR: #3 PCSetUp_ILU() line 204 in /petsc_installed/petsc/src/ > ksp/pc/impls/factor/ilu/ilu.c > [56]PETSC ERROR: #4 PCSetUp() line 983 in /petsc_installed/petsc/src/ > ksp/pc/interface/precon.c > [56]PETSC ERROR: #5 KSPSetUp() line 332 in /petsc_installed/petsc/src/ > ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #6 PCSetUpOnBlocks_ASM() line 405 in > /petsc_installed/petsc/src/ksp/pc/impls/asm/asm.c > [56]PETSC ERROR: #7 PCSetUpOnBlocks() line 1016 in > /petsc_installed/petsc/src/ksp/pc/interface/precon.c > [56]PETSC ERROR: #8 KSPSetUpOnBlocks() line 167 in > /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #9 KSPSolve() line 552 in /petsc_installed/petsc/src/ > ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #10 PCApply_LSC() line 83 in /petsc_installed/petsc/src/ > ksp/pc/impls/lsc/lsc.c > [56]PETSC ERROR: #11 PCApply() line 483 in /petsc_installed/petsc/src/ > ksp/pc/interface/precon.c > [56]PETSC ERROR: #12 KSP_PCApply() line 242 in /petsc_installed/petsc/ > include/petsc/private/kspimpl.h > [56]PETSC ERROR: #13 KSPSolve_PREONLY() line 26 in > /petsc_installed/petsc/src/ksp/ksp/impls/preonly/preonly.c > [56]PETSC ERROR: #14 KSPSolve() line 604 in /petsc_installed/petsc/src/ > 
ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #15 PCApply_FieldSplit_Schur() line 904 in > /petsc_installed/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [56]PETSC ERROR: #16 PCApply() line 483 in /petsc_installed/petsc/src/ > ksp/pc/interface/precon.c > [56]PETSC ERROR: #17 KSP_PCApply() line 242 in /petsc_installed/petsc/ > include/petsc/private/kspimpl.h > [56]PETSC ERROR: #18 KSPInitialResidual() line 63 in > /petsc_installed/petsc/src/ksp/ksp/interface/itres.c > [56]PETSC ERROR: #19 KSPSolve_GMRES() line 235 in > /petsc_installed/petsc/src/ksp/ksp/impls/gmres/gmres.c > [56]PETSC ERROR: #20 KSPSolve() line 604 in /petsc_installed/petsc/src/ > ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #21 SNESSolve_NEWTONLS() line 233 in > /petsc_installed/petsc/src/snes/impls/ls/ls.c > [56]PETSC ERROR: #22 SNESSolve() line 3906 in /petsc_installed/petsc/src/ > snes/interface/snes.c > > > Thanks, > > Fande Kong, > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Oct 18 11:05:56 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 18 Oct 2016 18:05:56 +0200 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: <87pomxzpu0.fsf@jedbrown.org> References: <87eg3e1l5r.fsf@jedbrown.org> <87pomxzpu0.fsf@jedbrown.org> Message-ID: I do have an elasticity solver though. However it's an in-house code hence it's not very straight forward to unroll to make a PETSc example. The example can only read in the provided matrices and apply the preconditioner. However, I think in general standard PETSc example generates the linear system and solve it successively. In that way, ex77 must be extended. What do you think? Giang On Tue, Oct 18, 2016 at 4:36 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Hi Jed > > That's a great idea. Do you have any ex* to start with? Or suggesting me > a > > starting point. > > If you don't have an elasticity solver already, you could sart with > src/snes/examples/tutorials/ex77.c which solves hyperelasticity using a > potentially unstructured mesh. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 18 11:19:55 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 18 Oct 2016 10:19:55 -0600 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: References: <87eg3e1l5r.fsf@jedbrown.org> <87pomxzpu0.fsf@jedbrown.org> Message-ID: <87eg3dzl1w.fsf@jedbrown.org> Hoang Giang Bui writes: > I do have an elasticity solver though. However it's an in-house code hence > it's not very straight forward to unroll to make a PETSc example. The > example can only read in the provided matrices and apply the > preconditioner. However, I think in general standard PETSc example > generates the linear system and solve it successively. In that way, ex77 > must be extended. What do you think? I suggest SNES ex77 (versus writing a pure KSP example) because the contact problem is nonlinear and your example might as well actually solve a problem instead of just solving one linearized step. That said, VIRS isn't set up for general constraints using Lagrange multipliers, so it might be trying to tackle too many problems at once. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From fande.kong at inl.gov Tue Oct 18 11:24:29 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 18 Oct 2016 10:24:29 -0600 Subject: [petsc-users] Matrix is missing diagonal entry In-Reply-To: References: Message-ID: Thanks, Hong and Jed. I am going to explicitly add a few zeros into the matrix. Regards, Fande, On Tue, Oct 18, 2016 at 9:46 AM, Hong wrote: > You need set 0.0 to the diagonals. > Diagonal storage is used in PETSc library. > > Hong > > > On Tue, Oct 18, 2016 at 10:11 AM, Kong, Fande wrote: > >> Hi Developers, >> >> Any reason to force users provide a matrix which does not miss any >> diagonal entries when using a LU-type solver? >> >> Sometime, it is impossible to have all diagonal entries in a matrix, that >> is, the matrix has to miss some diagonal entries. For example, there is a >> saddle-point matrix from the discretization of incomprehensible equations, >> and the lower part of the matrix is a zero block. The matrix usually looks >> like: >> >> | A B^T | >> | B 0 | >> >> >> >> >> >> [56]PETSC ERROR: Object is in wrong state >> [56]PETSC ERROR: Matrix is missing diagonal entry 33 >> [56]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> >> for trouble shooting. >> [56]PETSC ERROR: Petsc Release Version 3.6.2, unknown >> [56]PETSC ERROR: ./fluid on a arch-linux2-cxx-opt named ys0755 by fandek >> Mon Oct 17 17:06:08 2016 >> [56]PETSC ERROR: Configure options --with-clanguage=cxx >> --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 >> --download-parmetis=1 --download-metis=1 --with-netcdf=1 >> --download-exodusii=1 --with-hdf5=1 --with-debugging=no --with-c2html=0 >> --with-64-bit-indices=1 --download-hypre=1 --download-superlu_dist=1 >> [56]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1729 in >> /petsc_installed/petsc/src/mat/impls/aij/seq/aijfact.c >> [56]PETSC ERROR: #2 MatILUFactorSymbolic() line 6457 in >> /petsc_installed/petsc/src/mat/interface/matrix.c >> [56]PETSC ERROR: #3 PCSetUp_ILU() line 204 in >> /petsc_installed/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [56]PETSC ERROR: #4 PCSetUp() line 983 in /petsc_installed/petsc/src/ksp >> /pc/interface/precon.c >> [56]PETSC ERROR: #5 KSPSetUp() line 332 in /petsc_installed/petsc/src/ksp >> /ksp/interface/itfunc.c >> [56]PETSC ERROR: #6 PCSetUpOnBlocks_ASM() line 405 in >> /petsc_installed/petsc/src/ksp/pc/impls/asm/asm.c >> [56]PETSC ERROR: #7 PCSetUpOnBlocks() line 1016 in >> /petsc_installed/petsc/src/ksp/pc/interface/precon.c >> [56]PETSC ERROR: #8 KSPSetUpOnBlocks() line 167 in >> /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c >> [56]PETSC ERROR: #9 KSPSolve() line 552 in /petsc_installed/petsc/src/ksp >> /ksp/interface/itfunc.c >> [56]PETSC ERROR: #10 PCApply_LSC() line 83 in >> /petsc_installed/petsc/src/ksp/pc/impls/lsc/lsc.c >> [56]PETSC ERROR: #11 PCApply() line 483 in /petsc_installed/petsc/src/ksp >> /pc/interface/precon.c >> [56]PETSC ERROR: #12 KSP_PCApply() line 242 in >> /petsc_installed/petsc/include/petsc/private/kspimpl.h >> [56]PETSC ERROR: #13 KSPSolve_PREONLY() line 26 in >> /petsc_installed/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [56]PETSC ERROR: #14 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp >> /ksp/interface/itfunc.c >> [56]PETSC ERROR: #15 PCApply_FieldSplit_Schur() line 904 in >> /petsc_installed/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c >> [56]PETSC ERROR: #16 PCApply() line 483 in /petsc_installed/petsc/src/ksp >> 
/pc/interface/precon.c >> [56]PETSC ERROR: #17 KSP_PCApply() line 242 in >> /petsc_installed/petsc/include/petsc/private/kspimpl.h >> [56]PETSC ERROR: #18 KSPInitialResidual() line 63 in >> /petsc_installed/petsc/src/ksp/ksp/interface/itres.c >> [56]PETSC ERROR: #19 KSPSolve_GMRES() line 235 in >> /petsc_installed/petsc/src/ksp/ksp/impls/gmres/gmres.c >> [56]PETSC ERROR: #20 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp >> /ksp/interface/itfunc.c >> [56]PETSC ERROR: #21 SNESSolve_NEWTONLS() line 233 in >> /petsc_installed/petsc/src/snes/impls/ls/ls.c >> [56]PETSC ERROR: #22 SNESSolve() line 3906 in >> /petsc_installed/petsc/src/snes/interface/snes.c >> >> >> Thanks, >> >> Fande Kong, >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Oct 18 11:27:40 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 18 Oct 2016 18:27:40 +0200 Subject: [petsc-users] preconditioner for contact / mesh tying problem In-Reply-To: <87eg3dzl1w.fsf@jedbrown.org> References: <87eg3e1l5r.fsf@jedbrown.org> <87pomxzpu0.fsf@jedbrown.org> <87eg3dzl1w.fsf@jedbrown.org> Message-ID: Clear on that. First I need some time to study ex77 carefully. Giang On Tue, Oct 18, 2016 at 6:19 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > I do have an elasticity solver though. However it's an in-house code > hence > > it's not very straight forward to unroll to make a PETSc example. The > > example can only read in the provided matrices and apply the > > preconditioner. However, I think in general standard PETSc example > > generates the linear system and solve it successively. In that way, ex77 > > must be extended. What do you think? > > I suggest SNES ex77 (versus writing a pure KSP example) because the > contact problem is nonlinear and your example might as well actually > solve a problem instead of just solving one linearized step. That said, > VIRS isn't set up for general constraints using Lagrange multipliers, so > it might be trying to tackle too many problems at once. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Oct 18 13:27:33 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 18 Oct 2016 13:27:33 -0500 Subject: [petsc-users] Matrix is missing diagonal entry In-Reply-To: References: Message-ID: You could also fork off of master and find all the places where the code depends on diagonal entries existing and modify the code in each of these places to handle the missing diagonal. Then run all the test suite and add some new tests that explicitly test this functionality and then make a pull request. Barry > On Oct 18, 2016, at 11:24 AM, Kong, Fande wrote: > > Thanks, Hong and Jed. > > I am going to explicitly add a few zeros into the matrix. > > > Regards, > > Fande, > > On Tue, Oct 18, 2016 at 9:46 AM, Hong wrote: > You need set 0.0 to the diagonals. > Diagonal storage is used in PETSc library. > > Hong > > > On Tue, Oct 18, 2016 at 10:11 AM, Kong, Fande wrote: > Hi Developers, > > Any reason to force users provide a matrix which does not miss any diagonal entries when using a LU-type solver? > > Sometime, it is impossible to have all diagonal entries in a matrix, that is, the matrix has to miss some diagonal entries. For example, there is a saddle-point matrix from the discretization of incomprehensible equations, and the lower part of the matrix is a zero block. 
The matrix usually looks like: > > | A B^T | > | B 0 | > > > > > > [56]PETSC ERROR: Object is in wrong state > [56]PETSC ERROR: Matrix is missing diagonal entry 33 > [56]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [56]PETSC ERROR: Petsc Release Version 3.6.2, unknown > [56]PETSC ERROR: ./fluid on a arch-linux2-cxx-opt named ys0755 by fandek Mon Oct 17 17:06:08 2016 > [56]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5=1 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1 --download-hypre=1 --download-superlu_dist=1 > [56]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1729 in /petsc_installed/petsc/src/mat/impls/aij/seq/aijfact.c > [56]PETSC ERROR: #2 MatILUFactorSymbolic() line 6457 in /petsc_installed/petsc/src/mat/interface/matrix.c > [56]PETSC ERROR: #3 PCSetUp_ILU() line 204 in /petsc_installed/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > [56]PETSC ERROR: #4 PCSetUp() line 983 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c > [56]PETSC ERROR: #5 KSPSetUp() line 332 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #6 PCSetUpOnBlocks_ASM() line 405 in /petsc_installed/petsc/src/ksp/pc/impls/asm/asm.c > [56]PETSC ERROR: #7 PCSetUpOnBlocks() line 1016 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c > [56]PETSC ERROR: #8 KSPSetUpOnBlocks() line 167 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #9 KSPSolve() line 552 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #10 PCApply_LSC() line 83 in /petsc_installed/petsc/src/ksp/pc/impls/lsc/lsc.c > [56]PETSC ERROR: #11 PCApply() line 483 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c > [56]PETSC ERROR: #12 KSP_PCApply() line 242 in /petsc_installed/petsc/include/petsc/private/kspimpl.h > [56]PETSC ERROR: #13 KSPSolve_PREONLY() line 26 in /petsc_installed/petsc/src/ksp/ksp/impls/preonly/preonly.c > [56]PETSC ERROR: #14 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #15 PCApply_FieldSplit_Schur() line 904 in /petsc_installed/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [56]PETSC ERROR: #16 PCApply() line 483 in /petsc_installed/petsc/src/ksp/pc/interface/precon.c > [56]PETSC ERROR: #17 KSP_PCApply() line 242 in /petsc_installed/petsc/include/petsc/private/kspimpl.h > [56]PETSC ERROR: #18 KSPInitialResidual() line 63 in /petsc_installed/petsc/src/ksp/ksp/interface/itres.c > [56]PETSC ERROR: #19 KSPSolve_GMRES() line 235 in /petsc_installed/petsc/src/ksp/ksp/impls/gmres/gmres.c > [56]PETSC ERROR: #20 KSPSolve() line 604 in /petsc_installed/petsc/src/ksp/ksp/interface/itfunc.c > [56]PETSC ERROR: #21 SNESSolve_NEWTONLS() line 233 in /petsc_installed/petsc/src/snes/impls/ls/ls.c > [56]PETSC ERROR: #22 SNESSolve() line 3906 in /petsc_installed/petsc/src/snes/interface/snes.c > > > Thanks, > > Fande Kong, > > From cmpierce at WPI.EDU Tue Oct 18 13:52:46 2016 From: cmpierce at WPI.EDU (Christopher Pierce) Date: Tue, 18 Oct 2016 14:52:46 -0400 Subject: [petsc-users] SLEPc: Convergence Problems In-Reply-To: References: <98e3ff90-72b2-251b-161d-cf8621cf9fc1@wpi.edu> <4E3ADE7D-73BC-42D3-B17C-EAD253DC801C@mcs.anl.gov> <10558_1476423923_u9E5jFtZ026640_ea40cc78-d38b-a32c-d7d8-db83baba0e3e@wpi.edu> Message-ID: <0a7fad72-0834-3bd0-ec68-4b3f1579ffed@wpi.edu> Actually I don't. 
Sorry, I'm fairly new to using SLEPc. That would explain why when I use krylov methods convergence is extremely slow (~15,000 iterations for the first eigenpair), but when I use other methods such as lobpcg and rqcg, which I've heard use preconditioners automatically, convergence is much faster. I'm using the MPIAIJ format to store my matrix which is mostly block diagonal, but with a significant number of non-zero entries outside of those regions. I'm not running a multi-physics simulations so I don't really have blocks in that sense. I'm trying to solve the Schrodinger equation in 2D/3D using the Finite Element Method. Thanks, Chris On 10/17/16 12:39, Julian Andrej wrote: > Do you precondition your eigenvalue problem? If not, you should. Let > us know what structure your matrix has and which blocks (if there are > any) include which physics. > > Regards > Julian > > On Mon, Oct 17, 2016 at 5:30 PM, Christopher Pierce wrote: >> I've implemented my application using MatGetSubMatrix and the solvers >> appear to be converging correctly now, just slowly. I assume that this >> is due to the clustering of eigenvalues inherent to the problem that I'm >> using, however. I think that this should be enough to get me on track >> to solving problems with it. >> >> Thanks, >> >> Chris >> >> >> On 10/14/16 01:43, Christopher Pierce wrote: >>> Thank You, >>> >>> That looks like what I need to do if the highly degenerate eigenpairs >>> are my problem. I'll try that out this week and see if that helps. >>> >>> Chris >>> >>> >>> >>> >>> On 10/13/16 20:01, Barry Smith wrote: >>>> I would use MatGetSubMatrix() to pull out the part of the matrix you care about and hand that matrix off to SLEPc. >>>> >>>> Others prefer to remove the Dirichlet boundary value locations while doing the finite element assembly, this way those locations never appear in the matrix. >>>> >>>> The end result is the same, you have the slightly smaller matrix of interest to compute the eigenvalues from. >>>> >>>> >>>> Barry >>>> >>>>> On Oct 13, 2016, at 5:48 PM, Christopher Pierce wrote: >>>>> >>>>> Hello All, >>>>> >>>>> As there isn't a SLEPc specific list, it was recommended that I bring my >>>>> question here. I am using SLEPc to solve a generalized eigenvalue >>>>> problem generated as part of the Finite Element Method, but am having >>>>> difficulty getting the diagonalizer to converge. I am worried that the >>>>> method used to set boundary conditions in the matrix is creating the >>>>> problem and am looking for other people's input. >>>>> >>>>> In order to set the boundary conditions, I find the list of IDs that >>>>> should be zero in the resulting eigenvectors and then use >>>>> MatZeroRowsColumns to zero the rows and columns and in the matrix A >>>>> insert a large value such as 1E10 on each diagonal element that was >>>>> zeroed and likewise for the B matrix except with the value 1.0. That >>>>> way the eigenvalues resulting from those solutions are on the order of >>>>> 1E10 and are outside of the region of interest for my problem. >>>>> >>>>> When I tried to diagonal the matrices I could only get converged >>>>> solutions from the rqcg method which I have found to not scale well with >>>>> my problem. When using any other method, the approximate error of the >>>>> eigenpairs hovers around 1E00 and 1E01 until it reaches the max number >>>>> of iterations. Could having so many identical eigenvalues (~1,000) in >>>>> the spectrum be causing this to happen even if they are far outside of >>>>> the range of interest? 
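A minimal sketch of the MatGetSubMatrix route adopted above — restricting the generalized problem A x = lambda B x to the unconstrained DOFs before it reaches SLEPc. The index set isFree (holding the non-Dirichlet DOFs) is an assumed name, and the usual petsc/slepc headers plus declarations of ierr, A and B are implied:

Mat Af, Bf;
EPS eps;

ierr = MatGetSubMatrix(A, isFree, isFree, MAT_INITIAL_MATRIX, &Af);CHKERRQ(ierr);
ierr = MatGetSubMatrix(B, isFree, isFree, MAT_INITIAL_MATRIX, &Bf);CHKERRQ(ierr);

ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
ierr = EPSSetOperators(eps, Af, Bf);CHKERRQ(ierr);      /* A x = lambda B x on free DOFs */
ierr = EPSSetProblemType(eps, EPS_GHEP);CHKERRQ(ierr);  /* Hermitian A, B with B > 0     */
ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL);CHKERRQ(ierr);
ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
ierr = EPSSolve(eps);CHKERRQ(ierr);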
>>>>> >>>>> Thank, >>>>> >>>>> Chris Pierce >>>>> WPI Center for Computation Nano-Science >>>>> >>>>> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From bikash at umich.edu Tue Oct 18 17:26:55 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Tue, 18 Oct 2016 18:26:55 -0400 Subject: [petsc-users] BVNormColumn In-Reply-To: References: Message-ID: Hi Jose, Thanks for the pointers. Here's what I observed on probing it further: 1. The ||B - B^H|| norm was 1e-18. So I explicitly made it Hermitian by setting B = 0.5(B+B^H). However, this didn't help. 2. Next, I checked for the conditioning of B by computing the ratio of the highest and lowest eigenvalues. The conditioning of the order 1e-9. 3. I monitored the imaginary the imaginary part of VecDot(y,x, dotXY) where y = B*x and noted that only when the imaginary part is more than 1e-16 in magnitude, the error of "The inner product is not well defined" is flagged. For the first few iterations of orhtogonalization (i.e., the one where orthogonization is successful), the values of VecDot(y,x, dotXY) are all found to be lower than 1e-16. I guess this small imaginary part might be the cause of the error. Let me know if there is a way to bypass the abort by changing the tolerance for imaginary part. Regards, Bikash On Thu, Oct 13, 2016 at 4:48 AM, Jose E. Roman wrote: > > > El 13 oct 2016, a las 5:26, Bikash Kanungo escribi?: > > > > Hi, > > > > I facing the following issue. I'm trying to use orthogonalize a set of > vectors (all complex) with a non-standard inner product (.i.e. with > BVSetMatrix). Let's call the basis vector to be BV and the matrix to be B. > After certain number of iterations, I'm getting an error "The inner product > is not well defined: nonzero imaginary part". I investigated this further. > What I did was obtain the vec (column) which was throwing the error. Let's > call the vec to be x and its column ID in BV to be j. I obtained x^H*B*x in > two different ways: (1). by first getting y=B*x and then performing > VecDot(x,y, dotXY), and (2) by using BVNormColumn(BV, j, NORM_2, normj). > I'm doing this check even before calling the BVOrthogonalize routine. > > > > In principle, the value from (1) should be the square of the value from > (2). For the iterations where I'm successful to perform the > orthogonalization this check is satisfied. However, for the iteration where > it fails with the above error, the value from (2) is zero. I'm unable to > understand why this is the case. > > > > Thanks, > > Bikash > > Please note that to compute x^H*y you have to call VecDot(y,x,dot), with y > first. Anyway, this does not matter for what you are reporting. > > Probably the call for (2) is aborting due to an error, so it does not > return a value. Add CHKERRQ(ierr) after it. In general, it is always > recommended to add this to every PETSc/SLEPc call, also in Fortran code > (although SLEPc Fortran examples do not have it). > > One possible explanation for the error "The inner product is not well > defined" is that the matrix is not exactly Hermitian, that is B^H-B is tiny > but not zero. If this is the case, I would suggest explicitly making it > Hermitian. Also, things could go bad if matrix B is ill-conditioned. > > Jose > > -- Bikash S. 
Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From xsli at lbl.gov Wed Oct 19 01:06:04 2016 From: xsli at lbl.gov (Xiaoye S. Li) Date: Tue, 18 Oct 2016 23:06:04 -0700 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> Message-ID: I looked at each valgrind-complained item in your email dated Oct. 11. Those reports are really superficial; I don't see anything wrong with those lines (mostly uninitialized variables) singled out. I did a few tests with the latest version in github, all went fine. Perhaps you can print your matrix that caused problem, I can run it using your matrix. Sherry On Tue, Oct 11, 2016 at 2:18 PM, Anton wrote: > > > On 10/11/16 7:19 PM, Satish Balay wrote: > >> This log looks truncated. Are there any valgrind mesages before this? >> [like from your application code - or from MPI] >> > Yes it is indeed truncated. I only included relevant messages. > >> >> Perhaps you can send the complete log - with: >> valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 >> --track-origins=yes >> >> [and if there were more valgrind messages from MPI - rebuild petsc >> > There are no messages originating from our code, just a few MPI related > ones (probably false positives) and from SuperLU_DIST (most of them). > > Thanks, > Anton > > with --download-mpich - for a valgrind clean mpi] >> >> Sherry, >> Perhaps this log points to some issue in superlu_dist? 
>> >> thanks, >> Satish >> >> On Tue, 11 Oct 2016, Anton Popov wrote: >> >> Valgrind immediately detects interesting stuff: >>> >>> ==25673== Use of uninitialised value of size 8 >>> ==25673== at 0x178272C: static_schedule (static_schedule.c:960) >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x178272C: static_schedule (static_schedule.c:960) >>> ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> >>> ==25673== Conditional jump or move depends on uninitialised value(s) >>> ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) >>> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> >>> ==25673== Conditional jump or move depends on uninitialised value(s) >>> ==25673== at 0x5C83F43: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1 >>> .0) >>> ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >>> ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >>> ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) >>> ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) >>> ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) >>> ==25674== by 0x638AFD5: __vsnprintf_chk (vsnprintf_chk.c:63) >>> ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) >>> ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in >>> /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in >>> /opt/mpich3/lib/libmpi.so.12.1.0) >>> ==25674== by 0x5C83FB1: PMPI_Recv (in /opt/mpich3/lib/libmpi.so.12.1 >>> .0) >>> ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) >>> ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> ==25674== Use of uninitialised value of size 8 >>> ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> >>> And it crashes after this: >>> >>> ==25674== Invalid write of size 4 >>> ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) >>> ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) >>> ==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST >>> (superlu_dist.c:421) >>> ==25674== Address 0xa0 is not stack'd, malloc'd or (recently) free'd >>> ==25674== >>> [1]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >>> probably >>> memory access out of range >>> >>> >>> On 10/11/2016 03:26 PM, Anton Popov wrote: >>> >>>> On 10/10/2016 07:11 PM, Satish Balay wrote: >>>> >>>>> Thats from petsc-3.5 >>>>> >>>>> Anton - please post the stack trace you get with >>>>> --download-superlu_dist-commit=origin/maint >>>>> >>>> I guess this is it: >>>> >>>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 >>>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>>> [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST line 282 >>>> /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c >>>> [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 >>>> /home/anton/LIB/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: [0] PCSetUp_LU line 101 >>>> /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c >>>> [0]PETSC ERROR: [0] PCSetUp line 930 >>>> /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c >>>> >>>> According to the line numbers it crashes within >>>> MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. 
>>>> >>>> Surprisingly this only happens on the second SNES iteration, but not on >>>> the >>>> first. >>>> >>>> I'm trying to reproduce this behavior with PETSc KSP and SNES examples. >>>> However, everything I've tried up to now with SuperLU_DIST does just >>>> fine. >>>> >>>> I'm also checking our code in Valgrind to make sure it's clean. >>>> >>>> Anton >>>> >>>>> Satish >>>>> >>>>> >>>>> On Mon, 10 Oct 2016, Xiaoye S. Li wrote: >>>>> >>>>> Which version of superlu_dist does this capture? I looked at the >>>>>> original >>>>>> error log, it pointed to pdgssvx: line 161. But that line is in >>>>>> comment >>>>>> block, not the program. >>>>>> >>>>>> Sherry >>>>>> >>>>>> >>>>>> On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov >>>>>> wrote: >>>>>> >>>>>> On 10/07/2016 05:23 PM, Satish Balay wrote: >>>>>>> >>>>>>> On Fri, 7 Oct 2016, Kong, Fande wrote: >>>>>>>> >>>>>>>> On Fri, Oct 7, 2016 at 9:04 AM, Satish Balay >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Fri, 7 Oct 2016, Anton Popov wrote: >>>>>>>>> >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>>> are there any news about fixing buggy behavior of >>>>>>>>>>> SuperLU_DIST, exactly >>>>>>>>>>> >>>>>>>>>>> what >>>>>>>>>> >>>>>>>>>> is described here: >>>>>>>>>>> >>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists. >>>>>>>>>>> >>>>>>>>>>> mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm >>>>>>>>>> l&d=CwIBAg&c= >>>>>>>>>> 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ >>>>>>>>>> JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG >>>>>>>>>> 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= ? >>>>>>>>>> >>>>>>>>>> I'm using 3.7.4 and still get SEGV in pdgssvx routine. >>>>>>>>>>> Everything works >>>>>>>>>>> >>>>>>>>>>> fine >>>>>>>>>> >>>>>>>>>> with 3.5.4. >>>>>>>>>>> >>>>>>>>>>> Do I still have to stick to maint branch, and what are the >>>>>>>>>>> chances for >>>>>>>>>>> >>>>>>>>>>> these >>>>>>>>>> >>>>>>>>>> fixes to be included in 3.7.5? >>>>>>>>>>> >>>>>>>>>>> 3.7.4. is off maint branch [as of a week ago]. So if you are >>>>>>>>>> seeing >>>>>>>>>> issues with it - its best to debug and figure out the cause. >>>>>>>>>> >>>>>>>>>> This bug is indeed inside of superlu_dist, and we started having >>>>>>>>>> this >>>>>>>>>> >>>>>>>>> issue >>>>>>>>> from PETSc-3.6.x. I think superlu_dist developers should have >>>>>>>>> fixed this >>>>>>>>> bug. We forgot to update superlu_dist?? This is not a thing users >>>>>>>>> could >>>>>>>>> debug and fix. >>>>>>>>> >>>>>>>>> I have many people in INL suffering from this issue, and they have >>>>>>>>> to >>>>>>>>> stay >>>>>>>>> with PETSc-3.5.4 to use superlu_dist. >>>>>>>>> >>>>>>>>> To verify if the bug is fixed in latest superlu_dist - you can try >>>>>>>> [assuming you have git - either from petsc-3.7/maint/master]: >>>>>>>> >>>>>>>> --download-superlu_dist --download-superlu_dist-commit=origin/maint >>>>>>>> >>>>>>>> >>>>>>>> Satish >>>>>>>> >>>>>>>> Hi Satish, >>>>>>>> >>>>>>> I did this: >>>>>>> >>>>>>> git clone -b maint https://bitbucket.org/petsc/petsc.git petsc >>>>>>> >>>>>>> --download-superlu_dist >>>>>>> --download-superlu_dist-commit=origin/maint (not sure this is >>>>>>> needed, >>>>>>> since I'm already in maint) >>>>>>> >>>>>>> The problem is still there. >>>>>>> >>>>>>> Cheers, >>>>>>> Anton >>>>>>> >>>>>>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Oct 19 02:54:19 2016 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Wed, 19 Oct 2016 09:54:19 +0200 Subject: [petsc-users] BVNormColumn In-Reply-To: References: Message-ID: <4807F42C-75A7-4DE3-A605-A4BDE9CDF868@dsic.upv.es> > El 19 oct 2016, a las 0:26, Bikash Kanungo escribi?: > > Hi Jose, > > Thanks for the pointers. Here's what I observed on probing it further: > > ? The ||B - B^H|| norm was 1e-18. So I explicitly made it Hermitian by setting B = 0.5(B+B^H). However, this didn't help. > ? Next, I checked for the conditioning of B by computing the ratio of the highest and lowest eigenvalues. The conditioning of the order 1e-9. > ? I monitored the imaginary the imaginary part of VecDot(y,x, dotXY) where y = B*x and noted that only when the imaginary part is more than 1e-16 in magnitude, the error of "The inner product is not well defined" is flagged. For the first few iterations of orhtogonalization (i.e., the one where orthogonization is successful), the values of VecDot(y,x, dotXY) are all found to be lower than 1e-16. I guess this small imaginary part might be the cause of the error. > Let me know if there is a way to bypass the abort by changing the tolerance for imaginary part. > > > > Regards, > Bikash > There is something wrong: the condition number is greater than 1 by definition, so it cannot be 1e-9. Anyway, maybe what happens is that your matrix has a very small norm. The SLEPc code needs a fix for the case when the norm of B or the norm of the vector x is very small. Please send the matrix to my personal email and I will make some tests. Jose From jeremy at seamplex.com Wed Oct 19 04:38:53 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Wed, 19 Oct 2016 06:38:53 -0300 Subject: [petsc-users] Equivalent to MatGetColumnVector for rows? Message-ID: <1476869933.2925.9.camel@seamplex.com> Hi all Is there an equivalent to MatGetColumnVector() but for getting rows of a matrix as a vector? What I want to do is to compute the reactions of the nodes that belong to a Dirichlet boundary condition in a FEM linear elastic problem. I set these BCs with MatZeroRows() with a one in the diagonal and the desired displacement in the RHS vector. But before calling MatZeroRows(), I want to ?remember? what the row looked like so after solving the problem, if I multipliy this original row by the solution vector I get the reaction corresponding to that row's DOF. I have implemented something with MatGetRow() that seems to work but it is some lame I am even embarrased of sharing with the list what I have done. Any suggestion is welcome. Thanks -- jeremy theler www.seamplex.com From knepley at gmail.com Wed Oct 19 04:49:24 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Oct 2016 04:49:24 -0500 Subject: [petsc-users] Equivalent to MatGetColumnVector for rows? In-Reply-To: <1476869933.2925.9.camel@seamplex.com> References: <1476869933.2925.9.camel@seamplex.com> Message-ID: On Wed, Oct 19, 2016 at 4:38 AM, Jeremy Theler wrote: > Hi all > > Is there an equivalent to MatGetColumnVector() but for getting rows of a > matrix as a vector? > > What I want to do is to compute the reactions of the nodes that belong > to a Dirichlet boundary condition in a FEM linear elastic problem. I set > these BCs with MatZeroRows() with a one in the diagonal and the desired > displacement in the RHS vector. But before calling MatZeroRows(), I want > to ?remember? what the row looked like so after solving the problem, if > I multipliy this original row by the solution vector I get the reaction > corresponding to that row's DOF. 
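One way to realize this "remember the original rows" idea without walking the rows by hand is to keep a copy of the unmodified operator and right-hand side and recover all reactions at once after the solve. A sketch, assuming A and b are the assembled system before MatZeroRows() is applied and u is the computed solution:

Mat A0;
Vec b0, reac;

ierr = MatDuplicate(A, MAT_COPY_VALUES, &A0);CHKERRQ(ierr);   /* snapshot before the BCs */
ierr = VecDuplicate(b, &b0);CHKERRQ(ierr);
ierr = VecCopy(b, b0);CHKERRQ(ierr);

/* ... apply MatZeroRows() to A and b, then KSPSolve() for u ... */

ierr = VecDuplicate(u, &reac);CHKERRQ(ierr);
ierr = MatMult(A0, u, reac);CHKERRQ(ierr);                    /* reac = A0 u      */
ierr = VecAXPY(reac, -1.0, b0);CHKERRQ(ierr);                 /* reac = A0 u - b0 */
/* the entries of reac at the Dirichlet rows are the nodal reactions */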
> > I have implemented something with MatGetRow() that seems to work but it > is some lame I am even embarrased of sharing with the list what I have > done. > If you look at what that code is doing: http://www.mcs.anl.gov/petsc/petsc-current/src/mat/utils/getcolv.c.html#MatGetColumnVector it just puts a 1 in the vector at that column and does a MatMult(). He codes the MatMult by hand because no communication is necessary for in the input vector since we know it. You can do the same thing with MatMultTranspose() for rows, and it will do the right thing in parallel. Matt > Any suggestion is welcome. > > Thanks > -- > jeremy theler > www.seamplex.com > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 19 04:51:54 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Oct 2016 04:51:54 -0500 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: On Tue, Oct 18, 2016 at 7:38 AM, Julian Andrej wrote: > Hi, > > i have general question about PetscFE. When i want to assemble certain > parts of physics separately, how can i do that? I basically want to > assemble matrices/vectors from the weak forms on the same DM (and > avoid copying the DM) and use them afterwards. Is there a convenient > way for doing that? > > The "workflow" i'm approaching is something like: > > - Setup the DM > - Setup discretization (spaces and quadrature) for each weak form i > want to compute > - Compute just the weak form i want right now for a specific > discretization and field. > > The reason is i need certain parts of the "complete" Jacobian for > computations of eigenproblems and like to avoid computing those more > often than needed. > The way I envision this working is to use DMCreateSubDM(). It should extract everything correctly for the subset of fields you select. However, I have not extensively tested, so if something is wrong let me know. Thanks, Matt > Regards > Julian > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandanovian at gmail.com Wed Oct 19 09:58:47 2016 From: kandanovian at gmail.com (Tim Steinhoff) Date: Wed, 19 Oct 2016 16:58:47 +0200 Subject: [petsc-users] MUMPS and PARMETIS: Crashes Message-ID: Hi all, I have some problems with PETSc using MUMPS and PARMETIS. In some cases it works fine, but in some others it doesn't, so I am trying to understand what is happening. I just picked the following example: http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex53.c.html Now, when I start it with less than 4 processes it works as expected: mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 -mat_mumps_icntl_29 2 But with 4 or more processes, it crashes, but only when I am using Parmetis: mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 -mat_mumps_icntl_29 2 Metis worked in every case I tried without any problems. I wonder if I am doing something wrong or if this is a general problem or even a bug? Is Parmetis supposed to work with that example with 4 processes? Thanks a lot and kind regards. 
Volker Here is the error log of process 0: Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000 ================================================= MUMPS compiled with option -Dmetis MUMPS compiled with option -Dparmetis ================================================= L U Solver for unsymmetric matrices Type of parallelism: Working host ****** ANALYSIS STEP ******** ** Max-trans not allowed because matrix is distributed Using ParMETIS for parallel ordering. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/impls/aij/mpi/mumps/mumps.c [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c [0]PETSC ERROR: [0] PCSetUp_LU line 101 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/impls/factor/lu/lu.c [0]PETSC ERROR: [0] PCSetUp line 930 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/interface/precon.c [0]PETSC ERROR: [0] KSPSetUp line 305 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] KSPSolve line 563 /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed Oct 19 16:39:49 2016 [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-shared-libraries=1 --with-valgrind-dir=~/usr/valgrind/ --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_and_libraries_2016.4.258/linux/mpi --download-scalapack --download-mumps --download-metis --download-metis-shared=0 --download-parmetis --download-parmetis-shared=0 [0]PETSC ERROR: #1 User provided function() line 0 in unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 From popov at uni-mainz.de Wed Oct 19 10:22:33 2016 From: popov at uni-mainz.de (Anton Popov) Date: Wed, 19 Oct 2016 17:22:33 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> Message-ID: Thank you Sherry for your efforts but before I can setup an example that reproduces the problem, I have to ask PETSc related question. When I pump matrix via MatView MatLoad it ignores its original partitioning. 
Say originally I have 100 and 110 equations on two processors, after MatLoad I will have 105 and 105 also on two processors. What do I do to pass partitioning info through MatView MatLoad? I guess it's important for reproducing my setup exactly. Thanks On 10/19/2016 08:06 AM, Xiaoye S. Li wrote: > I looked at each valgrind-complained item in your email dated Oct. > 11. Those reports are really superficial; I don't see anything wrong > with those lines (mostly uninitialized variables) singled out. I did > a few tests with the latest version in github, all went fine. > > Perhaps you can print your matrix that caused problem, I can run it > using your matrix. > > Sherry > > > On Tue, Oct 11, 2016 at 2:18 PM, Anton > wrote: > > > > On 10/11/16 7:19 PM, Satish Balay wrote: > > This log looks truncated. Are there any valgrind mesages > before this? > [like from your application code - or from MPI] > > Yes it is indeed truncated. I only included relevant messages. > > > Perhaps you can send the complete log - with: > valgrind -q --tool=memcheck --leak-check=yes --num-callers=20 > --track-origins=yes > > [and if there were more valgrind messages from MPI - rebuild petsc > > There are no messages originating from our code, just a few MPI > related ones (probably false positives) and from SuperLU_DIST > (most of them). > > Thanks, > Anton > > with --download-mpich - for a valgrind clean mpi] > > Sherry, > Perhaps this log points to some issue in superlu_dist? > > thanks, > Satish > > On Tue, 11 Oct 2016, Anton Popov wrote: > > Valgrind immediately detects interesting stuff: > > ==25673== Use of uninitialised value of size 8 > ==25673== at 0x178272C: static_schedule > (static_schedule.c:960) > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x178272C: static_schedule > (static_schedule.c:960) > ==25674== by 0x174E74E: pdgstrf (pdgstrf.c:572) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > ==25673== Conditional jump or move depends on > uninitialised value(s) > ==25673== at 0x1752143: pdgstrf (dlook_ahead_update.c:24) > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > > ==25673== Conditional jump or move depends on > uninitialised value(s) > ==25673== at 0x5C83F43: PMPI_Recv (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25673== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > ==25673== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > ==25673== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x62BF72B: _itoa_word (_itoa.c:179) > ==25674== by 0x62C1289: printf_positional (vfprintf.c:2022) > ==25674== by 0x62C2465: vfprintf (vfprintf.c:1677) > ==25674== by 0x638AFD5: __vsnprintf_chk > (vsnprintf_chk.c:63) > ==25674== by 0x638AF37: __snprintf_chk (snprintf_chk.c:34) > ==25674== by 0x5CC6C08: MPIR_Err_create_code_valist (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x5CC7A9A: MPIR_Err_create_code (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x5C83FB1: PMPI_Recv (in > /opt/mpich3/lib/libmpi.so.12.1.0) > ==25674== by 0x1755385: pdgstrf2_trsm (pdgstrf2.c:253) > ==25674== by 0x1751E4F: pdgstrf (dlook_ahead_update.c:195) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > ==25674== Use of uninitialised value of size 8 > ==25674== at 0x1751E92: pdgstrf (dlook_ahead_update.c:205) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > > And it crashes after this: > > ==25674== Invalid write of size 4 > ==25674== at 0x1751F2F: pdgstrf (dlook_ahead_update.c:211) > ==25674== by 0x1733954: pdgssvx (pdgssvx.c:1124) > 
==25674== by 0xAAEFAE: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:421) > ==25674== Address 0xa0 is not stack'd, malloc'd or > (recently) free'd > ==25674== > [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation > Violation, probably > memory access out of range > > > On 10/11/2016 03:26 PM, Anton Popov wrote: > > On 10/10/2016 07:11 PM, Satish Balay wrote: > > Thats from petsc-3.5 > > Anton - please post the stack trace you get with > --download-superlu_dist-commit=origin/maint > > I guess this is it: > > [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421 > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIST > line 282 > /home/anton/LIB/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985 > /home/anton/LIB/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > /home/anton/LIB/petsc/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [0] PCSetUp line 930 > /home/anton/LIB/petsc/src/ksp/pc/interface/precon.c > > According to the line numbers it crashes within > MatLUFactorNumeric_SuperLU_DIST while calling pdgssvx. > > Surprisingly this only happens on the second SNES > iteration, but not on the > first. > > I'm trying to reproduce this behavior with PETSc KSP > and SNES examples. > However, everything I've tried up to now with > SuperLU_DIST does just fine. > > I'm also checking our code in Valgrind to make sure > it's clean. > > Anton > > Satish > > > On Mon, 10 Oct 2016, Xiaoye S. Li wrote: > > Which version of superlu_dist does this > capture? I looked at the > original > error log, it pointed to pdgssvx: line 161. > But that line is in > comment > block, not the program. > > Sherry > > > On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov > > wrote: > > On 10/07/2016 05:23 PM, Satish Balay wrote: > > On Fri, 7 Oct 2016, Kong, Fande wrote: > > On Fri, Oct 7, 2016 at 9:04 AM, Satish > Balay > > wrote: > > On Fri, 7 Oct 2016, Anton Popov wrote: > > Hi guys, > > are there any news about > fixing buggy behavior of > SuperLU_DIST, exactly > > what > > is described here: > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists > . > > mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.htm > l&d=CwIBAg&c= > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG > 1sQHw2tIsSQtA&s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&e= > ? > > I'm using 3.7.4 and still > get SEGV in pdgssvx routine. > Everything works > > fine > > with 3.5.4. > > Do I still have to stick > to maint branch, and what > are the > chances for > > these > > fixes to be included in 3.7.5? > > 3.7.4. is off maint branch [as > of a week ago]. So if you are > seeing > issues with it - its best to > debug and figure out the cause. > > This bug is indeed inside of > superlu_dist, and we started > having > this > > issue > from PETSc-3.6.x. I think > superlu_dist developers should have > fixed this > bug. We forgot to update > superlu_dist?? This is not a thing > users > could > debug and fix. > > I have many people in INL > suffering from this issue, and > they have > to > stay > with PETSc-3.5.4 to use superlu_dist. 
> > To verify if the bug is fixed in > latest superlu_dist - you can try > [assuming you have git - either from > petsc-3.7/maint/master]: > > --download-superlu_dist > --download-superlu_dist-commit=origin/maint > > > Satish > > Hi Satish, > > I did this: > > git clone -b maint > https://bitbucket.org/petsc/petsc.git > petsc > > --download-superlu_dist > --download-superlu_dist-commit=origin/maint > (not sure this is needed, > since I'm already in maint) > > The problem is still there. > > Cheers, > Anton > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Oct 19 11:08:59 2016 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 19 Oct 2016 11:08:59 -0500 Subject: [petsc-users] MUMPS and PARMETIS: Crashes In-Reply-To: References: Message-ID: Tim: With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56 with np=3 or larger np successfully. With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to np=3. For np=4: mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -start_in_debugger code crashes inside mumps: Program received signal SIGSEGV, Segmentation fault. 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph ( id=..., first=..., last=..., ipe=..., pe=, work=...) at dana_aux_par.F:1450 1450 MAPTAB(J) = I (gdb) bt #0 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph ( id=..., first=..., last=..., ipe=..., pe=, work=...) at dana_aux_par.F:1450 #1 0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord ( id=..., ord=..., work=...) at dana_aux_par.F:400 #2 0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord (id=..., ord=..., work=...) at dana_aux_par.F:351 #3 0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par (id=..., work1=..., work2=..., nfsiz=..., fils=, frere=) at dana_aux_par.F:98 #4 0x00007f33d74c622a in dmumps_ana_driver (id=...) at dana_driver.F:563 #5 0x00007f33d747706b in dmumps (id=...) at dmumps_driver.F:1108 #6 0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1, comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., ---Type to continue, or q to quit--- ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., tmpdirlen=20, prefixlen=20, write_problemlen=20) at dmumps_f77.F:260 #7 0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at mumps_c.c:415 #8 0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, A=0x14bafc0, r=0x160cc30, c=0x1609ed0, info=0x15c6708) at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487 -mat_mumps_icntl_29 = 0 or 1 give same error. 
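For anyone who needs to keep running while this is sorted out, the sequential analysis that works above can also be requested from code instead of the command line. A sketch using the standard PETSc 3.7 MUMPS interface, assuming ksp already has its operators set and uses a direct LU factorization:

PC  pc;
Mat F;

ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr);   /* creates the MUMPS factor matrix  */
ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
ierr = MatMumpsSetIcntl(F, 28, 1);CHKERRQ(ierr);          /* ICNTL(28)=1: sequential analysis */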
I'm cc'ing this email to mumps developer, who may help to resolve this matter. Hong Hi all, > > I have some problems with PETSc using MUMPS and PARMETIS. > In some cases it works fine, but in some others it doesn't, so I am > trying to understand what is happening. > > I just picked the following example: > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/ > examples/tutorials/ex53.c.html > > Now, when I start it with less than 4 processes it works as expected: > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > -mat_mumps_icntl_29 2 > > But with 4 or more processes, it crashes, but only when I am using > Parmetis: > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > -mat_mumps_icntl_29 2 > > Metis worked in every case I tried without any problems. > > I wonder if I am doing something wrong or if this is a general problem > or even a bug? Is Parmetis supposed to work with that example with 4 > processes? > > Thanks a lot and kind regards. > > Volker > > > Here is the error log of process 0: > > Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000 > ================================================= > MUMPS compiled with option -Dmetis > MUMPS compiled with option -Dparmetis > ================================================= > L U Solver for unsymmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > ** Max-trans not allowed because matrix is distributed > Using ParMETIS for parallel ordering. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/ > impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ > pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [0] PCSetUp line 930 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: [0] KSPSetUp line 305 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPSolve line 563 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed > Oct 19 16:39:49 2016 > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc > --with-fc=mpiifort --with-shared-libraries=1 > --with-valgrind-dir=~/usr/valgrind/ > --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_ > and_libraries_2016.4.258/linux/mpi > --download-scalapack --download-mumps --download-metis > --download-metis-shared=0 --download-parmetis > --download-parmetis-shared=0 > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 19 12:34:48 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 19 Oct 2016 11:34:48 -0600 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> Message-ID: <878ttkuts7.fsf@jedbrown.org> Anton Popov writes: > Thank you Sherry for your efforts > > but before I can setup an example that reproduces the problem, I have to > ask PETSc related question. > > When I pump matrix via MatView MatLoad it ignores its original partitioning. > > Say originally I have 100 and 110 equations on two processors, after > MatLoad I will have 105 and 105 also on two processors. > > What do I do to pass partitioning info through MatView MatLoad? Call MatSetSizes before MatLoad. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed Oct 19 13:32:48 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 19 Oct 2016 13:32:48 -0500 Subject: [petsc-users] MUMPS and PARMETIS: Crashes In-Reply-To: References: Message-ID: <3A041F37-6368-4060-81A5-59D0130584C9@mcs.anl.gov> Tim, You can/should also run with valgrind to determine exactly the first point with memory corruption issues. Barry > On Oct 19, 2016, at 11:08 AM, Hong wrote: > > Tim: > With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56 with np=3 or larger np successfully. > > With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to np=3. > > For np=4: > mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -start_in_debugger > > code crashes inside mumps: > Program received signal SIGSEGV, Segmentation fault. > 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph ( > id=..., first=..., last=..., ipe=..., > pe=, work=...) > at dana_aux_par.F:1450 > 1450 MAPTAB(J) = I > (gdb) bt > #0 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph ( > id=..., first=..., last=..., ipe=..., > pe=, work=...) > at dana_aux_par.F:1450 > #1 0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord ( > id=..., ord=..., work=...) at dana_aux_par.F:400 > #2 0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord (id=..., > ord=..., work=...) 
at dana_aux_par.F:351 > #3 0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par (id=..., > work1=..., work2=..., nfsiz=..., > fils=, > frere=) > at dana_aux_par.F:98 > #4 0x00007f33d74c622a in dmumps_ana_driver (id=...) at dana_driver.F:563 > #5 0x00007f33d747706b in dmumps (id=...) at dmumps_driver.F:1108 > #6 0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1, > comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., dkeep=..., > keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, > nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, > a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., > eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, rhs=..., > rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., > rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., > ---Type to continue, or q to quit--- > ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., > colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, irhs_sparse=..., > irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, > nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, > nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., tmpdirlen=20, > prefixlen=20, write_problemlen=20) at dmumps_f77.F:260 > #7 0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at mumps_c.c:415 > #8 0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, A=0x14bafc0, > r=0x160cc30, c=0x1609ed0, info=0x15c6708) > at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487 > > -mat_mumps_icntl_29 = 0 or 1 give same error. > I'm cc'ing this email to mumps developer, who may help to resolve this matter. > > Hong > > > Hi all, > > I have some problems with PETSc using MUMPS and PARMETIS. > In some cases it works fine, but in some others it doesn't, so I am > trying to understand what is happening. > > I just picked the following example: > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex53.c.html > > Now, when I start it with less than 4 processes it works as expected: > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > -mat_mumps_icntl_29 2 > > But with 4 or more processes, it crashes, but only when I am using Parmetis: > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > -mat_mumps_icntl_29 2 > > Metis worked in every case I tried without any problems. > > I wonder if I am doing something wrong or if this is a general problem > or even a bug? Is Parmetis supposed to work with that example with 4 > processes? > > Thanks a lot and kind regards. > > Volker > > > Here is the error log of process 0: > > Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000 > ================================================= > MUMPS compiled with option -Dmetis > MUMPS compiled with option -Dparmetis > ================================================= > L U Solver for unsymmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > ** Max-trans not allowed because matrix is distributed > Using ParMETIS for parallel ordering. 
> [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/impls/aij/mpi/mumps/mumps.c > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/impls/factor/lu/lu.c > [0]PETSC ERROR: [0] PCSetUp line 930 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: [0] KSPSetUp line 305 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPSolve line 563 > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > shooting. > [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed > Oct 19 16:39:49 2016 > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc > --with-fc=mpiifort --with-shared-libraries=1 > --with-valgrind-dir=~/usr/valgrind/ > --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_and_libraries_2016.4.258/linux/mpi > --download-scalapack --download-mumps --download-metis > --download-metis-shared=0 --download-parmetis > --download-parmetis-shared=0 > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > From bsmith at mcs.anl.gov Wed Oct 19 14:54:47 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 19 Oct 2016 14:54:47 -0500 Subject: [petsc-users] Equivalent to MatGetColumnVector for rows? In-Reply-To: <1476869933.2925.9.camel@seamplex.com> References: <1476869933.2925.9.camel@seamplex.com> Message-ID: I don't think you want to store the values in a Vector; the vector will be as large as the entire right hand side but be almost all zeros. If you want to remember "the part of the matrix that is zeroed out by MatZeroRows()" you can use MatGetSubMatrix() and request just the zeroed rows but all the columns. This matrix will be in parallel, on each process it will just have the "zero rows" for that process. If you multiply this matrix by the solution vector you will get a "short" vector that on each process contains the "reaction" for each each of the "removed row" on that process. Easy to implement. 
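A minimal sketch of this approach (untested, error checking and cleanup omitted; A is the assembled matrix, rows is an IS holding this process's Dirichlet rows, x the solution vector -- all hypothetical names):

   Mat      Asub;
   IS       cols;
   Vec      reactions;
   PetscInt cstart, cend;

   /* save the rows that MatZeroRowsIS() is about to overwrite, keeping all columns */
   MatGetOwnershipRangeColumn(A, &cstart, &cend);
   ISCreateStride(PetscObjectComm((PetscObject)A), cend - cstart, cstart, 1, &cols);
   MatGetSubMatrix(A, rows, cols, MAT_INITIAL_MATRIX, &Asub);

   MatZeroRowsIS(A, rows, 1.0, NULL, NULL);   /* impose the Dirichlet rows as before */
   /* ... KSPSolve() for x ... */

   /* one "reaction" entry per removed row owned by this process */
   MatCreateVecs(Asub, NULL, &reactions);
   MatMult(Asub, x, reactions);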
Barry > On Oct 19, 2016, at 4:38 AM, Jeremy Theler wrote: > > Hi all > > Is there an equivalent to MatGetColumnVector() but for getting rows of a > matrix as a vector? > > What I want to do is to compute the reactions of the nodes that belong > to a Dirichlet boundary condition in a FEM linear elastic problem. I set > these BCs with MatZeroRows() with a one in the diagonal and the desired > displacement in the RHS vector. But before calling MatZeroRows(), I want > to ?remember? what the row looked like so after solving the problem, if > I multipliy this original row by the solution vector I get the reaction > corresponding to that row's DOF. > > I have implemented something with MatGetRow() that seems to work but it > is some lame I am even embarrased of sharing with the list what I have > done. > > Any suggestion is welcome. > > Thanks > -- > jeremy theler > www.seamplex.com > > From thronesf at gmail.com Wed Oct 19 15:15:08 2016 From: thronesf at gmail.com (Sharp Stone) Date: Wed, 19 Oct 2016 16:15:08 -0400 Subject: [petsc-users] Petsc Profiling for Each Function Message-ID: Dear all, Now I'm using a Petsc code which needs to be optimized. But after trying, I still don't know how to get the profiling for each of the function for each process. I mean, for each process, how should I know the execution time for each function? Thanks! -- Best regards, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Oct 19 19:33:00 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 19 Oct 2016 19:33:00 -0500 Subject: [petsc-users] Petsc Profiling for Each Function In-Reply-To: References: Message-ID: To start you should just run the code with the ./configure option --with-debugging=0 with the option -log_view this will give you the high view of where it is spending the time. Feel free to email back the output. From that you can focus on what parts are both taking a lot of time AND running slowly and that gives a good idea of what needs to be optimized. Barry > On Oct 19, 2016, at 3:15 PM, Sharp Stone wrote: > > Dear all, > > Now I'm using a Petsc code which needs to be optimized. But after trying, I still don't know how to get the profiling for each of the function for each process. I mean, for each process, how should I know the execution time for each function? > > Thanks! > > -- > Best regards, > > Feng From aks084000 at utdallas.edu Thu Oct 20 01:22:20 2016 From: aks084000 at utdallas.edu (Safin, Artur) Date: Thu, 20 Oct 2016 06:22:20 +0000 Subject: [petsc-users] How to set up an interface-type problem Message-ID: <36adffebd0954095b5d91413cac9355c@utdallas.edu> Hi all, I would like to get your advice on how to set up an interface problem that I get from domain decomposition. The particular issue that I am dealing with is how to 'stack' two vectors on top of each other. I would like to set up a problem of type [ A B ] [ x ] = [ b ] [ C D ] [ y ] [ c ] where x and y live on a subset of the global domain (the interface to be exact, obtained with VecGetSubVector). I want to solve this system with an iterative method. I already have the x and y vectors, but in order to set up the system I believe I will need a vector that looks like v = [x; y]. Is there a way to set up a vector like this? It would also be beneficial if I could extract either the x or y component of the solution vector back into the corresponding subvector. Also, just in general, I am curious as to how one would approach setting up these kinds of problems. 
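Schematically, the stacking being asked about looks something like this (a sketch only, hypothetical names; x and y are the subvectors already pulled out with VecGetSubVector):

   Vec subs[2] = {x, y};
   Vec v, xcomp;
   VecCreateNest(PETSC_COMM_WORLD, 2, NULL, subs, &v);   /* v = [x; y] */
   /* ... solve the 2x2 block system with v as the unknown ... */
   VecNestGetSubVec(v, 0, &xcomp);   /* xcomp references the same Vec as x */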
Thanks, Artur -------------- next part -------------- An HTML attachment was scrubbed... URL: From alfredo.buttari at enseeiht.fr Thu Oct 20 03:19:55 2016 From: alfredo.buttari at enseeiht.fr (Alfredo Buttari) Date: Thu, 20 Oct 2016 10:19:55 +0200 Subject: [petsc-users] [mumps-dev] MUMPS and PARMETIS: Crashes In-Reply-To: <3A041F37-6368-4060-81A5-59D0130584C9@mcs.anl.gov> References: <3A041F37-6368-4060-81A5-59D0130584C9@mcs.anl.gov> Message-ID: Dear all, this may well be due to a bug in the parallel analysis. Do you think you can reproduce the problem in a standalone MUMPS program (i.e., without going through PETSc) ? that would save a lot of time to track the bug since we do not have a PETSc install at hand. Otherwise we'll give it a shot at installing petsc and reproducing the problem on our side. Kind regards, the MUMPS team On Wed, Oct 19, 2016 at 8:32 PM, Barry Smith wrote: > > Tim, > > You can/should also run with valgrind to determine exactly the first > point with memory corruption issues. > > Barry > > > On Oct 19, 2016, at 11:08 AM, Hong wrote: > > > > Tim: > > With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56 > with np=3 or larger np successfully. > > > > With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to > np=3. > > > > For np=4: > > mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 > -mat_mumps_icntl_29 2 -start_in_debugger > > > > code crashes inside mumps: > > Program received signal SIGSEGV, Segmentation fault. > > 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph > ( > > id=..., first=..., last=..., ipe=..., > > pe=, > work=...) > > at dana_aux_par.F:1450 > > 1450 MAPTAB(J) = I > > (gdb) bt > > #0 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph > ( > > id=..., first=..., last=..., ipe=..., > > pe=, > work=...) > > at dana_aux_par.F:1450 > > #1 0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord > ( > > id=..., ord=..., work=...) at dana_aux_par.F:400 > > #2 0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord > (id=..., > > ord=..., work=...) at dana_aux_par.F:351 > > #3 0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par > (id=..., > > work1=..., work2=..., nfsiz=..., > > fils=, > > frere=) > > at dana_aux_par.F:98 > > #4 0x00007f33d74c622a in dmumps_ana_driver (id=...) at dana_driver.F:563 > > #5 0x00007f33d747706b in dmumps (id=...) 
at dmumps_driver.F:1108 > > #6 0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1, > > comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., > dkeep=..., > > keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., > ahere=0, > > nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, > > a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., > > eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, > rhs=..., > > rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., > > rinfog=..., deficiency=0, lwk_user=0, size_schur=0, > listvar_schur=..., > > ---Type to continue, or q to quit--- > > ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, > colsca=..., > > colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, > lrhs=0, lredrhs=0, > > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, > irhs_sparse=..., > > irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., > isol_lochere=0, > > nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0, > mblock=0, nblock=0, > > nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., > tmpdirlen=20, > > prefixlen=20, write_problemlen=20) at dmumps_f77.F:260 > > #7 0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at mumps_c.c:415 > > #8 0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, > A=0x14bafc0, > > r=0x160cc30, c=0x1609ed0, info=0x15c6708) > > at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487 > > > > -mat_mumps_icntl_29 = 0 or 1 give same error. > > I'm cc'ing this email to mumps developer, who may help to resolve this > matter. > > > > Hong > > > > > > Hi all, > > > > I have some problems with PETSc using MUMPS and PARMETIS. > > In some cases it works fine, but in some others it doesn't, so I am > > trying to understand what is happening. > > > > I just picked the following example: > > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/ > examples/tutorials/ex53.c.html > > > > Now, when I start it with less than 4 processes it works as expected: > > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > > -mat_mumps_icntl_29 2 > > > > But with 4 or more processes, it crashes, but only when I am using > Parmetis: > > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 > > -mat_mumps_icntl_29 2 > > > > Metis worked in every case I tried without any problems. > > > > I wonder if I am doing something wrong or if this is a general problem > > or even a bug? Is Parmetis supposed to work with that example with 4 > > processes? > > > > Thanks a lot and kind regards. > > > > Volker > > > > > > Here is the error log of process 0: > > > > Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000 > > ================================================= > > MUMPS compiled with option -Dmetis > > MUMPS compiled with option -Dparmetis > > ================================================= > > L U Solver for unsymmetric matrices > > Type of parallelism: Working host > > > > ****** ANALYSIS STEP ******** > > > > ** Max-trans not allowed because matrix is distributed > > Using ParMETIS for parallel ordering. 
> > [0]PETSC ERROR: > > ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > probably memory access out of range > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [0]PETSC ERROR: or see > > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames > > ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > > [0]PETSC ERROR: is given. > > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/ > impls/aij/mpi/mumps/mumps.c > > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c > > [0]PETSC ERROR: [0] PCSetUp_LU line 101 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ > pc/impls/factor/lu/lu.c > > [0]PETSC ERROR: [0] PCSetUp line 930 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ > pc/interface/precon.c > > [0]PETSC ERROR: [0] KSPSetUp line 305 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ > ksp/interface/itfunc.c > > [0]PETSC ERROR: [0] KSPSolve line 563 > > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ > ksp/interface/itfunc.c > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > shooting. > > [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 > > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed > > Oct 19 16:39:49 2016 > > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc > > --with-fc=mpiifort --with-shared-libraries=1 > > --with-valgrind-dir=~/usr/valgrind/ > > --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_ > and_libraries_2016.4.258/linux/mpi > > --download-scalapack --download-mumps --download-metis > > --download-metis-shared=0 --download-parmetis > > --download-parmetis-shared=0 > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > > -- ----------------------------------------- Alfredo Buttari, PhD CNRS-IRIT 2 rue Camichel, 31071 Toulouse, France http://buttari.perso.enseeiht.fr -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 20 05:07:10 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Oct 2016 05:07:10 -0500 Subject: [petsc-users] How to set up an interface-type problem In-Reply-To: <36adffebd0954095b5d91413cac9355c@utdallas.edu> References: <36adffebd0954095b5d91413cac9355c@utdallas.edu> Message-ID: On Thu, Oct 20, 2016 at 1:22 AM, Safin, Artur wrote: > Hi all, > > I would like to get your advice on how to set up an interface problem that > I get from domain decomposition. The particular issue that I am dealing > with is how to 'stack' two vectors on top of each other. 
> > I would like to set up a problem of type > > [ A B ] [ x ] = [ b ] > [ C D ] [ y ] [ c ] > > where x and y live on a subset of the global domain (the interface to be > exact, obtained with VecGetSubVector). I want to solve this system with > an iterative method. I already have the x and y vectors, but in order to > set up the system I believe I will need a vector that looks like v = [x; > y]. Is there a way to set up a vector like this? It would also be > beneficial if I could extract either the x or y component of the solution > vector back into the corresponding subvector. > > Also, just in general, I am curious as to how one would approach setting > up these kinds of problems. > You use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html to put values directly into the subvectors and submatrices. This interacts well with MatNest, so it can be optimized after you get it working. Matt > Thanks, > > Artur > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Thu Oct 20 07:26:04 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Thu, 20 Oct 2016 09:26:04 -0300 Subject: [petsc-users] Equivalent to MatGetColumnVector for rows? In-Reply-To: References: <1476869933.2925.9.camel@seamplex.com> Message-ID: <1476966364.6284.2.camel@seamplex.com> Thank you Barry. That makes a lot of sense. -- jeremy On Wed, 2016-10-19 at 14:54 -0500, Barry Smith wrote: > I don't think you want to store the values in a Vector; the vector will be as large as the entire right hand side but be almost all zeros. > > If you want to remember "the part of the matrix that is zeroed out by MatZeroRows()" you can use MatGetSubMatrix() and request just the zeroed rows but all the columns. This matrix will be in parallel, on each process it will just have the "zero rows" for that process. If you multiply this matrix by the solution vector you will get a "short" vector that on each process contains the "reaction" for each each of the "removed row" on that process. > > Easy to implement. > > Barry > > > On Oct 19, 2016, at 4:38 AM, Jeremy Theler wrote: > > > > Hi all > > > > Is there an equivalent to MatGetColumnVector() but for getting rows of a > > matrix as a vector? > > > > What I want to do is to compute the reactions of the nodes that belong > > to a Dirichlet boundary condition in a FEM linear elastic problem. I set > > these BCs with MatZeroRows() with a one in the diagonal and the desired > > displacement in the RHS vector. But before calling MatZeroRows(), I want > > to ?remember? what the row looked like so after solving the problem, if > > I multipliy this original row by the solution vector I get the reaction > > corresponding to that row's DOF. > > > > I have implemented something with MatGetRow() that seems to work but it > > is some lame I am even embarrased of sharing with the list what I have > > done. > > > > Any suggestion is welcome. > > > > Thanks > > -- > > jeremy theler > > www.seamplex.com > > > > > From juan at tf.uni-kiel.de Thu Oct 20 09:42:13 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Thu, 20 Oct 2016 16:42:13 +0200 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: Thanks for the suggestion. 
I guess DMCreateSubDM can work, but is cumbersome to handle for the normal solution process since the mass matrix for example is not a seperate field. src/snes/examples/tutorials/ex77 handles a seperate field for the nullspace, if anyone is interested in that. An intuitive way was just copying the DM and describing a new problem on it. DM dm_mass; PetscDS ds_mass; Vec dummy; PetscInt id = 1; petsc_call(DMCreateGlobalVector(dm, &dummy)); petsc_call(DMClone(ctx->dm, &dm_mass)); petsc_call(DMGetDS(dm_mass, &ds_mass)); petsc_call(PetscDSSetDiscretization(ds_mass, 0, (PetscObject)fe)); petsc_call(PetscDSSetJacobian(ds_mass, 0, 0, mass_kernel, NULL, NULL, NULL)); petsc_call(PetscDSAddBoundary(ds_mass, PETSC_TRUE, "wall", "marker", 0, 0, NULL, (void (*)())ctx->exact_funcs[0], 1, &id, ctx)); petsc_call(DMCreateMatrix(dm_mass, &ctx->M)); petsc_call(DMPlexSNESComputeJacobianFEM(dm_mass, dummy, ctx->M, ctx->M, NULL)); is this an intended way to assemble a jacobian based on a weak form? The memory overhead for a DM copy isn't huge on the first sight. And a much more important question. Is there any mathematical description how exactly you handle dirichlet boundary conditions here? On first sight it looks like condensing the nodes only to non-essential nodes and then projecting them back in the solution vector. If thats teh case I don't understand how you "augment" the solution with the boundary nodes. Regards Julian On Wed, Oct 19, 2016 at 11:51 AM, Matthew Knepley wrote: > On Tue, Oct 18, 2016 at 7:38 AM, Julian Andrej wrote: >> >> Hi, >> >> i have general question about PetscFE. When i want to assemble certain >> parts of physics separately, how can i do that? I basically want to >> assemble matrices/vectors from the weak forms on the same DM (and >> avoid copying the DM) and use them afterwards. Is there a convenient >> way for doing that? >> >> The "workflow" i'm approaching is something like: >> >> - Setup the DM >> - Setup discretization (spaces and quadrature) for each weak form i >> want to compute >> - Compute just the weak form i want right now for a specific >> discretization and field. >> >> The reason is i need certain parts of the "complete" Jacobian for >> computations of eigenproblems and like to avoid computing those more >> often than needed. > > > The way I envision this working is to use DMCreateSubDM(). It should extract > everything correctly for the subset of fields you select. However, I have > not > extensively tested, so if something is wrong let me know. > > Thanks, > > Matt > >> >> Regards >> Julian > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From hzhang at mcs.anl.gov Thu Oct 20 09:44:43 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 20 Oct 2016 09:44:43 -0500 Subject: [petsc-users] [mumps-dev] MUMPS and PARMETIS: Crashes In-Reply-To: References: <3A041F37-6368-4060-81A5-59D0130584C9@mcs.anl.gov> Message-ID: Alfredo: It would be much easier to install petsc with mumps, parmetis, and debugging this case. 
Here is what you can do on a linux machine (see http://www.mcs.anl.gov/petsc/documentation/installation.html): 1) get petsc-release: git clone -b maint https://bitbucket.org/petsc/petsc petsc cd petsc git pull export PETSC_DIR=$PWD export PETSC_ARCH=<> 2) configure petsc with additional options '--download-metis --download-parmetis --download-mumps --download-scalapack --download-ptscotch' see http://www.mcs.anl.gov/petsc/documentation/installation.html 3) build petsc and test make make test 4) test ex53.c: cd $PETSC_DIR/src/ksp/ksp/examples/tutorials make ex53 mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 5) debugging ex53.c: mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -start_in_debugger Give it a try. Contact us if you cannot reproduce this case. Hong Dear all, > this may well be due to a bug in the parallel analysis. Do you think you > can reproduce the problem in a standalone MUMPS program (i.e., without > going through PETSc) ? that would save a lot of time to track the bug since > we do not have a PETSc install at hand. Otherwise we'll give it a shot at > installing petsc and reproducing the problem on our side. > > Kind regards, > the MUMPS team > > > > On Wed, Oct 19, 2016 at 8:32 PM, Barry Smith wrote: > >> >> Tim, >> >> You can/should also run with valgrind to determine exactly the first >> point with memory corruption issues. >> >> Barry >> >> > On Oct 19, 2016, at 11:08 AM, Hong wrote: >> > >> > Tim: >> > With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56 >> with np=3 or larger np successfully. >> > >> > With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to >> np=3. >> > >> > For np=4: >> > mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 >> -mat_mumps_icntl_29 2 -start_in_debugger >> > >> > code crashes inside mumps: >> > Program received signal SIGSEGV, Segmentation fault. >> > 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph >> ( >> > id=..., first=..., last=..., ipe=..., >> > pe=, >> work=...) >> > at dana_aux_par.F:1450 >> > 1450 MAPTAB(J) = I >> > (gdb) bt >> > #0 0x00007f33d75857cb in dmumps_parallel_analysis::dmumps_build_scotch_graph >> ( >> > id=..., first=..., last=..., ipe=..., >> > pe=, >> work=...) >> > at dana_aux_par.F:1450 >> > #1 0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord >> ( >> > id=..., ord=..., work=...) at dana_aux_par.F:400 >> > #2 0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord >> (id=..., >> > ord=..., work=...) at dana_aux_par.F:351 >> > #3 0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par >> (id=..., >> > work1=..., work2=..., nfsiz=..., >> > fils=, >> > frere=) >> > at dana_aux_par.F:98 >> > #4 0x00007f33d74c622a in dmumps_ana_driver (id=...) at >> dana_driver.F:563 >> > #5 0x00007f33d747706b in dmumps (id=...) 
at dmumps_driver.F:1108 >> > #6 0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1, >> > comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., >> dkeep=..., >> > keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., >> ahere=0, >> > nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, >> > a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, >> eltvar=..., >> > eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, >> rhs=..., >> > rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., >> > rinfog=..., deficiency=0, lwk_user=0, size_schur=0, >> listvar_schur=..., >> > ---Type to continue, or q to quit--- >> > ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, >> colsca=..., >> > colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, >> lrhs=0, lredrhs=0, >> > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, >> irhs_sparse=..., >> > irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., >> isol_lochere=0, >> > nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0, >> mblock=0, nblock=0, >> > nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., >> write_problem=..., tmpdirlen=20, >> > prefixlen=20, write_problemlen=20) at dmumps_f77.F:260 >> > #7 0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at >> mumps_c.c:415 >> > #8 0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, >> A=0x14bafc0, >> > r=0x160cc30, c=0x1609ed0, info=0x15c6708) >> > at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487 >> > >> > -mat_mumps_icntl_29 = 0 or 1 give same error. >> > I'm cc'ing this email to mumps developer, who may help to resolve this >> matter. >> > >> > Hong >> > >> > >> > Hi all, >> > >> > I have some problems with PETSc using MUMPS and PARMETIS. >> > In some cases it works fine, but in some others it doesn't, so I am >> > trying to understand what is happening. >> > >> > I just picked the following example: >> > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examp >> les/tutorials/ex53.c.html >> > >> > Now, when I start it with less than 4 processes it works as expected: >> > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 >> > -mat_mumps_icntl_29 2 >> > >> > But with 4 or more processes, it crashes, but only when I am using >> Parmetis: >> > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1 >> > -mat_mumps_icntl_29 2 >> > >> > Metis worked in every case I tried without any problems. >> > >> > I wonder if I am doing something wrong or if this is a general problem >> > or even a bug? Is Parmetis supposed to work with that example with 4 >> > processes? >> > >> > Thanks a lot and kind regards. >> > >> > Volker >> > >> > >> > Here is the error log of process 0: >> > >> > Entering DMUMPS 5.0.1 driver with JOB, N = 1 10000 >> > ================================================= >> > MUMPS compiled with option -Dmetis >> > MUMPS compiled with option -Dparmetis >> > ================================================= >> > L U Solver for unsymmetric matrices >> > Type of parallelism: Working host >> > >> > ****** ANALYSIS STEP ******** >> > >> > ** Max-trans not allowed because matrix is distributed >> > Using ParMETIS for parallel ordering. 
>> > [0]PETSC ERROR: >> > ------------------------------------------------------------ >> ------------ >> > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> > probably memory access out of range >> > [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > [0]PETSC ERROR: or see >> > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> > OS X to find memory corruption errors >> > [0]PETSC ERROR: likely location of problem given in stack below >> > [0]PETSC ERROR: --------------------- Stack Frames >> > ------------------------------------ >> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> > [0]PETSC ERROR: INSTEAD the line number of the start of the >> function >> > [0]PETSC ERROR: is given. >> > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/impls/ >> aij/mpi/mumps/mumps.c >> > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c >> > [0]PETSC ERROR: [0] PCSetUp_LU line 101 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/ >> impls/factor/lu/lu.c >> > [0]PETSC ERROR: [0] PCSetUp line 930 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/ >> interface/precon.c >> > [0]PETSC ERROR: [0] KSPSetUp line 305 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/ >> interface/itfunc.c >> > [0]PETSC ERROR: [0] KSPSolve line 563 >> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/ >> interface/itfunc.c >> > [0]PETSC ERROR: --------------------- Error Message >> > -------------------------------------------------------------- >> > [0]PETSC ERROR: Signal received >> > [0]PETSC ERROR: See >> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> > shooting. >> > [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 >> > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed >> > Oct 19 16:39:49 2016 >> > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc >> > --with-fc=mpiifort --with-shared-libraries=1 >> > --with-valgrind-dir=~/usr/valgrind/ >> > --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_a >> nd_libraries_2016.4.258/linux/mpi >> > --download-scalapack --download-mumps --download-metis >> > --download-metis-shared=0 --download-parmetis >> > --download-parmetis-shared=0 >> > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > >> >> > > > -- > ----------------------------------------- > Alfredo Buttari, PhD > CNRS-IRIT > 2 rue Camichel, 31071 Toulouse, France > http://buttari.perso.enseeiht.fr > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aks084000 at utdallas.edu Thu Oct 20 09:48:46 2016 From: aks084000 at utdallas.edu (Safin, Artur) Date: Thu, 20 Oct 2016 14:48:46 +0000 Subject: [petsc-users] How to set up an interface-type problem In-Reply-To: References: <36adffebd0954095b5d91413cac9355c@utdallas.edu>, Message-ID: <33f5dbea796b4ad1b3a961f41de950f0@utdallas.edu> Matt, You use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html to put values directly into the subvectors and submatrices. 
This interacts well with MatNest, so it can be optimized after you get it working. Thanks, MatNest and VecNest is what I was looking for. I have a one more question: if I generate a vector with VecCreateNest, will it be allocated separately, or does it somehow reuse the space from the original subvectors? Artur -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 20 10:18:09 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Oct 2016 10:18:09 -0500 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: On Thu, Oct 20, 2016 at 9:42 AM, Julian Andrej wrote: > Thanks for the suggestion. I guess DMCreateSubDM can work, but is > cumbersome to handle for the normal solution process since the mass > matrix for example is not a seperate field. > I did not understand what you meant by "parts of the physics". If you just want to make a different operator, then swap out the PetscDS from the DM. That holds the pointwise functions and discretizations. > src/snes/examples/tutorials/ex77 handles a seperate field for the > nullspace, if anyone is interested in that. > > An intuitive way was just copying the DM and describing a new problem on > it. > > DM dm_mass; > PetscDS ds_mass; > Vec dummy; > PetscInt id = 1; > petsc_call(DMCreateGlobalVector(dm, &dummy)); > petsc_call(DMClone(ctx->dm, &dm_mass)); > petsc_call(DMGetDS(dm_mass, &ds_mass)); > petsc_call(PetscDSSetDiscretization(ds_mass, 0, (PetscObject)fe)); > petsc_call(PetscDSSetJacobian(ds_mass, 0, 0, mass_kernel, NULL, NULL, > NULL)); > petsc_call(PetscDSAddBoundary(ds_mass, PETSC_TRUE, "wall", "marker", > 0, 0, NULL, (void (*)())ctx->exact_funcs[0], 1, &id, ctx)); > petsc_call(DMCreateMatrix(dm_mass, &ctx->M)); > petsc_call(DMPlexSNESComputeJacobianFEM(dm_mass, dummy, ctx->M, > ctx->M, NULL)); > > is this an intended way to assemble a jacobian based on a weak form? > The memory overhead for a DM copy isn't huge on the first sight. > Its O(1). > And a much more important question. Is there any mathematical > description how exactly you handle dirichlet boundary conditions here? > Right now, you can do two things: 1) Handle it yourself or 2) eliminate particular dofs If you use 2), these dofs are eliminated from the global vector. They remain in the local vector, and boundary values are inserted before local vectors are passed to assembly routines. Matt > On first sight it looks like condensing the nodes only to > non-essential nodes and then projecting them back in the solution > vector. If thats teh case I don't understand how you "augment" the > solution with the boundary nodes. > > Regards > Julian > > > On Wed, Oct 19, 2016 at 11:51 AM, Matthew Knepley > wrote: > > On Tue, Oct 18, 2016 at 7:38 AM, Julian Andrej > wrote: > >> > >> Hi, > >> > >> i have general question about PetscFE. When i want to assemble certain > >> parts of physics separately, how can i do that? I basically want to > >> assemble matrices/vectors from the weak forms on the same DM (and > >> avoid copying the DM) and use them afterwards. Is there a convenient > >> way for doing that? > >> > >> The "workflow" i'm approaching is something like: > >> > >> - Setup the DM > >> - Setup discretization (spaces and quadrature) for each weak form i > >> want to compute > >> - Compute just the weak form i want right now for a specific > >> discretization and field. 
> >> > >> The reason is i need certain parts of the "complete" Jacobian for > >> computations of eigenproblems and like to avoid computing those more > >> often than needed. > > > > > > The way I envision this working is to use DMCreateSubDM(). It should > extract > > everything correctly for the subset of fields you select. However, I have > > not > > extensively tested, so if something is wrong let me know. > > > > Thanks, > > > > Matt > > > >> > >> Regards > >> Julian > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 20 10:28:05 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Oct 2016 10:28:05 -0500 Subject: [petsc-users] How to set up an interface-type problem In-Reply-To: <33f5dbea796b4ad1b3a961f41de950f0@utdallas.edu> References: <36adffebd0954095b5d91413cac9355c@utdallas.edu> <33f5dbea796b4ad1b3a961f41de950f0@utdallas.edu> Message-ID: On Thu, Oct 20, 2016 at 9:48 AM, Safin, Artur wrote: > Matt, > > > You use >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ >> MatGetLocalSubMatrix.html >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/ >> VecGetSubVector.html >> >> to put values directly into the subvectors and submatrices. This >> interacts well with MatNest, so it >> can be optimized after you get it working. >> >> > Thanks, MatNest and VecNest is what I was looking for. > > I have a one more question: if I generate a vector with VecCreateNest, > will it be allocated separately, or does it somehow reuse the space from > the original subvectors? > With a VecNest, the subvector is not copied. Thanks, Matt > Artur > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Oct 20 10:55:43 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Oct 2016 09:55:43 -0600 Subject: [petsc-users] How to set up an interface-type problem In-Reply-To: <33f5dbea796b4ad1b3a961f41de950f0@utdallas.edu> References: <36adffebd0954095b5d91413cac9355c@utdallas.edu> <33f5dbea796b4ad1b3a961f41de950f0@utdallas.edu> Message-ID: <87shrrrp4w.fsf@jedbrown.org> "Safin, Artur" writes: > Matt, > > > You use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html > > to put values directly into the subvectors and submatrices. This interacts well with MatNest, so it > can be optimized after you get it working. > > > Thanks, MatNest and VecNest is what I was looking for. > > I have a one more question: if I generate a vector with VecCreateNest, will it be allocated separately, or does it somehow reuse the space from the original subvectors? It references the vectors that you pass in. But you almost certainly should not hard-code to use VecNest. Almost all operations are less efficient and most preconditioners cannot use VecNest. 
Similarly, you should assembly your matrix using MatGetLocalSubMatrix(), setting values into the blocks using MatSetValuesLocal(). You can reuse code you currently have for building the blocks separately. See src/snes/examples/tutorials/ex28.c for an example. If you do it this way, then using MatNest or VecNest is a run-time flag that is a possible optimization for *some specific* algorithms. (VecNest is almost always a pessimization. You can use MatNest without VecNest.) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From juan at tf.uni-kiel.de Fri Oct 21 02:26:01 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Fri, 21 Oct 2016 09:26:01 +0200 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: On Thu, Oct 20, 2016 at 5:18 PM, Matthew Knepley wrote: > On Thu, Oct 20, 2016 at 9:42 AM, Julian Andrej wrote: >> >> Thanks for the suggestion. I guess DMCreateSubDM can work, but is >> cumbersome to handle for the normal solution process since the mass >> matrix for example is not a seperate field. > > > I did not understand what you meant by "parts of the physics". If you just > want to make a different operator, then swap out the PetscDS from the DM. > That holds the pointwise functions and discretizations. > Yes, its basically a different operator! Thats a really smart design, i can just create different PetscDS objects and stick them in to assemble the operator. /* Assemble mass operator */ DMSetDS(dm, ds_mass); DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->M, ctx->M, NULL); /* Assemble laplacian operator */ DMSetDS(dm, ds_laplacian); DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->J, ctx->J, NULL); There is one thing that bothers me just a bit. Everytime you call DMSetDS the old PetscDS object is destroyed and you have to reacreate the object in case you want to reassemble that operator. src/dm/interface/dm.c:3889: ierr = PetscDSDestroy(&dm->prob);CHKERRQ(ierr); Maybe it is just my specific use case but something to think about. >> >> src/snes/examples/tutorials/ex77 handles a seperate field for the >> nullspace, if anyone is interested in that. >> >> An intuitive way was just copying the DM and describing a new problem on >> it. >> >> DM dm_mass; >> PetscDS ds_mass; >> Vec dummy; >> PetscInt id = 1; >> petsc_call(DMCreateGlobalVector(dm, &dummy)); >> petsc_call(DMClone(ctx->dm, &dm_mass)); >> petsc_call(DMGetDS(dm_mass, &ds_mass)); >> petsc_call(PetscDSSetDiscretization(ds_mass, 0, (PetscObject)fe)); >> petsc_call(PetscDSSetJacobian(ds_mass, 0, 0, mass_kernel, NULL, NULL, >> NULL)); >> petsc_call(PetscDSAddBoundary(ds_mass, PETSC_TRUE, "wall", "marker", >> 0, 0, NULL, (void (*)())ctx->exact_funcs[0], 1, &id, ctx)); >> petsc_call(DMCreateMatrix(dm_mass, &ctx->M)); >> petsc_call(DMPlexSNESComputeJacobianFEM(dm_mass, dummy, ctx->M, >> ctx->M, NULL)); >> >> is this an intended way to assemble a jacobian based on a weak form? >> The memory overhead for a DM copy isn't huge on the first sight. > > > Its O(1). > >> >> And a much more important question. Is there any mathematical >> description how exactly you handle dirichlet boundary conditions here? > > > Right now, you can do two things: > > 1) Handle it yourself > > or > > 2) eliminate particular dofs > > If you use 2), these dofs are eliminated from the global vector. 
They remain > in the > local vector, and boundary values are inserted before local vectors are > passed to > assembly routines. > > Matt > Thank you again for your help and suggestions. Regards Julian >> >> On first sight it looks like condensing the nodes only to >> non-essential nodes and then projecting them back in the solution >> vector. If thats teh case I don't understand how you "augment" the >> solution with the boundary nodes. >> >> Regards >> Julian >> >> >> On Wed, Oct 19, 2016 at 11:51 AM, Matthew Knepley >> wrote: >> > On Tue, Oct 18, 2016 at 7:38 AM, Julian Andrej >> > wrote: >> >> >> >> Hi, >> >> >> >> i have general question about PetscFE. When i want to assemble certain >> >> parts of physics separately, how can i do that? I basically want to >> >> assemble matrices/vectors from the weak forms on the same DM (and >> >> avoid copying the DM) and use them afterwards. Is there a convenient >> >> way for doing that? >> >> >> >> The "workflow" i'm approaching is something like: >> >> >> >> - Setup the DM >> >> - Setup discretization (spaces and quadrature) for each weak form i >> >> want to compute >> >> - Compute just the weak form i want right now for a specific >> >> discretization and field. >> >> >> >> The reason is i need certain parts of the "complete" Jacobian for >> >> computations of eigenproblems and like to avoid computing those more >> >> often than needed. >> > >> > >> > The way I envision this working is to use DMCreateSubDM(). It should >> > extract >> > everything correctly for the subset of fields you select. However, I >> > have >> > not >> > extensively tested, so if something is wrong let me know. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Regards >> >> Julian >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From lawrence.mitchell at imperial.ac.uk Fri Oct 21 05:17:46 2016 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Fri, 21 Oct 2016 11:17:46 +0100 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: > On 21 Oct 2016, at 08:26, Julian Andrej wrote: > > On Thu, Oct 20, 2016 at 5:18 PM, Matthew Knepley wrote: >> On Thu, Oct 20, 2016 at 9:42 AM, Julian Andrej wrote: >>> >>> Thanks for the suggestion. I guess DMCreateSubDM can work, but is >>> cumbersome to handle for the normal solution process since the mass >>> matrix for example is not a seperate field. >> >> >> I did not understand what you meant by "parts of the physics". If you just >> want to make a different operator, then swap out the PetscDS from the DM. >> That holds the pointwise functions and discretizations. >> > > Yes, its basically a different operator! Thats a really smart design, > i can just create different PetscDS objects and stick them in to > assemble the operator. > > /* Assemble mass operator */ > DMSetDS(dm, ds_mass); > DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->M, ctx->M, NULL); > /* Assemble laplacian operator */ > DMSetDS(dm, ds_laplacian); > DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->J, ctx->J, NULL); > > There is one thing that bothers me just a bit. 
Everytime you call > DMSetDS the old PetscDS object is destroyed and you have to reacreate > the object in case you want to reassemble that operator. > > src/dm/interface/dm.c:3889: ierr = PetscDSDestroy(&dm->prob);CHKERRQ(ierr); All objects in PETSc are refcounted. So this just drops the reference that the DM is holding to the DS. As long as you're still holding a reference in your code (you haven't called PetscDSDestroy) then this does not actually deallocate the DS, just decrements the refcount. Lawrence From popov at uni-mainz.de Fri Oct 21 05:36:04 2016 From: popov at uni-mainz.de (Anton Popov) Date: Fri, 21 Oct 2016 12:36:04 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> Message-ID: <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> On 10/19/2016 05:22 PM, Anton Popov wrote: > I looked at each valgrind-complained item in your email dated Oct. > 11. Those reports are really superficial; I don't see anything wrong > with those lines (mostly uninitialized variables) singled out. I did > a few tests with the latest version in github, all went fine. > > Perhaps you can print your matrix that caused problem, I can run it > using your matrix. > > Sherry Hi Sherry, I finally figured out a minimalistic setup (attached) that reproduces the problem. I use petsc-maint: git clone -b maint https://bitbucket.org/petsc/petsc.git and configure it in the debug mode without optimization using the options: --download-superlu_dist=1 \ --download-superlu_dist-commit=origin/maint \ Compile the test, assuming PETSC_DIR points to the described petsc installation: make ex16 Run with: mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu -pc_factor_mat_solver_package superlu_dist Matrix partitioning between the processors will be completely the same as in our code (hard-coded). I factorize the same matrix twice with the same PC object. Remarkably it runs fine for the first time, but fails for the second. Thank you very much for looking into this problem. Cheers, Anton -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: superlu_dist_test.tar.gz Type: application/gzip Size: 440755 bytes Desc: not available URL: From juan at tf.uni-kiel.de Fri Oct 21 06:10:51 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Fri, 21 Oct 2016 13:10:51 +0200 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: Yeah, thanks for pointing out my mistake. Next time i'm going to think one more time before writing ;) On Fri, Oct 21, 2016 at 12:17 PM, Lawrence Mitchell wrote: > >> On 21 Oct 2016, at 08:26, Julian Andrej wrote: >> >> On Thu, Oct 20, 2016 at 5:18 PM, Matthew Knepley wrote: >>> On Thu, Oct 20, 2016 at 9:42 AM, Julian Andrej wrote: >>>> >>>> Thanks for the suggestion. I guess DMCreateSubDM can work, but is >>>> cumbersome to handle for the normal solution process since the mass >>>> matrix for example is not a seperate field. >>> >>> >>> I did not understand what you meant by "parts of the physics". If you just >>> want to make a different operator, then swap out the PetscDS from the DM. >>> That holds the pointwise functions and discretizations. >>> >> >> Yes, its basically a different operator! 
Thats a really smart design, >> i can just create different PetscDS objects and stick them in to >> assemble the operator. >> >> /* Assemble mass operator */ >> DMSetDS(dm, ds_mass); >> DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->M, ctx->M, NULL); >> /* Assemble laplacian operator */ >> DMSetDS(dm, ds_laplacian); >> DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->J, ctx->J, NULL); >> >> There is one thing that bothers me just a bit. Everytime you call >> DMSetDS the old PetscDS object is destroyed and you have to reacreate >> the object in case you want to reassemble that operator. >> >> src/dm/interface/dm.c:3889: ierr = PetscDSDestroy(&dm->prob);CHKERRQ(ierr); > > All objects in PETSc are refcounted. So this just drops the reference that the DM is holding to the DS. As long as you're still holding a reference in your code (you haven't called PetscDSDestroy) then this does not actually deallocate the DS, just decrements the refcount. > > Lawrence From knepley at gmail.com Fri Oct 21 07:17:00 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Oct 2016 07:17:00 -0500 Subject: [petsc-users] PetscFE questions In-Reply-To: References: Message-ID: On Fri, Oct 21, 2016 at 2:26 AM, Julian Andrej wrote: > On Thu, Oct 20, 2016 at 5:18 PM, Matthew Knepley > wrote: > > On Thu, Oct 20, 2016 at 9:42 AM, Julian Andrej > wrote: > >> > >> Thanks for the suggestion. I guess DMCreateSubDM can work, but is > >> cumbersome to handle for the normal solution process since the mass > >> matrix for example is not a seperate field. > > > > > > I did not understand what you meant by "parts of the physics". If you > just > > want to make a different operator, then swap out the PetscDS from the DM. > > That holds the pointwise functions and discretizations. > > > > Yes, its basically a different operator! Thats a really smart design, > i can just create different PetscDS objects and stick them in to > assemble the operator. > > /* Assemble mass operator */ > DMSetDS(dm, ds_mass); > DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->M, ctx->M, NULL); > /* Assemble laplacian operator */ > DMSetDS(dm, ds_laplacian); > DMPlexSNESComputeJacobianFEM(dm, dummy, ctx->J, ctx->J, NULL); > > There is one thing that bothers me just a bit. Everytime you call > DMSetDS the old PetscDS object is destroyed and you have to reacreate > the object in case you want to reassemble that operator. > > src/dm/interface/dm.c:3889: ierr = PetscDSDestroy(&dm->prob); > CHKERRQ(ierr); > > Maybe it is just my specific use case but something to think about. If you want to keep them around, you should do this DMGetDS(dm, &oldds); PetscObjectReference(oldds); DMSetDS(dm, newds); DMSetDS(dm, oldds); PetscObjectDeferefence(oldds); Thanks, Matt > >> > >> src/snes/examples/tutorials/ex77 handles a seperate field for the > >> nullspace, if anyone is interested in that. > >> > >> An intuitive way was just copying the DM and describing a new problem on > >> it. 
> >> > >> DM dm_mass; > >> PetscDS ds_mass; > >> Vec dummy; > >> PetscInt id = 1; > >> petsc_call(DMCreateGlobalVector(dm, &dummy)); > >> petsc_call(DMClone(ctx->dm, &dm_mass)); > >> petsc_call(DMGetDS(dm_mass, &ds_mass)); > >> petsc_call(PetscDSSetDiscretization(ds_mass, 0, (PetscObject)fe)); > >> petsc_call(PetscDSSetJacobian(ds_mass, 0, 0, mass_kernel, NULL, NULL, > >> NULL)); > >> petsc_call(PetscDSAddBoundary(ds_mass, PETSC_TRUE, "wall", "marker", > >> 0, 0, NULL, (void (*)())ctx->exact_funcs[0], 1, &id, ctx)); > >> petsc_call(DMCreateMatrix(dm_mass, &ctx->M)); > >> petsc_call(DMPlexSNESComputeJacobianFEM(dm_mass, dummy, ctx->M, > >> ctx->M, NULL)); > >> > >> is this an intended way to assemble a jacobian based on a weak form? > >> The memory overhead for a DM copy isn't huge on the first sight. > > > > > > Its O(1). > > > >> > >> And a much more important question. Is there any mathematical > >> description how exactly you handle dirichlet boundary conditions here? > > > > > > Right now, you can do two things: > > > > 1) Handle it yourself > > > > or > > > > 2) eliminate particular dofs > > > > If you use 2), these dofs are eliminated from the global vector. They > remain > > in the > > local vector, and boundary values are inserted before local vectors are > > passed to > > assembly routines. > > > > Matt > > > > Thank you again for your help and suggestions. > > Regards > Julian > > >> > >> On first sight it looks like condensing the nodes only to > >> non-essential nodes and then projecting them back in the solution > >> vector. If thats teh case I don't understand how you "augment" the > >> solution with the boundary nodes. > >> > >> Regards > >> Julian > >> > >> > >> On Wed, Oct 19, 2016 at 11:51 AM, Matthew Knepley > >> wrote: > >> > On Tue, Oct 18, 2016 at 7:38 AM, Julian Andrej > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> i have general question about PetscFE. When i want to assemble > certain > >> >> parts of physics separately, how can i do that? I basically want to > >> >> assemble matrices/vectors from the weak forms on the same DM (and > >> >> avoid copying the DM) and use them afterwards. Is there a convenient > >> >> way for doing that? > >> >> > >> >> The "workflow" i'm approaching is something like: > >> >> > >> >> - Setup the DM > >> >> - Setup discretization (spaces and quadrature) for each weak form i > >> >> want to compute > >> >> - Compute just the weak form i want right now for a specific > >> >> discretization and field. > >> >> > >> >> The reason is i need certain parts of the "complete" Jacobian for > >> >> computations of eigenproblems and like to avoid computing those more > >> >> often than needed. > >> > > >> > > >> > The way I envision this working is to use DMCreateSubDM(). It should > >> > extract > >> > everything correctly for the subset of fields you select. However, I > >> > have > >> > not > >> > extensively tested, so if something is wrong let me know. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Regards > >> >> Julian > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. 
> > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ztdepyahoo at 163.com Fri Oct 21 08:40:59 2016 From: ztdepyahoo at 163.com (=?GBK?B?tqHAz8qm?=) Date: Fri, 21 Oct 2016 21:40:59 +0800 (CST) Subject: [petsc-users] How to scatter values Message-ID: <7a54264f.c2ae.157e7792950.Coremail.ztdepyahoo@163.com> Dear professor: I?????????????????????????????????????????????????????????? ??????????????????????????????? ??? ??????????????????????????? ??????????????????????????????? ????????????????????????????????? ?? ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri Oct 21 11:17:43 2016 From: hzhang at mcs.anl.gov (Hong) Date: Fri, 21 Oct 2016 11:17:43 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> Message-ID: I can reproduce the error on a linux machine with petsc-maint. It crashes at 2nd solve, on both processors: Program received signal SIGSEGV, Segmentation fault. 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0, rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40, info=0x7fffcb8dab4c, grid=0x1563858) at /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182 182 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] ); The version of superlu_dist: commit 0b5369f304507f1c7904a913f4c0c86777a60639 Author: Xiaoye Li Date: Thu May 26 11:33:19 2016 -0700 rename 'struct pair' to 'struct superlu_pair'. Hong On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov wrote: > > On 10/19/2016 05:22 PM, Anton Popov wrote: > > I looked at each valgrind-complained item in your email dated Oct. 11. > Those reports are really superficial; I don't see anything wrong with > those lines (mostly uninitialized variables) singled out. I did a few > tests with the latest version in github, all went fine. > > Perhaps you can print your matrix that caused problem, I can run it using > your matrix. > > Sherry > > Hi Sherry, > > I finally figured out a minimalistic setup (attached) that reproduces the > problem. > > I use petsc-maint: > > git clone -b maint https://bitbucket.org/petsc/petsc.git > > and configure it in the debug mode without optimization using the options: > > --download-superlu_dist=1 \ > --download-superlu_dist-commit=origin/maint \ > > Compile the test, assuming PETSC_DIR points to the described petsc > installation: > > make ex16 > > Run with: > > mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu > -pc_factor_mat_solver_package superlu_dist > > Matrix partitioning between the processors will be completely the same as > in our code (hard-coded). > > I factorize the same matrix twice with the same PC object. Remarkably it > runs fine for the first time, but fails for the second. 
> > Thank you very much for looking into this problem. > > Cheers, > Anton > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Oct 21 11:30:41 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 21 Oct 2016 18:30:41 +0200 Subject: [petsc-users] Looking for a quick example of a symmetric KKT system Message-ID: Are there any examples already in PETSc or TAO that assemble such a system (which could thus be dumped)? SNES example ex73f90t assembles a non-symmetric KKT system. From jychang48 at gmail.com Fri Oct 21 12:23:12 2016 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 21 Oct 2016 12:23:12 -0500 Subject: [petsc-users] Looking for a quick example of a symmetric KKT system In-Reply-To: References: Message-ID: Something like this? http://www.mcs.anl.gov/petsc/petsc-current/src/tao/constrained/examples/tutorials/toy.c.html On Friday, October 21, 2016, Patrick Sanan wrote: > Are there any examples already in PETSc or TAO that assemble such a > system (which could thus be dumped)? SNES example ex73f90t assembles a > non-symmetric KKT system. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Oct 21 12:28:31 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 21 Oct 2016 19:28:31 +0200 Subject: [petsc-users] Looking for a quick example of a symmetric KKT system In-Reply-To: References: Message-ID: Yes, but AFAIK that example produces a 2x2 system - I was hoping for something with a variable problem size, ideally with some sort of physics motivating the underlying optimization problem. On Fri, Oct 21, 2016 at 7:23 PM, Justin Chang wrote: > Something like this? > > http://www.mcs.anl.gov/petsc/petsc-current/src/tao/constrained/examples/tutorials/toy.c.html > > > On Friday, October 21, 2016, Patrick Sanan wrote: >> >> Are there any examples already in PETSc or TAO that assemble such a >> system (which could thus be dumped)? SNES example ex73f90t assembles a >> non-symmetric KKT system. From jed at jedbrown.org Fri Oct 21 12:50:24 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 21 Oct 2016 11:50:24 -0600 Subject: [petsc-users] Looking for a quick example of a symmetric KKT system In-Reply-To: References: Message-ID: <871sz9r3q7.fsf@jedbrown.org> Why doesn't a Stokes problem fulfill your needs? Patrick Sanan writes: > Yes, but AFAIK that example produces a 2x2 system - I was hoping for > something with a variable problem size, ideally with some sort of > physics motivating the underlying optimization problem. > > On Fri, Oct 21, 2016 at 7:23 PM, Justin Chang wrote: >> Something like this? >> >> http://www.mcs.anl.gov/petsc/petsc-current/src/tao/constrained/examples/tutorials/toy.c.html >> >> >> On Friday, October 21, 2016, Patrick Sanan wrote: >>> >>> Are there any examples already in PETSc or TAO that assemble such a >>> system (which could thus be dumped)? SNES example ex73f90t assembles a >>> non-symmetric KKT system. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From Eric.Chamberland at giref.ulaval.ca Fri Oct 21 12:55:56 2016 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Fri, 21 Oct 2016 13:55:56 -0400 Subject: [petsc-users] Column #j is wrong in parallel from message "Inserting a new nonzero (i, j) into matrix" In-Reply-To: References: <5512F866.5070405@giref.ulaval.ca> Message-ID: <86c5c91c-0fd2-2e88-1787-e5da1a3e4e35@giref.ulaval.ca> Hi, I am on a new issue with a message: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Argument out of range [1]PETSC ERROR: New nonzero at (374328,1227) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 [1]PETSC ERROR: /pmi/ericc/projetm4/depots_prepush/BIB/bin/BIBMEF.opt on a arch-linux2-c-debug named lorien by eric Fri Oct 21 13:46:51 2016 [1]PETSC ERROR: Configure options --prefix=/opt/petsc-3.7.2_debug_matmatmult_mpi --with-mpi-compilers=1 --with-make-np=12 --with-shared-libraries=1 --with-mpi-dir=/opt/openmpi-1.10.2 --with-debugging=yes --with-mkl_pardiso=1 --with-mkl_pardiso-dir=/opt/intel/composerxe/mkl --download-ml=yes --download-mumps=yes --download-superlu=yes --download-superlu_dist=yes --download-parmetis=yes --download-ptscotch=yes --download-metis=yes --download-suitesparse=yes --download-hypre=yes --with-scalapack=1 --with-scalapack-include=/opt/intel/composerxe/mkl/include --with-scalapack-lib="-L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64" --with-blas-lapack-dir=/opt/intel/composerxe/mkl/lib/intel64 [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #3 MatAssemblyEnd() line 5194 in /groshd/ericc/petsc-3.7.2-debug/src/mat/interface/matrix.c I am starting to debug, but I just want to be sure that the indices 374328 and 1227 are both global indices... re-reading the thread makes me think yes... but I am not 100% sure... Thanks, Eric On 26/03/15 09:52 PM, Barry Smith wrote: > > Eric, > > I have now updated all the standard MPI matrix types AIJ, BAIJ, SBAIJ to print the correct global indices in the error messages when a new nonzero location is generated thus making debugging this issue easier. In the branches barry/fix-inserting-new-nonzero-column-location, next and the next release. > > Thanks for pushing on this. The previous code was too "developer centric" and not enough "user centric" enough. > > Barry > >> On Mar 25, 2015, at 1:03 PM, Eric Chamberland wrote: >> >> Hi, >> >> while looking for where in the world do I insert the (135,9) entry in my matrix, I have discovered that the column # shown is wrong in parallel! >> >> I am using PETsc 3.5.3. >> >> The full error message is: >> >> [0]PETSC ERROR: MatSetValues_MPIAIJ() line 564 in /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/aij/mpi/mpiaij.c Inserting a new nonzero (135, 9) into matrix >> >> This line code is a call to a #defined macro: >> >> MatSetValues_SeqAIJ_B_Private(row,col,value,addv); >> >> where the "col" parameter is not equal to "in[j]"!!! 
>> >> in gdb, printing "in[j]" gave me: >> >> print in[j] >> $6 = 537 >> >> while "col" is: >> >> print col >> $7 = 9 >> >> So, I expected to have a message telling me that (135,537) and not (135,9) is a new entry matrix!!! >> >> Would it be a big work to fix this so that the col # displayed is correct? >> >> Thanks! >> >> Eric From dave.mayhem23 at gmail.com Fri Oct 21 13:15:04 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 21 Oct 2016 19:15:04 +0100 Subject: [petsc-users] Column #j is wrong in parallel from message "Inserting a new nonzero (i, j) into matrix" In-Reply-To: <86c5c91c-0fd2-2e88-1787-e5da1a3e4e35@giref.ulaval.ca> References: <5512F866.5070405@giref.ulaval.ca> <86c5c91c-0fd2-2e88-1787-e5da1a3e4e35@giref.ulaval.ca> Message-ID: On 21 October 2016 at 18:55, Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > I am on a new issue with a message: > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Argument out of range > [1]PETSC ERROR: New nonzero at (374328,1227) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016 > [1]PETSC ERROR: /pmi/ericc/projetm4/depots_prepush/BIB/bin/BIBMEF.opt on > a arch-linux2-c-debug named lorien by eric Fri Oct 21 13:46:51 2016 > [1]PETSC ERROR: Configure options --prefix=/opt/petsc-3.7.2_debug_matmatmult_mpi > --with-mpi-compilers=1 --with-make-np=12 --with-shared-libraries=1 > --with-mpi-dir=/opt/openmpi-1.10.2 --with-debugging=yes > --with-mkl_pardiso=1 --with-mkl_pardiso-dir=/opt/intel/composerxe/mkl > --download-ml=yes --download-mumps=yes --download-superlu=yes > --download-superlu_dist=yes --download-parmetis=yes --download-ptscotch=yes > --download-metis=yes --download-suitesparse=yes --download-hypre=yes > --with-scalapack=1 --with-scalapack-include=/opt/intel/composerxe/mkl/include > --with-scalapack-lib="-L/opt/intel/composerxe/mkl/lib/intel64 > -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64" > --with-blas-lapack-dir=/opt/intel/composerxe/mkl/lib/intel64 > [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 616 in > /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in > /groshd/ericc/petsc-3.7.2-debug/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #3 MatAssemblyEnd() line 5194 in > /groshd/ericc/petsc-3.7.2-debug/src/mat/interface/matrix.c > > I am starting to debug, but I just want to be sure that the indices 374328 > and 1227 are both global indices... > They are. > > re-reading the thread makes me think yes... but I am not 100% sure... > > Thanks, > > Eric > > > > On 26/03/15 09:52 PM, Barry Smith wrote: > >> >> Eric, >> >> I have now updated all the standard MPI matrix types AIJ, BAIJ, SBAIJ >> to print the correct global indices in the error messages when a new >> nonzero location is generated thus making debugging this issue easier. In >> the branches barry/fix-inserting-new-nonzero-column-location, next and >> the next release. >> >> Thanks for pushing on this. The previous code was too "developer >> centric" and not enough "user centric" enough. 
>> >> Barry >> >> On Mar 25, 2015, at 1:03 PM, Eric Chamberland < >>> Eric.Chamberland at giref.ulaval.ca> wrote: >>> >>> Hi, >>> >>> while looking for where in the world do I insert the (135,9) entry in my >>> matrix, I have discovered that the column # shown is wrong in parallel! >>> >>> I am using PETsc 3.5.3. >>> >>> The full error message is: >>> >>> [0]PETSC ERROR: MatSetValues_MPIAIJ() line 564 in >>> /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/aij/mpi/mpiaij.c Inserting >>> a new nonzero (135, 9) into matrix >>> >>> This line code is a call to a #defined macro: >>> >>> MatSetValues_SeqAIJ_B_Private(row,col,value,addv); >>> >>> where the "col" parameter is not equal to "in[j]"!!! >>> >>> in gdb, printing "in[j]" gave me: >>> >>> print in[j] >>> $6 = 537 >>> >>> while "col" is: >>> >>> print col >>> $7 = 9 >>> >>> So, I expected to have a message telling me that (135,537) and not >>> (135,9) is a new entry matrix!!! >>> >>> Would it be a big work to fix this so that the col # displayed is >>> correct? >>> >>> Thanks! >>> >>> Eric >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Fri Oct 21 16:51:48 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 21 Oct 2016 15:51:48 -0600 Subject: [petsc-users] matrix preallocation Message-ID: Hi, For mechanics problems, the contact surface changes during each nonlinear iteration. Therefore, the sparsity of matrix also changes during each nonlinear iteration. We know the preallocaiton is important for performance. My question is: it is possible to re-allocate memory during each nonlinear iteration? Fande -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Oct 21 17:16:47 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 21 Oct 2016 17:16:47 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> Message-ID: The issue with this test code is - using MatLoad() twice [with the same object - without destroying it]. Not sure if thats supporsed to work.. Satish On Fri, 21 Oct 2016, Hong wrote: > I can reproduce the error on a linux machine with petsc-maint. It crashes > at 2nd solve, on both processors: > > Program received signal SIGSEGV, Segmentation fault. > 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0, > rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40, > info=0x7fffcb8dab4c, grid=0x1563858) > at > /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182 > 182 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] > ); > > The version of superlu_dist: > commit 0b5369f304507f1c7904a913f4c0c86777a60639 > Author: Xiaoye Li > Date: Thu May 26 11:33:19 2016 -0700 > > rename 'struct pair' to 'struct superlu_pair'. > > Hong > > On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov wrote: > > > > > On 10/19/2016 05:22 PM, Anton Popov wrote: > > > > I looked at each valgrind-complained item in your email dated Oct. 11. > > Those reports are really superficial; I don't see anything wrong with > > those lines (mostly uninitialized variables) singled out. 
I did a few > > tests with the latest version in github, all went fine. > > > > Perhaps you can print your matrix that caused problem, I can run it using > > your matrix. > > > > Sherry > > > > Hi Sherry, > > > > I finally figured out a minimalistic setup (attached) that reproduces the > > problem. > > > > I use petsc-maint: > > > > git clone -b maint https://bitbucket.org/petsc/petsc.git > > > > and configure it in the debug mode without optimization using the options: > > > > --download-superlu_dist=1 \ > > --download-superlu_dist-commit=origin/maint \ > > > > Compile the test, assuming PETSC_DIR points to the described petsc > > installation: > > > > make ex16 > > > > Run with: > > > > mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu > > -pc_factor_mat_solver_package superlu_dist > > > > Matrix partitioning between the processors will be completely the same as > > in our code (hard-coded). > > > > I factorize the same matrix twice with the same PC object. Remarkably it > > runs fine for the first time, but fails for the second. > > > > Thank you very much for looking into this problem. > > > > Cheers, > > Anton > > > From bsmith at mcs.anl.gov Fri Oct 21 17:59:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 21 Oct 2016 17:59:45 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> Message-ID: > On Oct 21, 2016, at 5:16 PM, Satish Balay wrote: > > The issue with this test code is - using MatLoad() twice [with the > same object - without destroying it]. Not sure if thats supporsed to > work.. If the file has two matrices in it then yes a second call to MatLoad() with the same matrix should just load in the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works. Barry > > Satish > > On Fri, 21 Oct 2016, Hong wrote: > >> I can reproduce the error on a linux machine with petsc-maint. It crashes >> at 2nd solve, on both processors: >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0, >> rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40, >> info=0x7fffcb8dab4c, grid=0x1563858) >> at >> /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182 >> 182 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] >> ); >> >> The version of superlu_dist: >> commit 0b5369f304507f1c7904a913f4c0c86777a60639 >> Author: Xiaoye Li >> Date: Thu May 26 11:33:19 2016 -0700 >> >> rename 'struct pair' to 'struct superlu_pair'. >> >> Hong >> >> On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov wrote: >> >>> >>> On 10/19/2016 05:22 PM, Anton Popov wrote: >>> >>> I looked at each valgrind-complained item in your email dated Oct. 11. >>> Those reports are really superficial; I don't see anything wrong with >>> those lines (mostly uninitialized variables) singled out. I did a few >>> tests with the latest version in github, all went fine. >>> >>> Perhaps you can print your matrix that caused problem, I can run it using >>> your matrix. >>> >>> Sherry >>> >>> Hi Sherry, >>> >>> I finally figured out a minimalistic setup (attached) that reproduces the >>> problem. 
>>> >>> I use petsc-maint: >>> >>> git clone -b maint https://bitbucket.org/petsc/petsc.git >>> >>> and configure it in the debug mode without optimization using the options: >>> >>> --download-superlu_dist=1 \ >>> --download-superlu_dist-commit=origin/maint \ >>> >>> Compile the test, assuming PETSC_DIR points to the described petsc >>> installation: >>> >>> make ex16 >>> >>> Run with: >>> >>> mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> Matrix partitioning between the processors will be completely the same as >>> in our code (hard-coded). >>> >>> I factorize the same matrix twice with the same PC object. Remarkably it >>> runs fine for the first time, but fails for the second. >>> >>> Thank you very much for looking into this problem. >>> >>> Cheers, >>> Anton >>> >> > From jed at jedbrown.org Fri Oct 21 18:03:06 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 21 Oct 2016 17:03:06 -0600 Subject: [petsc-users] matrix preallocation In-Reply-To: References: Message-ID: <87mvhxpaol.fsf@jedbrown.org> "Kong, Fande" writes: > Hi, > > For mechanics problems, the contact surface changes during each nonlinear > iteration. Therefore, the sparsity of matrix also changes during each > nonlinear iteration. We know the preallocaiton is important for performance. > > My question is: it is possible to re-allocate memory during each nonlinear > iteration? Sure, call MatXAIJSetPreallocation inside your SNES Jacobian function. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Oct 21 18:05:50 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 21 Oct 2016 18:05:50 -0500 Subject: [petsc-users] matrix preallocation In-Reply-To: References: Message-ID: <4DABA899-9BBF-4E37-96A6-6A01314CA6C5@mcs.anl.gov> We don't currently have a MatReset (corresponding to PCRest() etc) but it is the right thing for you in this situation I think. A shallow MatReset() would destroy all the matrix data structures but not the Layout information (likely you want this one) while a deep reset would even get rid of the size information and be like the matrix just came from MatCreate(). If you want to start a MatReset() and post a pull request we can get it in. Note that you will need a MatReset_SeqAIJ() and a MatReset_MPIAIJ() to start with. Barry > On Oct 21, 2016, at 4:51 PM, Kong, Fande wrote: > > Hi, > > For mechanics problems, the contact surface changes during each nonlinear iteration. Therefore, the sparsity of matrix also changes during each nonlinear iteration. We know the preallocaiton is important for performance. > > My question is: it is possible to re-allocate memory during each nonlinear iteration? 
> > Fande From balay at mcs.anl.gov Fri Oct 21 18:33:41 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 21 Oct 2016 18:33:41 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> Message-ID: On Fri, 21 Oct 2016, Barry Smith wrote: > > > On Oct 21, 2016, at 5:16 PM, Satish Balay wrote: > > > > The issue with this test code is - using MatLoad() twice [with the > > same object - without destroying it]. Not sure if thats supporsed to > > work.. > > If the file has two matrices in it then yes a second call to MatLoad() with the same matrix should just load in the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works. This test code crashes with: MatLoad() MatView() MatLoad() MatView() Satish -------- balay at asterix /home/balay/download-pine/x/superlu_dist_test $ cat ex16.c static char help[] = "Reads matrix and debug solver\n\n"; #include #undef __FUNCT__ #define __FUNCT__ "main" int main(int argc,char **args) { Mat A; PetscViewer fd; /* viewer */ char file[PETSC_MAX_PATH_LEN]; /* input file name */ PetscErrorCode ierr; PetscBool flg; PetscInitialize(&argc,&args,(char*)0,help); ierr = PetscOptionsGetString(NULL,NULL,"-f",file,PETSC_MAX_PATH_LEN,&flg); CHKERRQ(ierr); if (!flg) SETERRQ(PETSC_COMM_WORLD,1,"Must indicate binary file with the -f option"); ierr = MatCreate(PETSC_COMM_WORLD,&A); CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "First MatLoad! \n");CHKERRQ(ierr); ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,&fd); CHKERRQ(ierr); ierr = MatLoad(A,fd); CHKERRQ(ierr); ierr = PetscViewerDestroy(&fd); CHKERRQ(ierr); ierr = MatView(A,0);CHKERRQ(ierr); ierr = PetscPrintf(PETSC_COMM_WORLD, "Second MatLoad! 
\n");CHKERRQ(ierr); ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,&fd); CHKERRQ(ierr); ierr = MatLoad(A,fd); CHKERRQ(ierr); ierr = PetscViewerDestroy(&fd); CHKERRQ(ierr); ierr = MatView(A,0);CHKERRQ(ierr); ierr = MatDestroy(&A); CHKERRQ(ierr); ierr = PetscFinalize(); return 0; } balay at asterix /home/balay/download-pine/x/superlu_dist_test $ make ex16 mpicc -o ex16.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g3 -I/home/balay/petsc/include -I/home/balay/petsc/arch-idx64-slu/include `pwd`/ex16.c mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g3 -o ex16 ex16.o -Wl,-rpath,/home/balay/petsc/arch-idx64-slu/lib -L/home/balay/petsc/arch-idx64-slu/lib -lpetsc -Wl,-rpath,/home/balay/petsc/arch-idx64-slu/lib -lsuperlu_dist -llapack -lblas -lparmetis -lmetis -lX11 -lpthread -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -lmpi -lgcc_s -ldl /usr/bin/rm -f ex16.o balay at asterix /home/balay/download-pine/x/superlu_dist_test $ mpiexec -n 2 ./ex16 -f ~/datafiles/matrices/small First MatLoad! Mat Object: 2 MPI processes type: mpiaij row 0: (0, 4.) (1, -1.) (6, -1.) row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) row 5: (4, -1.) (5, 4.) (11, -1.) row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) row 30: (24, -1.) (30, 4.) (31, -1.) row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) row 35: (29, -1.) (34, -1.) (35, 4.) Second MatLoad! 
Mat Object: 2 MPI processes type: mpiaij [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: Column too large: col 32628 max 35 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:31:45 2016 [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #2 MatSetValues() line 1278 in /home/balay/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #3 MatView_MPIAIJ_ASCIIorDraworSocket() line 1404 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #4 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #5 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #6 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -display :0.0 [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small [0]PETSC ERROR: -malloc_dump [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 4434 RUNNING AT asterix = EXIT CODE: 63 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== balay at asterix /home/balay/download-pine/x/superlu_dist_test $ From bsmith at mcs.anl.gov Fri Oct 21 18:38:51 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 21 Oct 2016 18:38:51 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> Message-ID: <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> valgrind first > On Oct 21, 2016, at 6:33 PM, Satish Balay wrote: > > On Fri, 21 Oct 2016, Barry Smith wrote: > >> >>> On Oct 21, 2016, at 5:16 PM, Satish Balay wrote: >>> >>> The issue with this test code is - using MatLoad() twice [with the >>> same object - without destroying it]. Not sure if thats supporsed to >>> work.. >> >> If the file has two matrices in it then yes a second call to MatLoad() with the same matrix should just load in the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works. 
> > This test code crashes with: > > MatLoad() > MatView() > MatLoad() > MatView() > > Satish > > -------- > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ cat ex16.c > static char help[] = "Reads matrix and debug solver\n\n"; > #include > #undef __FUNCT__ > #define __FUNCT__ "main" > int main(int argc,char **args) > { > Mat A; > PetscViewer fd; /* viewer */ > char file[PETSC_MAX_PATH_LEN]; /* input file name */ > PetscErrorCode ierr; > PetscBool flg; > > PetscInitialize(&argc,&args,(char*)0,help); > > ierr = PetscOptionsGetString(NULL,NULL,"-f",file,PETSC_MAX_PATH_LEN,&flg); CHKERRQ(ierr); > if (!flg) SETERRQ(PETSC_COMM_WORLD,1,"Must indicate binary file with the -f option"); > > ierr = MatCreate(PETSC_COMM_WORLD,&A); CHKERRQ(ierr); > > ierr = PetscPrintf(PETSC_COMM_WORLD, "First MatLoad! \n");CHKERRQ(ierr); > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,&fd); CHKERRQ(ierr); > ierr = MatLoad(A,fd); CHKERRQ(ierr); > ierr = PetscViewerDestroy(&fd); CHKERRQ(ierr); > ierr = MatView(A,0);CHKERRQ(ierr); > > ierr = PetscPrintf(PETSC_COMM_WORLD, "Second MatLoad! \n");CHKERRQ(ierr); > ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,file,FILE_MODE_READ,&fd); CHKERRQ(ierr); > ierr = MatLoad(A,fd); CHKERRQ(ierr); > ierr = PetscViewerDestroy(&fd); CHKERRQ(ierr); > ierr = MatView(A,0);CHKERRQ(ierr); > > ierr = MatDestroy(&A); CHKERRQ(ierr); > ierr = PetscFinalize(); > return 0; > } > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ make ex16 > mpicc -o ex16.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g3 -I/home/balay/petsc/include -I/home/balay/petsc/arch-idx64-slu/include `pwd`/ex16.c > mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g3 -o ex16 ex16.o -Wl,-rpath,/home/balay/petsc/arch-idx64-slu/lib -L/home/balay/petsc/arch-idx64-slu/lib -lpetsc -Wl,-rpath,/home/balay/petsc/arch-idx64-slu/lib -lsuperlu_dist -llapack -lblas -lparmetis -lmetis -lX11 -lpthread -lm -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -L/home/balay/soft/mpich-3.1.4/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.2.1 -ldl -Wl,-rpath,/home/balay/soft/mpich-3.1.4/lib -lmpi -lgcc_s -ldl > /usr/bin/rm -f ex16.o > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ mpiexec -n 2 ./ex16 -f ~/datafiles/matrices/small > First MatLoad! > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 4.) (1, -1.) (6, -1.) > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > row 5: (4, -1.) (5, 4.) (11, -1.) > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > row 16: (10, -1.) (15, -1.) (16, 4.) 
(17, -1.) (22, -1.) > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > row 30: (24, -1.) (30, 4.) (31, -1.) > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > row 35: (29, -1.) (34, -1.) (35, 4.) > Second MatLoad! > Mat Object: 2 MPI processes > type: mpiaij > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: Column too large: col 32628 max 35 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:31:45 2016 > [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #2 MatSetValues() line 1278 in /home/balay/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 MatView_MPIAIJ_ASCIIorDraworSocket() line 1404 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #4 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #5 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #6 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -display :0.0 > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > [0]PETSC ERROR: -malloc_dump > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 4434 RUNNING AT asterix > = EXIT CODE: 63 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > =================================================================================== > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ From balay at mcs.anl.gov Fri Oct 21 18:48:39 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 21 Oct 2016 18:48:39 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> 
<2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: On Fri, 21 Oct 2016, Barry Smith wrote: > > valgrind first balay at asterix /home/balay/download-pine/x/superlu_dist_test $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small First MatLoad! Mat Object: 2 MPI processes type: mpiaij row 0: (0, 4.) (1, -1.) (6, -1.) row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) row 5: (4, -1.) (5, 4.) (11, -1.) row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) row 30: (24, -1.) (30, 4.) (31, -1.) row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) row 35: (29, -1.) (34, -1.) (35, 4.) Second MatLoad! 
Mat Object: 2 MPI processes type: mpiaij ==4592== Invalid read of size 4 ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) ==4592== by 0x53373D7: MatView (matrix.c:989) ==4592== by 0x40107E: main (ex16.c:30) ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) ==4592== by 0x400D9F: main (ex16.c:22) ==4592== ==4591== Invalid read of size 4 ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) ==4591== by 0x53373D7: MatView (matrix.c:989) ==4591== by 0x40107E: main (ex16.c:30) ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) ==4591== by 0x400D9F: main (ex16.c:22) ==4591== [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: Column too large: col 96 max 35 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -display :0.0 [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small [0]PETSC ERROR: -malloc_dump [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) ==4591== by 0x53373D7: MatView (matrix.c:989) ==4591== by 0x40107E: main (ex16.c:30) ==4591== =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 4591 RUNNING AT asterix = EXIT CODE: 63 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== balay at asterix /home/balay/download-pine/x/superlu_dist_test $ From hzhang at mcs.anl.gov Fri Oct 21 20:18:40 2016 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 22 Oct 2016 01:18:40 +0000 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> , Message-ID: <3D9EEEDDE5F38D4886C1845F99C697F7DFEE33@DITKA.anl.gov> I am investigating it. The file has two matrices. 
The code takes following steps: PCCreate(PETSC_COMM_WORLD, &pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); //crash here with np=2, superlu_dist, not with mumps/superlu or superlu_dist np=1 Hong ________________________________________ From: Barry Smith [bsmith at mcs.anl.gov] Sent: Friday, October 21, 2016 5:59 PM To: petsc-users Cc: Zhang, Hong Subject: Re: [petsc-users] SuperLU_dist issue in 3.7.4 > On Oct 21, 2016, at 5:16 PM, Satish Balay wrote: > > The issue with this test code is - using MatLoad() twice [with the > same object - without destroying it]. Not sure if thats supporsed to > work.. If the file has two matrices in it then yes a second call to MatLoad() with the same matrix should just load in the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works. Barry > > Satish > > On Fri, 21 Oct 2016, Hong wrote: > >> I can reproduce the error on a linux machine with petsc-maint. It crashes >> at 2nd solve, on both processors: >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0, >> rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40, >> info=0x7fffcb8dab4c, grid=0x1563858) >> at >> /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182 >> 182 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] >> ); >> >> The version of superlu_dist: >> commit 0b5369f304507f1c7904a913f4c0c86777a60639 >> Author: Xiaoye Li >> Date: Thu May 26 11:33:19 2016 -0700 >> >> rename 'struct pair' to 'struct superlu_pair'. >> >> Hong >> >> On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov wrote: >> >>> >>> On 10/19/2016 05:22 PM, Anton Popov wrote: >>> >>> I looked at each valgrind-complained item in your email dated Oct. 11. >>> Those reports are really superficial; I don't see anything wrong with >>> those lines (mostly uninitialized variables) singled out. I did a few >>> tests with the latest version in github, all went fine. >>> >>> Perhaps you can print your matrix that caused problem, I can run it using >>> your matrix. >>> >>> Sherry >>> >>> Hi Sherry, >>> >>> I finally figured out a minimalistic setup (attached) that reproduces the >>> problem. >>> >>> I use petsc-maint: >>> >>> git clone -b maint https://bitbucket.org/petsc/petsc.git >>> >>> and configure it in the debug mode without optimization using the options: >>> >>> --download-superlu_dist=1 \ >>> --download-superlu_dist-commit=origin/maint \ >>> >>> Compile the test, assuming PETSC_DIR points to the described petsc >>> installation: >>> >>> make ex16 >>> >>> Run with: >>> >>> mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> Matrix partitioning between the processors will be completely the same as >>> in our code (hard-coded). >>> >>> I factorize the same matrix twice with the same PC object. Remarkably it >>> runs fine for the first time, but fails for the second. >>> >>> Thank you very much for looking into this problem. 
>>> >>> Cheers, >>> Anton >>> >> > From hzhang at mcs.anl.gov Fri Oct 21 21:28:51 2016 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 22 Oct 2016 02:28:51 +0000 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <3D9EEEDDE5F38D4886C1845F99C697F7DFEE33@DITKA.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> , , <3D9EEEDDE5F38D4886C1845F99C697F7DFEE33@DITKA.anl.gov> Message-ID: <3D9EEEDDE5F38D4886C1845F99C697F7DFEE40@DITKA.anl.gov> It is not problem with Matload twice. The file has one matrix, but is loaded twice. Replacing pc with ksp, the code runs fine. The error occurs when PCSetUp_LU() is called with SAME_NONZERO_PATTERN. I'll further look at it later. Hong ________________________________________ From: Zhang, Hong Sent: Friday, October 21, 2016 8:18 PM To: Barry Smith; petsc-users Subject: RE: [petsc-users] SuperLU_dist issue in 3.7.4 I am investigating it. The file has two matrices. The code takes following steps: PCCreate(PETSC_COMM_WORLD, &pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); //crash here with np=2, superlu_dist, not with mumps/superlu or superlu_dist np=1 Hong ________________________________________ From: Barry Smith [bsmith at mcs.anl.gov] Sent: Friday, October 21, 2016 5:59 PM To: petsc-users Cc: Zhang, Hong Subject: Re: [petsc-users] SuperLU_dist issue in 3.7.4 > On Oct 21, 2016, at 5:16 PM, Satish Balay wrote: > > The issue with this test code is - using MatLoad() twice [with the > same object - without destroying it]. Not sure if thats supporsed to > work.. If the file has two matrices in it then yes a second call to MatLoad() with the same matrix should just load in the second matrix from the file correctly. Perhaps we need a test in our test suite just to make sure that works. Barry > > Satish > > On Fri, 21 Oct 2016, Hong wrote: > >> I can reproduce the error on a linux machine with petsc-maint. It crashes >> at 2nd solve, on both processors: >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007f051dc835bd in pdgsequ (A=0x1563910, r=0x176dfe0, c=0x178f7f0, >> rowcnd=0x7fffcb8dab30, colcnd=0x7fffcb8dab38, amax=0x7fffcb8dab40, >> info=0x7fffcb8dab4c, grid=0x1563858) >> at >> /sandbox/hzhang/petsc/arch-linux-gcc-gfortran/externalpackages/git.superlu_dist/SRC/pdgsequ.c:182 >> 182 c[jcol] = SUPERLU_MAX( c[jcol], fabs(Aval[j]) * r[irow] >> ); >> >> The version of superlu_dist: >> commit 0b5369f304507f1c7904a913f4c0c86777a60639 >> Author: Xiaoye Li >> Date: Thu May 26 11:33:19 2016 -0700 >> >> rename 'struct pair' to 'struct superlu_pair'. >> >> Hong >> >> On Fri, Oct 21, 2016 at 5:36 AM, Anton Popov wrote: >> >>> >>> On 10/19/2016 05:22 PM, Anton Popov wrote: >>> >>> I looked at each valgrind-complained item in your email dated Oct. 11. >>> Those reports are really superficial; I don't see anything wrong with >>> those lines (mostly uninitialized variables) singled out. I did a few >>> tests with the latest version in github, all went fine. >>> >>> Perhaps you can print your matrix that caused problem, I can run it using >>> your matrix. 
>>> >>> Sherry >>> >>> Hi Sherry, >>> >>> I finally figured out a minimalistic setup (attached) that reproduces the >>> problem. >>> >>> I use petsc-maint: >>> >>> git clone -b maint https://bitbucket.org/petsc/petsc.git >>> >>> and configure it in the debug mode without optimization using the options: >>> >>> --download-superlu_dist=1 \ >>> --download-superlu_dist-commit=origin/maint \ >>> >>> Compile the test, assuming PETSC_DIR points to the described petsc >>> installation: >>> >>> make ex16 >>> >>> Run with: >>> >>> mpirun -n 2 ./ex16 -f binaryoutput -pc_type lu >>> -pc_factor_mat_solver_package superlu_dist >>> >>> Matrix partitioning between the processors will be completely the same as >>> in our code (hard-coded). >>> >>> I factorize the same matrix twice with the same PC object. Remarkably it >>> runs fine for the first time, but fails for the second. >>> >>> Thank you very much for looking into this problem. >>> >>> Cheers, >>> Anton >>> >> > From patrick.sanan at gmail.com Sat Oct 22 08:32:01 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Sat, 22 Oct 2016 15:32:01 +0200 Subject: [petsc-users] Looking for a quick example of a symmetric KKT system In-Reply-To: <871sz9r3q7.fsf@jedbrown.org> References: <871sz9r3q7.fsf@jedbrown.org> Message-ID: <20161022133201.GB510@Patricks-MacBook-Pro-11120.local> No particularly good reason - I am also using Stokes systems as tests, but thought it might be interesting to test on a saddle point system arising from a different proble. On Fri, Oct 21, 2016 at 11:50:24AM -0600, Jed Brown wrote: > Why doesn't a Stokes problem fulfill your needs? > > Patrick Sanan writes: > > > Yes, but AFAIK that example produces a 2x2 system - I was hoping for > > something with a variable problem size, ideally with some sort of > > physics motivating the underlying optimization problem. > > > > On Fri, Oct 21, 2016 at 7:23 PM, Justin Chang wrote: > >> Something like this? > >> > >> http://www.mcs.anl.gov/petsc/petsc-current/src/tao/constrained/examples/tutorials/toy.c.html > >> > >> > >> On Friday, October 21, 2016, Patrick Sanan wrote: > >>> > >>> Are there any examples already in PETSc or TAO that assemble such a > >>> system (which could thus be dumped)? SNES example ex73f90t assembles a > >>> non-symmetric KKT system. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 473 bytes Desc: not available URL: From bsmith at mcs.anl.gov Sun Oct 23 16:56:24 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 23 Oct 2016 16:56:24 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: Thanks Satish, I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) Fande, This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). 
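
A minimal sketch of what "multiple calls" means here, assuming the branch behaves as described above: MatMPIAIJSetPreallocation() is called a second time on the same, already assembled Mat to re-preallocate and refill it, with no intermediate MatReset(). The matrix size, nonzero counts, and the fill_diagonal() helper are purely illustrative, and the second preallocation is only expected to work with the fix in that branch:

#include <petscmat.h>

static PetscErrorCode fill_diagonal(Mat A,PetscScalar v)
{
  PetscInt       rstart,rend,i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  for (i=rstart; i<rend; i++) {
    ierr = MatSetValue(A,i,i,v,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  Mat            A;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,100,100);CHKERRQ(ierr);
  ierr = MatSetType(A,MATMPIAIJ);CHKERRQ(ierr);

  /* first preallocation and fill */
  ierr = MatMPIAIJSetPreallocation(A,5,NULL,2,NULL);CHKERRQ(ierr);
  ierr = fill_diagonal(A,1.0);CHKERRQ(ierr);

  /* second preallocation on the same Mat and a new fill: previously this
     required a MatReset(); with the branch above it is meant to just work */
  ierr = MatMPIAIJSetPreallocation(A,7,NULL,3,NULL);CHKERRQ(ierr);
  ierr = fill_diagonal(A,2.0);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}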
Barry > On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > > On Fri, 21 Oct 2016, Barry Smith wrote: > >> >> valgrind first > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > First MatLoad! > Mat Object: 2 MPI processes > type: mpiaij > row 0: (0, 4.) (1, -1.) (6, -1.) > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > row 5: (4, -1.) (5, 4.) (11, -1.) > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > row 30: (24, -1.) (30, 4.) (31, -1.) > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > row 35: (29, -1.) (34, -1.) (35, 4.) > Second MatLoad! 
> Mat Object: 2 MPI processes > type: mpiaij > ==4592== Invalid read of size 4 > ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > ==4592== by 0x53373D7: MatView (matrix.c:989) > ==4592== by 0x40107E: main (ex16.c:30) > ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > ==4592== by 0x400D9F: main (ex16.c:22) > ==4592== > ==4591== Invalid read of size 4 > ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > ==4591== by 0x53373D7: MatView (matrix.c:989) > ==4591== by 0x40107E: main (ex16.c:30) > ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > ==4591== by 0x400D9F: main (ex16.c:22) > ==4591== > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: Column too large: col 96 max 35 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 > [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -display :0.0 > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > [0]PETSC ERROR: -malloc_dump > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > [cli_0]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > ==4591== by 0x53373D7: MatView (matrix.c:989) > ==4591== by 0x40107E: main (ex16.c:30) > ==4591== > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 4591 RUNNING AT asterix > = EXIT CODE: 63 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > =================================================================================== > balay at asterix /home/balay/download-pine/x/superlu_dist_test > $ From balay at mcs.anl.gov Sun Oct 23 18:58:49 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 23 Oct 2016 18:58:49 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: The original testcode from Anton also works [i.e is valgrind clean] with this change.. Satish On Sun, 23 Oct 2016, Barry Smith wrote: > > Thanks Satish, > > I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) > > Fande, > > This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). 
> > Barry > > > > On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > > > > On Fri, 21 Oct 2016, Barry Smith wrote: > > > >> > >> valgrind first > > > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > > First MatLoad! > > Mat Object: 2 MPI processes > > type: mpiaij > > row 0: (0, 4.) (1, -1.) (6, -1.) > > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > > row 5: (4, -1.) (5, 4.) (11, -1.) > > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > > row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > > row 30: (24, -1.) (30, 4.) (31, -1.) > > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > > row 35: (29, -1.) (34, -1.) (35, 4.) > > Second MatLoad! 
> > Mat Object: 2 MPI processes > > type: mpiaij > > ==4592== Invalid read of size 4 > > ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > > ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4592== by 0x53373D7: MatView (matrix.c:989) > > ==4592== by 0x40107E: main (ex16.c:30) > > ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > > ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > > ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4592== by 0x400D9F: main (ex16.c:22) > > ==4592== > > ==4591== Invalid read of size 4 > > ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > > ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > > ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > > ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > > ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > > ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > > ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > > ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4591== by 0x400D9F: main (ex16.c:22) > > ==4591== > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Argument out of range > > [0]PETSC ERROR: Column too large: col 96 max 35 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 > > [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -display :0.0 > > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > > [0]PETSC ERROR: -malloc_dump > > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > > ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== > > > > =================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = PID 4591 RUNNING AT asterix > > = EXIT CODE: 63 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ > > From popov at uni-mainz.de Mon Oct 24 05:07:02 2016 From: popov at uni-mainz.de (Anton Popov) Date: Mon, 24 Oct 2016 12:07:02 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> Thank you Barry, Satish, Fande! Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? Anton On 10/24/2016 01:58 AM, Satish Balay wrote: > The original testcode from Anton also works [i.e is valgrind clean] with this change.. 
> > Satish > > On Sun, 23 Oct 2016, Barry Smith wrote: > >> Thanks Satish, >> >> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) >> >> Fande, >> >> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). >> >> Barry >> >> >>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: >>> >>> On Fri, 21 Oct 2016, Barry Smith wrote: >>> >>>> valgrind first >>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small >>> First MatLoad! >>> Mat Object: 2 MPI processes >>> type: mpiaij >>> row 0: (0, 4.) (1, -1.) (6, -1.) >>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) >>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) >>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) >>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) >>> row 5: (4, -1.) (5, 4.) (11, -1.) >>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) >>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) >>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) >>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) >>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) >>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) >>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) >>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) >>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) >>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) >>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) >>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) >>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) >>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) >>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) >>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) >>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) >>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) >>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) >>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) >>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) >>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) >>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) >>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) >>> row 30: (24, -1.) (30, 4.) (31, -1.) >>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) >>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) >>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) >>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) >>> row 35: (29, -1.) (34, -1.) (35, 4.) >>> Second MatLoad! 
>>> Mat Object: 2 MPI processes >>> type: mpiaij >>> ==4592== Invalid read of size 4 >>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>> ==4592== by 0x53373D7: MatView (matrix.c:989) >>> ==4592== by 0x40107E: main (ex16.c:30) >>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd >>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) >>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) >>> ==4592== by 0x400D9F: main (ex16.c:22) >>> ==4592== >>> ==4591== Invalid read of size 4 >>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>> ==4591== by 0x40107E: main (ex16.c:30) >>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd >>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) >>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) >>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) >>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) >>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) >>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) >>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) >>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) >>> ==4591== by 0x400D9F: main (ex16.c:22) >>> ==4591== >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Argument out of range >>> [0]PETSC ERROR: Column too large: col 96 max 35 >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 >>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 >>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu >>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c >>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c >>> [0]PETSC ERROR: PETSc Option Table entries: >>> [0]PETSC ERROR: -display :0.0 >>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small >>> [0]PETSC ERROR: -malloc_dump >>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- >>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>> [cli_0]: aborting job: >>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 >>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) >>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) >>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>> ==4591== by 0x40107E: main (ex16.c:30) >>> ==4591== >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = PID 4591 RUNNING AT asterix >>> = EXIT CODE: 63 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> =================================================================================== >>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>> $ >> From bsmith at mcs.anl.gov Mon Oct 24 06:27:06 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Oct 2016 06:27:06 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> Message-ID: <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> Anton, Sorry for any confusion. This doesn't resolve the SuperLU_DIST issue which I think Hong is working on, this only resolves multiple loads of matrices into the same Mat. Barry > On Oct 24, 2016, at 5:07 AM, Anton Popov wrote: > > Thank you Barry, Satish, Fande! 
> > Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? > > Anton > > On 10/24/2016 01:58 AM, Satish Balay wrote: >> The original testcode from Anton also works [i.e is valgrind clean] with this change.. >> >> Satish >> >> On Sun, 23 Oct 2016, Barry Smith wrote: >> >>> Thanks Satish, >>> >>> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) >>> >>> Fande, >>> >>> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). >>> >>> Barry >>> >>> >>>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: >>>> >>>> On Fri, 21 Oct 2016, Barry Smith wrote: >>>> >>>>> valgrind first >>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small >>>> First MatLoad! >>>> Mat Object: 2 MPI processes >>>> type: mpiaij >>>> row 0: (0, 4.) (1, -1.) (6, -1.) >>>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) >>>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) >>>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) >>>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) >>>> row 5: (4, -1.) (5, 4.) (11, -1.) >>>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) >>>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) >>>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) >>>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) >>>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) >>>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) >>>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) >>>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) >>>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) >>>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) >>>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) >>>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) >>>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) >>>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) >>>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) >>>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) >>>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) >>>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) >>>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) >>>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) >>>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) >>>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) >>>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) >>>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) >>>> row 30: (24, -1.) (30, 4.) (31, -1.) >>>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) >>>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) >>>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) >>>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) >>>> row 35: (29, -1.) (34, -1.) (35, 4.) >>>> Second MatLoad! 
>>>> Mat Object: 2 MPI processes >>>> type: mpiaij >>>> ==4592== Invalid read of size 4 >>>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>> ==4592== by 0x53373D7: MatView (matrix.c:989) >>>> ==4592== by 0x40107E: main (ex16.c:30) >>>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd >>>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) >>>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) >>>> ==4592== by 0x400D9F: main (ex16.c:22) >>>> ==4592== >>>> ==4591== Invalid read of size 4 >>>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>> ==4591== by 0x40107E: main (ex16.c:30) >>>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd >>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) >>>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) >>>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) >>>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) >>>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) >>>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) >>>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) >>>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) >>>> ==4591== by 0x400D9F: main (ex16.c:22) >>>> ==4591== >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Argument out of range >>>> [0]PETSC ERROR: Column too large: col 96 max 35 >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 >>>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 >>>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu >>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c >>>> [0]PETSC ERROR: PETSc Option Table entries: >>>> [0]PETSC ERROR: -display :0.0 >>>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small >>>> [0]PETSC ERROR: -malloc_dump >>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- >>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>> [cli_0]: aborting job: >>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 >>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) >>>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) >>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>> ==4591== by 0x40107E: main (ex16.c:30) >>>> ==4591== >>>> >>>> =================================================================================== >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>> = PID 4591 RUNNING AT asterix >>>> = EXIT CODE: 63 >>>> = CLEANING UP REMAINING PROCESSES >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>> =================================================================================== >>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>> $ >>> > From balay at mcs.anl.gov Mon Oct 24 09:00:35 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Oct 2016 09:00:35 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> Message-ID: Since the provided test code dosn't crash [and is valgrind clean] - with this fix - I'm not sure what bug Hong is chasing.. Satish On Mon, 24 Oct 2016, Barry Smith wrote: > > Anton, > > Sorry for any confusion. 
This doesn't resolve the SuperLU_DIST issue which I think Hong is working on, this only resolves multiple loads of matrices into the same Mat. > > Barry > > > On Oct 24, 2016, at 5:07 AM, Anton Popov wrote: > > > > Thank you Barry, Satish, Fande! > > > > Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? > > > > Anton > > > > On 10/24/2016 01:58 AM, Satish Balay wrote: > >> The original testcode from Anton also works [i.e is valgrind clean] with this change.. > >> > >> Satish > >> > >> On Sun, 23 Oct 2016, Barry Smith wrote: > >> > >>> Thanks Satish, > >>> > >>> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) > >>> > >>> Fande, > >>> > >>> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). > >>> > >>> Barry > >>> > >>> > >>>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > >>>> > >>>> On Fri, 21 Oct 2016, Barry Smith wrote: > >>>> > >>>>> valgrind first > >>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test > >>>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > >>>> First MatLoad! > >>>> Mat Object: 2 MPI processes > >>>> type: mpiaij > >>>> row 0: (0, 4.) (1, -1.) (6, -1.) > >>>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > >>>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > >>>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > >>>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > >>>> row 5: (4, -1.) (5, 4.) (11, -1.) > >>>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > >>>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > >>>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > >>>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > >>>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > >>>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > >>>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > >>>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > >>>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > >>>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > >>>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > >>>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > >>>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > >>>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > >>>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > >>>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > >>>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > >>>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > >>>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > >>>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > >>>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > >>>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > >>>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > >>>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > >>>> row 30: (24, -1.) (30, 4.) (31, -1.) > >>>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > >>>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > >>>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > >>>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > >>>> row 35: (29, -1.) (34, -1.) (35, 4.) > >>>> Second MatLoad! 
> >>>> Mat Object: 2 MPI processes > >>>> type: mpiaij > >>>> ==4592== Invalid read of size 4 > >>>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > >>>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>> ==4592== by 0x53373D7: MatView (matrix.c:989) > >>>> ==4592== by 0x40107E: main (ex16.c:30) > >>>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > >>>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > >>>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > >>>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > >>>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > >>>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > >>>> ==4592== by 0x400D9F: main (ex16.c:22) > >>>> ==4592== > >>>> ==4591== Invalid read of size 4 > >>>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > >>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>> ==4591== by 0x53373D7: MatView (matrix.c:989) > >>>> ==4591== by 0x40107E: main (ex16.c:30) > >>>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > >>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > >>>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > >>>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > >>>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > >>>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > >>>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > >>>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > >>>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > >>>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > >>>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > >>>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > >>>> ==4591== by 0x400D9F: main (ex16.c:22) > >>>> ==4591== > >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > >>>> [0]PETSC ERROR: Argument out of range > >>>> [0]PETSC ERROR: Column too large: col 96 max 35 > >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> >>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > >>>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 > >>>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > >>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c > >>>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > >>>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > >>>> [0]PETSC ERROR: PETSc Option Table entries: > >>>> [0]PETSC ERROR: -display :0.0 > >>>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > >>>> [0]PETSC ERROR: -malloc_dump > >>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > >>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > >>>> [cli_0]: aborting job: > >>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > >>>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 > >>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > >>>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) > >>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>> ==4591== by 0x53373D7: MatView (matrix.c:989) > >>>> ==4591== by 0x40107E: main (ex16.c:30) > >>>> ==4591== > >>>> > >>>> =================================================================================== > >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > >>>> = PID 4591 RUNNING AT asterix > >>>> = EXIT CODE: 63 > >>>> = CLEANING UP REMAINING PROCESSES > >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > >>>> =================================================================================== > >>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test > >>>> $ > >>> > > > > From fande.kong at inl.gov Mon Oct 24 09:07:19 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 24 Oct 2016 08:07:19 -0600 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: On Sun, Oct 23, 2016 at 3:56 PM, Barry Smith wrote: > > Thanks Satish, > > I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant > (in next for testing) > > Fande, > > This will also make MatMPIAIJSetPreallocation() work properly with > multiple 
calls (you will not need a MatReset()). > > Barry > Thanks, Barry. Fande, > > > > On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > > > > On Fri, 21 Oct 2016, Barry Smith wrote: > > > >> > >> valgrind first > > > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > > First MatLoad! > > Mat Object: 2 MPI processes > > type: mpiaij > > row 0: (0, 4.) (1, -1.) (6, -1.) > > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > > row 5: (4, -1.) (5, 4.) (11, -1.) > > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > > row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > > row 30: (24, -1.) (30, 4.) (31, -1.) > > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > > row 35: (29, -1.) (34, -1.) (35, 4.) > > Second MatLoad! 
> > Mat Object: 2 MPI processes > > type: mpiaij > > ==4592== Invalid read of size 4 > > ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket > (mpiaij.c:1402) > > ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4592== by 0x53373D7: MatView (matrix.c:989) > > ==4592== by 0x40107E: main (ex16.c:30) > > ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > > ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > > ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4592== by 0x400D9F: main (ex16.c:22) > > ==4592== > > ==4591== Invalid read of size 4 > > ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket > (mpiaij.c:1402) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > > ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > > ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > > ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > > ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > > ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > > ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > > ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4591== by 0x400D9F: main (ex16.c:22) > > ==4591== > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Argument out of range > > [0]PETSC ERROR: Column too large: col 96 max 35 > > [0]PETSC ERROR: See https://urldefense.proofpoint. > com/v2/url?u=http-3A__www.mcs.anl.gov_petsc_documentation_ > faq.html&d=CwIFAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB_ > _aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m= > yCFQeqGFVZhJtXzPwmjejP5oiMeddVxB4a_mxWbQYkA&s= > lWoiLmjuyX1M9FCbfQAwkLK2cAGeDvnXO-fMCKllDTE&e= for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 > GIT Date: 2016-10-20 22:22:58 +0000 > > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri > Oct 21 18:47:51 2016 > > [0]PETSC ERROR: Configure options --download-metis --download-parmetis > --download-superlu_dist PETSC_ARCH=arch-idx64-slu > > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in > /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in > /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in > /home/balay/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in > /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in > /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/ > interface/matrix.c > > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/ > superlu_dist_test/ex16.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -display :0.0 > > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > > [0]PETSC ERROR: -malloc_dump > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are > definitely lost in loss record 1,014 of 1,016 > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > > ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket > (mpiaij.c:1371) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== > > > > ============================================================ > ======================= > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = PID 4591 RUNNING AT asterix > > = EXIT CODE: 63 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > ============================================================ > ======================= > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Oct 24 09:13:25 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Oct 2016 09:13:25 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> Message-ID: Hong wrote: (Note that it creates a new Mat each time so shouldn't be affected by the bug I fixed; it also "works" with MUMPs but not superlu_dist.) It is not problem with Matload twice. 
The file has one matrix, but is loaded twice. Replacing pc with ksp, the code runs fine. The error occurs when PCSetUp_LU() is called with SAME_NONZERO_PATTERN. I'll further look at it later. Hong ________________________________________ From: Zhang, Hong Sent: Friday, October 21, 2016 8:18 PM To: Barry Smith; petsc-users Subject: RE: [petsc-users] SuperLU_dist issue in 3.7.4 I am investigating it. The file has two matrices. The code takes following steps: PCCreate(PETSC_COMM_WORLD, &pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); MatCreate(PETSC_COMM_WORLD,&A); MatLoad(A,fd); PCSetOperators(pc,A,A); PCSetUp(pc); //crash here with np=2, superlu_dist, not with mumps/superlu or superlu_dist np=1 Hong > On Oct 24, 2016, at 9:00 AM, Satish Balay wrote: > > Since the provided test code dosn't crash [and is valgrind clean] - > with this fix - I'm not sure what bug Hong is chasing.. > > Satish > > On Mon, 24 Oct 2016, Barry Smith wrote: > >> >> Anton, >> >> Sorry for any confusion. This doesn't resolve the SuperLU_DIST issue which I think Hong is working on, this only resolves multiple loads of matrices into the same Mat. >> >> Barry >> >>> On Oct 24, 2016, at 5:07 AM, Anton Popov wrote: >>> >>> Thank you Barry, Satish, Fande! >>> >>> Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? >>> >>> Anton >>> >>> On 10/24/2016 01:58 AM, Satish Balay wrote: >>>> The original testcode from Anton also works [i.e is valgrind clean] with this change.. >>>> >>>> Satish >>>> >>>> On Sun, 23 Oct 2016, Barry Smith wrote: >>>> >>>>> Thanks Satish, >>>>> >>>>> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) >>>>> >>>>> Fande, >>>>> >>>>> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: >>>>>> >>>>>> On Fri, 21 Oct 2016, Barry Smith wrote: >>>>>> >>>>>>> valgrind first >>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>>>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small >>>>>> First MatLoad! >>>>>> Mat Object: 2 MPI processes >>>>>> type: mpiaij >>>>>> row 0: (0, 4.) (1, -1.) (6, -1.) >>>>>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) >>>>>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) >>>>>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) >>>>>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) >>>>>> row 5: (4, -1.) (5, 4.) (11, -1.) >>>>>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) >>>>>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) >>>>>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) >>>>>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) >>>>>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) >>>>>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) >>>>>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) >>>>>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) >>>>>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) >>>>>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) >>>>>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) >>>>>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) >>>>>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) >>>>>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) >>>>>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) >>>>>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) 
>>>>>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) >>>>>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) >>>>>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) >>>>>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) >>>>>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) >>>>>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) >>>>>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) >>>>>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) >>>>>> row 30: (24, -1.) (30, 4.) (31, -1.) >>>>>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) >>>>>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) >>>>>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) >>>>>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) >>>>>> row 35: (29, -1.) (34, -1.) (35, 4.) >>>>>> Second MatLoad! >>>>>> Mat Object: 2 MPI processes >>>>>> type: mpiaij >>>>>> ==4592== Invalid read of size 4 >>>>>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>>>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>> ==4592== by 0x53373D7: MatView (matrix.c:989) >>>>>> ==4592== by 0x40107E: main (ex16.c:30) >>>>>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd >>>>>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) >>>>>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>>>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>>>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>>>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) >>>>>> ==4592== by 0x400D9F: main (ex16.c:22) >>>>>> ==4592== >>>>>> ==4591== Invalid read of size 4 >>>>>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>>>> ==4591== by 0x40107E: main (ex16.c:30) >>>>>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd >>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) >>>>>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) >>>>>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) >>>>>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) >>>>>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) >>>>>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) >>>>>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) >>>>>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>>>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>>>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>>>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) >>>>>> ==4591== by 0x400D9F: main (ex16.c:22) >>>>>> ==4591== >>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>> [0]PETSC ERROR: Argument out of range >>>>>> [0]PETSC ERROR: Column too large: col 96 max 35 >>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 >>>>>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 >>>>>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu >>>>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c >>>>>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c >>>>>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c >>>>>> [0]PETSC ERROR: PETSc Option Table entries: >>>>>> [0]PETSC ERROR: -display :0.0 >>>>>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small >>>>>> [0]PETSC ERROR: -malloc_dump >>>>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- >>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>>>> [cli_0]: aborting job: >>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>>>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 >>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) >>>>>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) >>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>>>> ==4591== by 0x40107E: main (ex16.c:30) >>>>>> ==4591== >>>>>> >>>>>> =================================================================================== >>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>>>> = PID 4591 RUNNING AT asterix >>>>>> = EXIT CODE: 63 >>>>>> = CLEANING UP REMAINING PROCESSES >>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>>>> =================================================================================== >>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>>>> $ >>>>> >>> >> >> > From fande.kong at inl.gov Mon Oct 24 09:24:10 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 24 Oct 2016 08:24:10 -0600 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: On Mon, Oct 24, 2016 at 8:07 AM, Kong, Fande wrote: > > > On Sun, Oct 23, 2016 at 3:56 PM, Barry Smith wrote: > >> >> Thanks Satish, >> >> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant >> (in next for testing) >> >> Fande, >> >> This 
will also make MatMPIAIJSetPreallocation() work properly >> with multiple calls (you will not need a MatReset()). >> > Does this work for MPIAIJ only? There are also other functions: MatSeqAIJSetPreallocation(), MatMPIAIJSetPreallocation(), MatSeqBAIJSetPreallocation(), MatMPIBAIJSetPreallocation(), MatSeqSBAIJSetPreallocation(), MatMPISBAIJSetPreallocation(), and MatXAIJSetPreallocation. We have to use different function for different type. Could we have an unified-interface for all of them? Fande, > >> Barry >> > > Thanks, Barry. > > Fande, > > >> >> >> > On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: >> > >> > On Fri, 21 Oct 2016, Barry Smith wrote: >> > >> >> >> >> valgrind first >> > >> > balay at asterix /home/balay/download-pine/x/superlu_dist_test >> > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small >> > First MatLoad! >> > Mat Object: 2 MPI processes >> > type: mpiaij >> > row 0: (0, 4.) (1, -1.) (6, -1.) >> > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) >> > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) >> > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) >> > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) >> > row 5: (4, -1.) (5, 4.) (11, -1.) >> > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) >> > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) >> > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) >> > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) >> > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) >> > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) >> > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) >> > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) >> > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) >> > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) >> > row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) >> > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) >> > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) >> > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) >> > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) >> > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) >> > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) >> > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) >> > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) >> > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) >> > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) >> > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) >> > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) >> > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) >> > row 30: (24, -1.) (30, 4.) (31, -1.) >> > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) >> > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) >> > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) >> > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) >> > row 35: (29, -1.) (34, -1.) (35, 4.) >> > Second MatLoad! 
>> > Mat Object: 2 MPI processes >> > type: mpiaij >> > ==4592== Invalid read of size 4 >> > ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket >> (mpiaij.c:1402) >> > ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >> > ==4592== by 0x53373D7: MatView (matrix.c:989) >> > ==4592== by 0x40107E: main (ex16.c:30) >> > ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd >> > ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >> > ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) >> > ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) >> > ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >> > ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >> > ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >> > ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) >> > ==4592== by 0x400D9F: main (ex16.c:22) >> > ==4592== >> > ==4591== Invalid read of size 4 >> > ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket >> (mpiaij.c:1402) >> > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >> > ==4591== by 0x53373D7: MatView (matrix.c:989) >> > ==4591== by 0x40107E: main (ex16.c:30) >> > ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd >> > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >> > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >> > ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) >> > ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) >> > ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) >> > ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) >> > ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) >> > ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) >> > ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) >> > ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >> > ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >> > ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >> > ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) >> > ==4591== by 0x400D9F: main (ex16.c:22) >> > ==4591== >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: Argument out of range >> > [0]PETSC ERROR: Column too large: col 96 max 35 >> > [0]PETSC ERROR: See https://urldefense.proofpoint. >> com/v2/url?u=http-3A__www.mcs.anl.gov_petsc_documentation_fa >> q.html&d=CwIFAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__ >> aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY& >> m=yCFQeqGFVZhJtXzPwmjejP5oiMeddVxB4a_mxWbQYkA&s=lWoiLmjuyX1M >> 9FCbfQAwkLK2cAGeDvnXO-fMCKllDTE&e= for trouble shooting. 
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 >> GIT Date: 2016-10-20 22:22:58 +0000 >> > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri >> Oct 21 18:47:51 2016 >> > [0]PETSC ERROR: Configure options --download-metis --download-parmetis >> --download-superlu_dist PETSC_ARCH=arch-idx64-slu >> > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in >> /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >> > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in >> /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >> > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in >> /home/balay/petsc/src/mat/interface/matrix.c >> > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in >> /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >> > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in >> /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >> > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/inte >> rface/matrix.c >> > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/su >> perlu_dist_test/ex16.c >> > [0]PETSC ERROR: PETSc Option Table entries: >> > [0]PETSC ERROR: -display :0.0 >> > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small >> > [0]PETSC ERROR: -malloc_dump >> > [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >> > [cli_0]: aborting job: >> > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >> > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are >> definitely lost in loss record 1,014 of 1,016 >> > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >> > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >> > ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) >> > ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket >> (mpiaij.c:1371) >> > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >> > ==4591== by 0x53373D7: MatView (matrix.c:989) >> > ==4591== by 0x40107E: main (ex16.c:30) >> > ==4591== >> > >> > ============================================================ >> ======================= >> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> > = PID 4591 RUNNING AT asterix >> > = EXIT CODE: 63 >> > = CLEANING UP REMAINING PROCESSES >> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> > ============================================================ >> ======================= >> > balay at asterix /home/balay/download-pine/x/superlu_dist_test >> > $ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Oct 24 09:25:00 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Oct 2016 09:25:00 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> Message-ID: Yes - but this test code [that Hong is also using] is buggy due to using MatLoad() twice - so the corrupted Matrix does have wierd behavior later in PC. With your fix - the test code rpovided by Anton behaves fine for me. 
So Hong would have to restart the diagnosis - and I suspect all the wierd behavior she observed will go away [well I don't see the the original wired behavior with this test code anymore].. Sinced you said "This will also make MatMPIAIJSetPreallocation() work properly with multiple calls" - perhaps Anton's issue is also somehow releated? I think its best if he can try this fix. And if it doesn't work - then we'll need a better test case to reproduce. [Or perhaps Hong is using a different test code and is observing bugs with superlu_dist interface..] Satish On Mon, 24 Oct 2016, Barry Smith wrote: > > Hong wrote: (Note that it creates a new Mat each time so shouldn't be affected by the bug I fixed; it also "works" with MUMPs but not superlu_dist.) > > > It is not problem with Matload twice. The file has one matrix, but is loaded twice. > > Replacing pc with ksp, the code runs fine. > The error occurs when PCSetUp_LU() is called with SAME_NONZERO_PATTERN. > I'll further look at it later. > > Hong > ________________________________________ > From: Zhang, Hong > Sent: Friday, October 21, 2016 8:18 PM > To: Barry Smith; petsc-users > Subject: RE: [petsc-users] SuperLU_dist issue in 3.7.4 > > I am investigating it. The file has two matrices. The code takes following steps: > > PCCreate(PETSC_COMM_WORLD, &pc); > > MatCreate(PETSC_COMM_WORLD,&A); > MatLoad(A,fd); > PCSetOperators(pc,A,A); > PCSetUp(pc); > > MatCreate(PETSC_COMM_WORLD,&A); > MatLoad(A,fd); > PCSetOperators(pc,A,A); > PCSetUp(pc); //crash here with np=2, superlu_dist, not with mumps/superlu or superlu_dist np=1 > > Hong > > > On Oct 24, 2016, at 9:00 AM, Satish Balay wrote: > > > > Since the provided test code dosn't crash [and is valgrind clean] - > > with this fix - I'm not sure what bug Hong is chasing.. > > > > Satish > > > > On Mon, 24 Oct 2016, Barry Smith wrote: > > > >> > >> Anton, > >> > >> Sorry for any confusion. This doesn't resolve the SuperLU_DIST issue which I think Hong is working on, this only resolves multiple loads of matrices into the same Mat. > >> > >> Barry > >> > >>> On Oct 24, 2016, at 5:07 AM, Anton Popov wrote: > >>> > >>> Thank you Barry, Satish, Fande! > >>> > >>> Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? > >>> > >>> Anton > >>> > >>> On 10/24/2016 01:58 AM, Satish Balay wrote: > >>>> The original testcode from Anton also works [i.e is valgrind clean] with this change.. > >>>> > >>>> Satish > >>>> > >>>> On Sun, 23 Oct 2016, Barry Smith wrote: > >>>> > >>>>> Thanks Satish, > >>>>> > >>>>> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) > >>>>> > >>>>> Fande, > >>>>> > >>>>> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). > >>>>> > >>>>> Barry > >>>>> > >>>>> > >>>>>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > >>>>>> > >>>>>> On Fri, 21 Oct 2016, Barry Smith wrote: > >>>>>> > >>>>>>> valgrind first > >>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test > >>>>>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > >>>>>> First MatLoad! > >>>>>> Mat Object: 2 MPI processes > >>>>>> type: mpiaij > >>>>>> row 0: (0, 4.) (1, -1.) (6, -1.) > >>>>>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > >>>>>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > >>>>>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > >>>>>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > >>>>>> row 5: (4, -1.) 
(5, 4.) (11, -1.) > >>>>>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > >>>>>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > >>>>>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > >>>>>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > >>>>>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > >>>>>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > >>>>>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > >>>>>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > >>>>>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > >>>>>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > >>>>>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > >>>>>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > >>>>>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > >>>>>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > >>>>>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > >>>>>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > >>>>>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > >>>>>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > >>>>>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > >>>>>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > >>>>>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > >>>>>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > >>>>>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > >>>>>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > >>>>>> row 30: (24, -1.) (30, 4.) (31, -1.) > >>>>>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > >>>>>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > >>>>>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > >>>>>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > >>>>>> row 35: (29, -1.) (34, -1.) (35, 4.) > >>>>>> Second MatLoad! 
> >>>>>> Mat Object: 2 MPI processes > >>>>>> type: mpiaij > >>>>>> ==4592== Invalid read of size 4 > >>>>>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > >>>>>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>>>> ==4592== by 0x53373D7: MatView (matrix.c:989) > >>>>>> ==4592== by 0x40107E: main (ex16.c:30) > >>>>>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > >>>>>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>>>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>>>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > >>>>>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > >>>>>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > >>>>>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > >>>>>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > >>>>>> ==4592== by 0x400D9F: main (ex16.c:22) > >>>>>> ==4592== > >>>>>> ==4591== Invalid read of size 4 > >>>>>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > >>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) > >>>>>> ==4591== by 0x40107E: main (ex16.c:30) > >>>>>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > >>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>>>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > >>>>>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > >>>>>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > >>>>>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > >>>>>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > >>>>>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > >>>>>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > >>>>>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > >>>>>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > >>>>>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > >>>>>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > >>>>>> ==4591== by 0x400D9F: main (ex16.c:22) > >>>>>> ==4591== > >>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > >>>>>> [0]PETSC ERROR: Argument out of range > >>>>>> [0]PETSC ERROR: Column too large: col 96 max 35 > >>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> >>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > >>>>>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 > >>>>>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > >>>>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>>>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>>>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c > >>>>>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>>>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > >>>>>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > >>>>>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > >>>>>> [0]PETSC ERROR: PETSc Option Table entries: > >>>>>> [0]PETSC ERROR: -display :0.0 > >>>>>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > >>>>>> [0]PETSC ERROR: -malloc_dump > >>>>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > >>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > >>>>>> [cli_0]: aborting job: > >>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > >>>>>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 > >>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > >>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > >>>>>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > >>>>>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) > >>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > >>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) > >>>>>> ==4591== by 0x40107E: main (ex16.c:30) > >>>>>> ==4591== > >>>>>> > >>>>>> =================================================================================== > >>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > >>>>>> = PID 4591 RUNNING AT asterix > >>>>>> = EXIT CODE: 63 > >>>>>> = CLEANING UP REMAINING PROCESSES > >>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > >>>>>> =================================================================================== > >>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test > >>>>>> $ > >>>>> > >>> > >> > >> > > > > From bsmith at mcs.anl.gov Mon Oct 24 09:31:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Oct 2016 09:31:17 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> Message-ID: <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> > [Or perhaps Hong is using a different test code and is observing bugs > with superlu_dist interface..] 
She states that her test does a NEW MatCreate() for each matrix load (I cut and pasted it in the email I just sent). The bug I fixed was only related to using the SAME matrix from one MatLoad() in another MatLoad(). Barry > On Oct 24, 2016, at 9:25 AM, Satish Balay wrote: > > Yes - but this test code [that Hong is also using] is buggy due to > using MatLoad() twice - so the corrupted Matrix does have wierd > behavior later in PC. > > With your fix - the test code rpovided by Anton behaves fine for > me. So Hong would have to restart the diagnosis - and I suspect all > the wierd behavior she observed will go away [well I don't see the the > original wired behavior with this test code anymore].. > > Sinced you said "This will also make MatMPIAIJSetPreallocation() work > properly with multiple calls" - perhaps Anton's issue is also somehow > releated? I think its best if he can try this fix. > > And if it doesn't work - then we'll need a better test case to > reproduce. > > [Or perhaps Hong is using a different test code and is observing bugs > with superlu_dist interface..] > > Satish > > On Mon, 24 Oct 2016, Barry Smith wrote: > >> >> Hong wrote: (Note that it creates a new Mat each time so shouldn't be affected by the bug I fixed; it also "works" with MUMPs but not superlu_dist.) >> >> >> It is not problem with Matload twice. The file has one matrix, but is loaded twice. >> >> Replacing pc with ksp, the code runs fine. >> The error occurs when PCSetUp_LU() is called with SAME_NONZERO_PATTERN. >> I'll further look at it later. >> >> Hong >> ________________________________________ >> From: Zhang, Hong >> Sent: Friday, October 21, 2016 8:18 PM >> To: Barry Smith; petsc-users >> Subject: RE: [petsc-users] SuperLU_dist issue in 3.7.4 >> >> I am investigating it. The file has two matrices. The code takes following steps: >> >> PCCreate(PETSC_COMM_WORLD, &pc); >> >> MatCreate(PETSC_COMM_WORLD,&A); >> MatLoad(A,fd); >> PCSetOperators(pc,A,A); >> PCSetUp(pc); >> >> MatCreate(PETSC_COMM_WORLD,&A); >> MatLoad(A,fd); >> PCSetOperators(pc,A,A); >> PCSetUp(pc); //crash here with np=2, superlu_dist, not with mumps/superlu or superlu_dist np=1 >> >> Hong >> >>> On Oct 24, 2016, at 9:00 AM, Satish Balay wrote: >>> >>> Since the provided test code dosn't crash [and is valgrind clean] - >>> with this fix - I'm not sure what bug Hong is chasing.. >>> >>> Satish >>> >>> On Mon, 24 Oct 2016, Barry Smith wrote: >>> >>>> >>>> Anton, >>>> >>>> Sorry for any confusion. This doesn't resolve the SuperLU_DIST issue which I think Hong is working on, this only resolves multiple loads of matrices into the same Mat. >>>> >>>> Barry >>>> >>>>> On Oct 24, 2016, at 5:07 AM, Anton Popov wrote: >>>>> >>>>> Thank you Barry, Satish, Fande! >>>>> >>>>> Is there a chance to get this fix in the maintenance release 3.7.5 together with the latest SuperLU_DIST? Or next release is a more realistic option? >>>>> >>>>> Anton >>>>> >>>>> On 10/24/2016 01:58 AM, Satish Balay wrote: >>>>>> The original testcode from Anton also works [i.e is valgrind clean] with this change.. >>>>>> >>>>>> Satish >>>>>> >>>>>> On Sun, 23 Oct 2016, Barry Smith wrote: >>>>>> >>>>>>> Thanks Satish, >>>>>>> >>>>>>> I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) >>>>>>> >>>>>>> Fande, >>>>>>> >>>>>>> This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). 
>>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: >>>>>>>> >>>>>>>> On Fri, 21 Oct 2016, Barry Smith wrote: >>>>>>>> >>>>>>>>> valgrind first >>>>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>>>>>> $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small >>>>>>>> First MatLoad! >>>>>>>> Mat Object: 2 MPI processes >>>>>>>> type: mpiaij >>>>>>>> row 0: (0, 4.) (1, -1.) (6, -1.) >>>>>>>> row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) >>>>>>>> row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) >>>>>>>> row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) >>>>>>>> row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) >>>>>>>> row 5: (4, -1.) (5, 4.) (11, -1.) >>>>>>>> row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) >>>>>>>> row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) >>>>>>>> row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) >>>>>>>> row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) >>>>>>>> row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) >>>>>>>> row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) >>>>>>>> row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) >>>>>>>> row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) >>>>>>>> row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) >>>>>>>> row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) >>>>>>>> row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) >>>>>>>> row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) >>>>>>>> row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) >>>>>>>> row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) >>>>>>>> row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) >>>>>>>> row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) >>>>>>>> row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) >>>>>>>> row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) >>>>>>>> row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) >>>>>>>> row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) >>>>>>>> row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) >>>>>>>> row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) >>>>>>>> row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) >>>>>>>> row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) >>>>>>>> row 30: (24, -1.) (30, 4.) (31, -1.) >>>>>>>> row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) >>>>>>>> row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) >>>>>>>> row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) >>>>>>>> row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) >>>>>>>> row 35: (29, -1.) (34, -1.) (35, 4.) >>>>>>>> Second MatLoad! 
>>>>>>>> Mat Object: 2 MPI processes >>>>>>>> type: mpiaij >>>>>>>> ==4592== Invalid read of size 4 >>>>>>>> ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>>>>>> ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>>>> ==4592== by 0x53373D7: MatView (matrix.c:989) >>>>>>>> ==4592== by 0x40107E: main (ex16.c:30) >>>>>>>> ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd >>>>>>>> ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>>>> ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>>>> ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) >>>>>>>> ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>>>>>> ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>>>>>> ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>>>>>> ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) >>>>>>>> ==4592== by 0x400D9F: main (ex16.c:22) >>>>>>>> ==4592== >>>>>>>> ==4591== Invalid read of size 4 >>>>>>>> ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) >>>>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>>>>>> ==4591== by 0x40107E: main (ex16.c:30) >>>>>>>> ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd >>>>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>>>> ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) >>>>>>>> ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) >>>>>>>> ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) >>>>>>>> ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) >>>>>>>> ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) >>>>>>>> ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) >>>>>>>> ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) >>>>>>>> ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) >>>>>>>> ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) >>>>>>>> ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) >>>>>>>> ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) >>>>>>>> ==4591== by 0x400D9F: main (ex16.c:22) >>>>>>>> ==4591== >>>>>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>>>> [0]PETSC ERROR: Argument out of range >>>>>>>> [0]PETSC ERROR: Column too large: col 96 max 35 >>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 >>>>>>>> [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 >>>>>>>> [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu >>>>>>>> [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>>>>>> [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c >>>>>>>> [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c >>>>>>>> [0]PETSC ERROR: PETSc Option Table entries: >>>>>>>> [0]PETSC ERROR: -display :0.0 >>>>>>>> [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small >>>>>>>> [0]PETSC ERROR: -malloc_dump >>>>>>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- >>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>>>>>> [cli_0]: aborting job: >>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>>>>>>> ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 >>>>>>>> ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) >>>>>>>> ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) >>>>>>>> ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) >>>>>>>> ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) >>>>>>>> ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) >>>>>>>> ==4591== by 0x53373D7: MatView (matrix.c:989) >>>>>>>> ==4591== by 0x40107E: main (ex16.c:30) >>>>>>>> ==4591== >>>>>>>> >>>>>>>> =================================================================================== >>>>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>>>>>> = PID 4591 RUNNING AT asterix >>>>>>>> = EXIT CODE: 63 >>>>>>>> = CLEANING UP REMAINING PROCESSES >>>>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>>>>>> =================================================================================== >>>>>>>> balay at asterix /home/balay/download-pine/x/superlu_dist_test >>>>>>>> $ >>>>>>> >>>>> >>>> >>>> >>> >> >> > From bsmith at mcs.anl.gov Mon Oct 24 09:33:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Oct 2016 09:33:45 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <0ed184b6-15f0-21ff-8138-94f60d317e06@uni-mainz.de> <2c18b572-9d4e-b6f1-d624-7799bd46b849@uni-mainz.de> <09f620ac-25a8-6cb8-9aad-2ff473750762@uni-mainz.de> <9a6c60b6-3573-a704-22e4-5d707044bd19@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> Message-ID: <8EA31F05-E679-4B51-9088-AE2F63CFAA9A@mcs.anl.gov> > On Oct 24, 2016, at 9:24 AM, Kong, Fande wrote: > > > > On Mon, Oct 24, 2016 at 8:07 AM, Kong, Fande wrote: > > 
> On Sun, Oct 23, 2016 at 3:56 PM, Barry Smith wrote: > > Thanks Satish, > > I have fixed this in barry/fix-matmpixxxsetpreallocation-reentrant (in next for testing) > > Fande, > > This will also make MatMPIAIJSetPreallocation() work properly with multiple calls (you will not need a MatReset()). > > > Does this work for MPIAIJ only? There are also other functions: MatSeqAIJSetPreallocation(), MatMPIAIJSetPreallocation(), MatSeqBAIJSetPreallocation(), MatMPIBAIJSetPreallocation(), MatSeqSBAIJSetPreallocation(), MatMPISBAIJSetPreallocation(), and MatXAIJSetPreallocation. It works for all of them. > > We have to use different function for different type. Could we have an unified-interface for all of them? Supposedly you can call MatXAIJSetPreallocation() and it is the same as calling all of them, so I think it is a "unified" interface. Barry > > Fande, > > > Barry > > Thanks, Barry. > > Fande, > > > > > On Oct 21, 2016, at 6:48 PM, Satish Balay wrote: > > > > On Fri, 21 Oct 2016, Barry Smith wrote: > > > >> > >> valgrind first > > > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ mpiexec -n 2 $VG ./ex16 -f ~/datafiles/matrices/small > > First MatLoad! > > Mat Object: 2 MPI processes > > type: mpiaij > > row 0: (0, 4.) (1, -1.) (6, -1.) > > row 1: (0, -1.) (1, 4.) (2, -1.) (7, -1.) > > row 2: (1, -1.) (2, 4.) (3, -1.) (8, -1.) > > row 3: (2, -1.) (3, 4.) (4, -1.) (9, -1.) > > row 4: (3, -1.) (4, 4.) (5, -1.) (10, -1.) > > row 5: (4, -1.) (5, 4.) (11, -1.) > > row 6: (0, -1.) (6, 4.) (7, -1.) (12, -1.) > > row 7: (1, -1.) (6, -1.) (7, 4.) (8, -1.) (13, -1.) > > row 8: (2, -1.) (7, -1.) (8, 4.) (9, -1.) (14, -1.) > > row 9: (3, -1.) (8, -1.) (9, 4.) (10, -1.) (15, -1.) > > row 10: (4, -1.) (9, -1.) (10, 4.) (11, -1.) (16, -1.) > > row 11: (5, -1.) (10, -1.) (11, 4.) (17, -1.) > > row 12: (6, -1.) (12, 4.) (13, -1.) (18, -1.) > > row 13: (7, -1.) (12, -1.) (13, 4.) (14, -1.) (19, -1.) > > row 14: (8, -1.) (13, -1.) (14, 4.) (15, -1.) (20, -1.) > > row 15: (9, -1.) (14, -1.) (15, 4.) (16, -1.) (21, -1.) > > row 16: (10, -1.) (15, -1.) (16, 4.) (17, -1.) (22, -1.) > > row 17: (11, -1.) (16, -1.) (17, 4.) (23, -1.) > > row 18: (12, -1.) (18, 4.) (19, -1.) (24, -1.) > > row 19: (13, -1.) (18, -1.) (19, 4.) (20, -1.) (25, -1.) > > row 20: (14, -1.) (19, -1.) (20, 4.) (21, -1.) (26, -1.) > > row 21: (15, -1.) (20, -1.) (21, 4.) (22, -1.) (27, -1.) > > row 22: (16, -1.) (21, -1.) (22, 4.) (23, -1.) (28, -1.) > > row 23: (17, -1.) (22, -1.) (23, 4.) (29, -1.) > > row 24: (18, -1.) (24, 4.) (25, -1.) (30, -1.) > > row 25: (19, -1.) (24, -1.) (25, 4.) (26, -1.) (31, -1.) > > row 26: (20, -1.) (25, -1.) (26, 4.) (27, -1.) (32, -1.) > > row 27: (21, -1.) (26, -1.) (27, 4.) (28, -1.) (33, -1.) > > row 28: (22, -1.) (27, -1.) (28, 4.) (29, -1.) (34, -1.) > > row 29: (23, -1.) (28, -1.) (29, 4.) (35, -1.) > > row 30: (24, -1.) (30, 4.) (31, -1.) > > row 31: (25, -1.) (30, -1.) (31, 4.) (32, -1.) > > row 32: (26, -1.) (31, -1.) (32, 4.) (33, -1.) > > row 33: (27, -1.) (32, -1.) (33, 4.) (34, -1.) > > row 34: (28, -1.) (33, -1.) (34, 4.) (35, -1.) > > row 35: (29, -1.) (34, -1.) (35, 4.) > > Second MatLoad! 
> > Mat Object: 2 MPI processes > > type: mpiaij > > ==4592== Invalid read of size 4 > > ==4592== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > > ==4592== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4592== by 0x53373D7: MatView (matrix.c:989) > > ==4592== by 0x40107E: main (ex16.c:30) > > ==4592== Address 0xa47b460 is 20 bytes after a block of size 28 alloc'd > > ==4592== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4592== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4592== by 0x5842C70: MatSetUpMultiply_MPIAIJ (mmaij.c:41) > > ==4592== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4592== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4592== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4592== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4592== by 0x400D9F: main (ex16.c:22) > > ==4592== > > ==4591== Invalid read of size 4 > > ==4591== at 0x5814014: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1402) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== Address 0xa482958 is 24 bytes before a block of size 7 alloc'd > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x4F31FB5: PetscStrallocpy (str.c:197) > > ==4591== by 0x4F0D3F5: PetscClassRegLogRegister (classlog.c:253) > > ==4591== by 0x4EF96E2: PetscClassIdRegister (plog.c:2053) > > ==4591== by 0x51FA018: VecInitializePackage (dlregisvec.c:165) > > ==4591== by 0x51F6DE9: VecCreate (veccreate.c:35) > > ==4591== by 0x51C49F0: VecCreateSeq (vseqcr.c:37) > > ==4591== by 0x5843191: MatSetUpMultiply_MPIAIJ (mmaij.c:104) > > ==4591== by 0x5809943: MatAssemblyEnd_MPIAIJ (mpiaij.c:747) > > ==4591== by 0x536B299: MatAssemblyEnd (matrix.c:5298) > > ==4591== by 0x5829C05: MatLoad_MPIAIJ (mpiaij.c:3032) > > ==4591== by 0x5337FEA: MatLoad (matrix.c:1101) > > ==4591== by 0x400D9F: main (ex16.c:22) > > ==4591== > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Argument out of range > > [0]PETSC ERROR: Column too large: col 96 max 35 > > [0]PETSC ERROR: See https://urldefense.proofpoint.com/v2/url?u=http-3A__www.mcs.anl.gov_petsc_documentation_faq.html&d=CwIFAg&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=yCFQeqGFVZhJtXzPwmjejP5oiMeddVxB4a_mxWbQYkA&s=lWoiLmjuyX1M9FCbfQAwkLK2cAGeDvnXO-fMCKllDTE&e= for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.4-1729-g4c4de23 GIT Date: 2016-10-20 22:22:58 +0000 > > [0]PETSC ERROR: ./ex16 on a arch-idx64-slu named asterix by balay Fri Oct 21 18:47:51 2016 > > [0]PETSC ERROR: Configure options --download-metis --download-parmetis --download-superlu_dist PETSC_ARCH=arch-idx64-slu > > [0]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 585 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #2 MatAssemblyEnd_MPIAIJ() line 724 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #3 MatAssemblyEnd() line 5298 in /home/balay/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #4 MatView_MPIAIJ_ASCIIorDraworSocket() line 1410 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #5 MatView_MPIAIJ() line 1440 in /home/balay/petsc/src/mat/impls/aij/mpi/mpiaij.c > > [0]PETSC ERROR: #6 MatView() line 989 in /home/balay/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #7 main() line 30 in /home/balay/download-pine/x/superlu_dist_test/ex16.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -display :0.0 > > [0]PETSC ERROR: -f /home/balay/datafiles/matrices/small > > [0]PETSC ERROR: -malloc_dump > > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > [cli_0]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > > ==4591== 16,965 (2,744 direct, 14,221 indirect) bytes in 1 blocks are definitely lost in loss record 1,014 of 1,016 > > ==4591== at 0x4C2FF83: memalign (vg_replace_malloc.c:858) > > ==4591== by 0x4FD121A: PetscMallocAlign (mal.c:28) > > ==4591== by 0x52F3B14: MatCreate (gcreate.c:84) > > ==4591== by 0x581390A: MatView_MPIAIJ_ASCIIorDraworSocket (mpiaij.c:1371) > > ==4591== by 0x5814A75: MatView_MPIAIJ (mpiaij.c:1440) > > ==4591== by 0x53373D7: MatView (matrix.c:989) > > ==4591== by 0x40107E: main (ex16.c:30) > > ==4591== > > > > =================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = PID 4591 RUNNING AT asterix > > = EXIT CODE: 63 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > > balay at asterix /home/balay/download-pine/x/superlu_dist_test > > $ > > > From balay at mcs.anl.gov Mon Oct 24 09:34:57 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Oct 2016 09:34:57 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: On Mon, 24 Oct 2016, Barry Smith wrote: > > > [Or perhaps Hong is using a different test code and is observing bugs > > with superlu_dist interface..] > > She states that her test does a NEW MatCreate() for each matrix load (I cut and pasted it in the email I just sent). The bug I fixed was only related to using the SAME matrix from one MatLoad() in another MatLoad(). Ah - ok.. 
Sorry - wasn't thinking clearly :( Satish From hzhang at mcs.anl.gov Mon Oct 24 10:47:47 2016 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 24 Oct 2016 10:47:47 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: Barry, Your change indeed fixed the error of his testing code. As Satish tested, on your branch, ex16 runs smooth. I do not understand why on maint or master branch, ex16 creases inside superlu_dist, but not with mumps. Hong On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: > On Mon, 24 Oct 2016, Barry Smith wrote: > > > > > > [Or perhaps Hong is using a different test code and is observing bugs > > > with superlu_dist interface..] > > > > She states that her test does a NEW MatCreate() for each matrix load > (I cut and pasted it in the email I just sent). The bug I fixed was only > related to using the SAME matrix from one MatLoad() in another MatLoad(). > > Ah - ok.. Sorry - wasn't thinking clearly :( > > Satish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From popov at uni-mainz.de Mon Oct 24 12:09:37 2016 From: popov at uni-mainz.de (Anton Popov) Date: Mon, 24 Oct 2016 19:09:37 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: On 10/24/2016 05:47 PM, Hong wrote: > Barry, > Your change indeed fixed the error of his testing code. > As Satish tested, on your branch, ex16 runs smooth. > > I do not understand why on maint or master branch, ex16 creases inside > superlu_dist, but not with mumps. > I also confirm that ex16 runs fine with latest fix, but unfortunately not my code. This is something to be expected, since my code preallocates once in the beginning. So there is no way it can be affected by multiple preallocations. Subsequently I only do matrix assembly, that makes sure structure doesn't change (set to get error otherwise). Summary: we don't have a simple test code to debug superlu issue anymore. Anton > Hong > > On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay > wrote: > > On Mon, 24 Oct 2016, Barry Smith wrote: > > > > > > [Or perhaps Hong is using a different test code and is observing bugs > > > with superlu_dist interface..] > > > > She states that her test does a NEW MatCreate() for each > matrix load (I cut and pasted it in the email I just sent). The > bug I fixed was only related to using the SAME matrix from one > MatLoad() in another MatLoad(). > > Ah - ok.. Sorry - wasn't thinking clearly :( > > Satish > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hzhang at mcs.anl.gov Mon Oct 24 13:21:44 2016 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 24 Oct 2016 13:21:44 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: Anton : If replacing superlu_dist with mumps, does your code work? Hong > > On 10/24/2016 05:47 PM, Hong wrote: > > Barry, > Your change indeed fixed the error of his testing code. > As Satish tested, on your branch, ex16 runs smooth. > > I do not understand why on maint or master branch, ex16 creases inside > superlu_dist, but not with mumps. > > > I also confirm that ex16 runs fine with latest fix, but unfortunately not > my code. > > This is something to be expected, since my code preallocates once in the > beginning. So there is no way it can be affected by multiple > preallocations. Subsequently I only do matrix assembly, that makes sure > structure doesn't change (set to get error otherwise). > > Summary: we don't have a simple test code to debug superlu issue anymore. > > Anton > > Hong > > On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: > >> On Mon, 24 Oct 2016, Barry Smith wrote: >> >> > >> > > [Or perhaps Hong is using a different test code and is observing bugs >> > > with superlu_dist interface..] >> > >> > She states that her test does a NEW MatCreate() for each matrix load >> (I cut and pasted it in the email I just sent). The bug I fixed was only >> related to using the SAME matrix from one MatLoad() in another MatLoad(). >> >> Ah - ok.. Sorry - wasn't thinking clearly :( >> >> Satish >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From popov at uni-mainz.de Mon Oct 24 13:43:12 2016 From: popov at uni-mainz.de (Anton) Date: Mon, 24 Oct 2016 20:43:12 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> On 10/24/16 8:21 PM, Hong wrote: > Anton : > If replacing superlu_dist with mumps, does your code work? yes > Hong > > On 10/24/2016 05:47 PM, Hong wrote: >> Barry, >> Your change indeed fixed the error of his testing code. >> As Satish tested, on your branch, ex16 runs smooth. >> >> I do not understand why on maint or master branch, ex16 creases >> inside superlu_dist, but not with mumps. >> > > I also confirm that ex16 runs fine with latest fix, but > unfortunately not my code. > > This is something to be expected, since my code preallocates once > in the beginning. So there is no way it can be affected by > multiple preallocations. Subsequently I only do matrix assembly, > that makes sure structure doesn't change (set to get error otherwise). > > Summary: we don't have a simple test code to debug superlu issue > anymore. 
> > Anton > >> Hong >> >> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay > > wrote: >> >> On Mon, 24 Oct 2016, Barry Smith wrote: >> >> > >> > > [Or perhaps Hong is using a different test code and is observing bugs >> > > with superlu_dist interface..] >> > >> > She states that her test does a NEW MatCreate() for each >> matrix load (I cut and pasted it in the email I just sent). >> The bug I fixed was only related to using the SAME matrix >> from one MatLoad() in another MatLoad(). >> >> Ah - ok.. Sorry - wasn't thinking clearly :( >> >> Satish >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Oct 24 14:06:48 2016 From: hzhang at mcs.anl.gov (Hong) Date: Mon, 24 Oct 2016 14:06:48 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> Message-ID: Anton: > > If replacing superlu_dist with mumps, does your code work? > > yes > You may use mumps in your code, or tests different options for superlu_dist: -mat_superlu_dist_equil: Equilibrate matrix (None) -mat_superlu_dist_rowperm Row permutation (choose one of) LargeDiag NATURAL (None) -mat_superlu_dist_colperm Column permutation (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None) -mat_superlu_dist_replacetinypivot: Replace tiny pivots (None) -mat_superlu_dist_parsymbfact: Parallel symbolic factorization (None) -mat_superlu_dist_fact Sparsity pattern for repeated matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm (None) The options inside <> are defaults. You may try others. This might help narrow down the bug. Hong > > Hong >> >> On 10/24/2016 05:47 PM, Hong wrote: >> >> Barry, >> Your change indeed fixed the error of his testing code. >> As Satish tested, on your branch, ex16 runs smooth. >> >> I do not understand why on maint or master branch, ex16 creases inside >> superlu_dist, but not with mumps. >> >> >> I also confirm that ex16 runs fine with latest fix, but unfortunately not >> my code. >> >> This is something to be expected, since my code preallocates once in the >> beginning. So there is no way it can be affected by multiple >> preallocations. Subsequently I only do matrix assembly, that makes sure >> structure doesn't change (set to get error otherwise). >> >> Summary: we don't have a simple test code to debug superlu issue anymore. >> >> Anton >> >> Hong >> >> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: >> >>> On Mon, 24 Oct 2016, Barry Smith wrote: >>> >>> > >>> > > [Or perhaps Hong is using a different test code and is observing bugs >>> > > with superlu_dist interface..] >>> > >>> > She states that her test does a NEW MatCreate() for each matrix >>> load (I cut and pasted it in the email I just sent). The bug I fixed was >>> only related to using the SAME matrix from one MatLoad() in another >>> MatLoad(). >>> >>> Ah - ok.. Sorry - wasn't thinking clearly :( >>> >>> Satish >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gideon.simpson at gmail.com Mon Oct 24 14:22:12 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 24 Oct 2016 15:22:12 -0400 Subject: [petsc-users] question Message-ID: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> I notice that if I use -snes_view, I see lines like: total number of linear solver iterations=20 total number of function evaluations=5 Just to clarify, the number of "function evaluations" corresponds to the number of Newton (or Newton like) steps, and the total "number of linear solver iterations? is the total number of iterations needed to solve the linear problem at each Newton iteration. Is that correct? So in the above, there are 5 steps of Newton and a total of 20 iterations of the linear solver across all 5 Newton steps. -gideon From jed at jedbrown.org Mon Oct 24 14:59:03 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Oct 2016 13:59:03 -0600 Subject: [petsc-users] question In-Reply-To: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> Message-ID: <871sz5o6wo.fsf@jedbrown.org> Gideon Simpson writes: > I notice that if I use -snes_view, > > I see lines like: > total number of linear solver iterations=20 > total number of function evaluations=5 > Just to clarify, the number of "function evaluations" corresponds to the number of Newton (or Newton like) steps, and the total "number of linear solver iterations? is the total number of iterations needed to solve the linear problem at each Newton iteration. Is that correct? So in the above, there are 5 steps of Newton and a total of 20 iterations of the linear solver across all 5 Newton steps. Usually there is one final residual evaluation to declare convergence. Also, if you activated a line search, the residual would have been evaluated more than once per Newton step. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From gideon.simpson at gmail.com Mon Oct 24 15:00:45 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 24 Oct 2016 16:00:45 -0400 Subject: [petsc-users] question In-Reply-To: <871sz5o6wo.fsf@jedbrown.org> References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> <871sz5o6wo.fsf@jedbrown.org> Message-ID: <59F98D6C-77DC-45DA-9DF8-5E40D6FF4DE5@gmail.com> Ok, so if I?m doing the default Newton Line Search, how would I interpret the 5 and the 20, vis a vis what I would be doing with pencil and paper? -gideon > On Oct 24, 2016, at 3:59 PM, Jed Brown wrote: > > Gideon Simpson writes: > >> I notice that if I use -snes_view, >> >> I see lines like: >> total number of linear solver iterations=20 >> total number of function evaluations=5 >> Just to clarify, the number of "function evaluations" corresponds to the number of Newton (or Newton like) steps, and the total "number of linear solver iterations? is the total number of iterations needed to solve the linear problem at each Newton iteration. Is that correct? So in the above, there are 5 steps of Newton and a total of 20 iterations of the linear solver across all 5 Newton steps. > > Usually there is one final residual evaluation to declare convergence. > > Also, if you activated a line search, the residual would have been > evaluated more than once per Newton step. -------------- next part -------------- An HTML attachment was scrubbed... 
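To make the bookkeeping concrete, here is a toy scalar Newton iteration (plain C, not PETSc, and not Gideon's problem) that counts residual evaluations the way the discussion above describes: one evaluation for the initial residual and one per accepted step, so in this loop the evaluation count comes out to the iteration count plus one. A line search, as Jed notes, would add further evaluations on top of that, and the analogue of the 20 linear iterations is the work hidden inside the linear-solve line, which accumulates separately.

/* toy Newton solve of x^2 - 2 = 0, counting residual evaluations */
#include <stdio.h>
#include <math.h>

static int nfeval = 0;                                      /* "function evaluations" */

static double F(double x) { nfeval++; return x*x - 2.0; }   /* residual  */
static double J(double x) { return 2.0*x; }                 /* Jacobian  */

int main(void)
{
  double x = 1.0;
  double r = F(x);                   /* initial residual evaluation */
  int    newton_its = 0;

  while (fabs(r) > 1e-12 && newton_its < 50) {
    x -= r / J(x);                   /* the "linear solve"; with PETSc this is
                                        where the KSP iterations accumulate   */
    r  = F(x);                       /* residual at the new iterate, also used
                                        to declare convergence                */
    newton_its++;
  }
  printf("Newton iterations: %d, function evaluations: %d\n", newton_its, nfeval);
  return 0;
}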
URL: From jychang48 at gmail.com Mon Oct 24 15:01:08 2016 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 24 Oct 2016 15:01:08 -0500 Subject: [petsc-users] question In-Reply-To: References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> Message-ID: Sorry forgot to hit reply all On Monday, October 24, 2016, Justin Chang wrote: > It depends on your SNES solver. A SNES iteration could involve more than > one function evaluation (e.g., line searching). Also, -snes_monitor may say > 3 iterations whereas -snes_view might indicate 4 function evaluations which > could suggest that the first call was for computing the initial residual. > > On Mon, Oct 24, 2016 at 2:22 PM, Gideon Simpson > wrote: > >> I notice that if I use -snes_view, >> >> I see lines like: >> total number of linear solver iterations=20 >> total number of function evaluations=5 >> Just to clarify, the number of "function evaluations" corresponds to the >> number of Newton (or Newton like) steps, and the total "number of linear >> solver iterations? is the total number of iterations needed to solve the >> linear problem at each Newton iteration. Is that correct? So in the >> above, there are 5 steps of Newton and a total of 20 iterations of the >> linear solver across all 5 Newton steps. >> >> -gideon >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Mon Oct 24 15:02:59 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 24 Oct 2016 14:02:59 -0600 Subject: [petsc-users] question In-Reply-To: References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> Message-ID: If you are using the matrix-free method, the number of function evaluations is way more than the number of Newton iterations. Fande, On Mon, Oct 24, 2016 at 2:01 PM, Justin Chang wrote: > Sorry forgot to hit reply all > > On Monday, October 24, 2016, Justin Chang wrote: > >> It depends on your SNES solver. A SNES iteration could involve more than >> one function evaluation (e.g., line searching). Also, -snes_monitor may say >> 3 iterations whereas -snes_view might indicate 4 function evaluations which >> could suggest that the first call was for computing the initial residual. >> >> On Mon, Oct 24, 2016 at 2:22 PM, Gideon Simpson > > wrote: >> >>> I notice that if I use -snes_view, >>> >>> I see lines like: >>> total number of linear solver iterations=20 >>> total number of function evaluations=5 >>> Just to clarify, the number of "function evaluations" corresponds to the >>> number of Newton (or Newton like) steps, and the total "number of linear >>> solver iterations? is the total number of iterations needed to solve the >>> linear problem at each Newton iteration. Is that correct? So in the >>> above, there are 5 steps of Newton and a total of 20 iterations of the >>> linear solver across all 5 Newton steps. >>> >>> -gideon >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Mon Oct 24 15:05:27 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 24 Oct 2016 16:05:27 -0400 Subject: [petsc-users] question In-Reply-To: References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> Message-ID: <4F95883E-04B0-4F93-88A1-0B3678B55DF4@gmail.com> Suppose I?m specifying the Jacobian. -gideon > On Oct 24, 2016, at 4:02 PM, Kong, Fande wrote: > > If you are using the matrix-free method, the number of function evaluations is way more than the number of Newton iterations. 
> > Fande, > > On Mon, Oct 24, 2016 at 2:01 PM, Justin Chang > wrote: > Sorry forgot to hit reply all > > On Monday, October 24, 2016, Justin Chang > wrote: > It depends on your SNES solver. A SNES iteration could involve more than one function evaluation (e.g., line searching). Also, -snes_monitor may say 3 iterations whereas -snes_view might indicate 4 function evaluations which could suggest that the first call was for computing the initial residual. > > On Mon, Oct 24, 2016 at 2:22 PM, Gideon Simpson > wrote: > I notice that if I use -snes_view, > > I see lines like: > total number of linear solver iterations=20 > total number of function evaluations=5 > Just to clarify, the number of "function evaluations" corresponds to the number of Newton (or Newton like) steps, and the total "number of linear solver iterations? is the total number of iterations needed to solve the linear problem at each Newton iteration. Is that correct? So in the above, there are 5 steps of Newton and a total of 20 iterations of the linear solver across all 5 Newton steps. > > -gideon > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 24 15:11:47 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Oct 2016 14:11:47 -0600 Subject: [petsc-users] question In-Reply-To: <59F98D6C-77DC-45DA-9DF8-5E40D6FF4DE5@gmail.com> References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> <871sz5o6wo.fsf@jedbrown.org> <59F98D6C-77DC-45DA-9DF8-5E40D6FF4DE5@gmail.com> Message-ID: <87y41dmrr0.fsf@jedbrown.org> Gideon Simpson writes: > Ok, so if I?m doing the default Newton Line Search, how would I interpret the 5 and the 20, vis a vis what I would be doing with pencil and paper? I don't know what you're doing with pencil and paper. It's just counting the number of residual evaluations and solver iterations (Jacobian and preconditioner application). Use -snes_monitor -snes_linesearch_monitor -ksp_monitor for the details. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From gideon.simpson at gmail.com Mon Oct 24 15:13:54 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 24 Oct 2016 16:13:54 -0400 Subject: [petsc-users] question In-Reply-To: <87y41dmrr0.fsf@jedbrown.org> References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> <871sz5o6wo.fsf@jedbrown.org> <59F98D6C-77DC-45DA-9DF8-5E40D6FF4DE5@gmail.com> <87y41dmrr0.fsf@jedbrown.org> Message-ID: <1AFF7C28-69A8-4F91-B1C0-ED78E3C23D73@gmail.com> I just mean that if I were working a Newton iteration by hand, i.e., x_{n+1} = x_n - J^{-1} F(x_n), I?d be able to count the number of Newton iterations. I?m trying to see how that count would relate to the numbers reported by snes_view. I?m guessing that -snes_monitor is giving a more consistent count of this? -gideon > On Oct 24, 2016, at 4:11 PM, Jed Brown wrote: > > Gideon Simpson writes: > >> Ok, so if I?m doing the default Newton Line Search, how would I interpret the 5 and the 20, vis a vis what I would be doing with pencil and paper? > > I don't know what you're doing with pencil and paper. It's just > counting the number of residual evaluations and solver iterations > (Jacobian and preconditioner application). Use -snes_monitor > -snes_linesearch_monitor -ksp_monitor for the details. -------------- next part -------------- An HTML attachment was scrubbed... 
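For readers following along, the three monitors Jed mentions split the two counters apart: -snes_monitor prints the nonlinear residual norm once per Newton step, -ksp_monitor prints the linear residual norm once per inner iteration (summed over all solves this gives the "total number of linear solver iterations" from -snes_view), and -snes_linesearch_monitor shows any extra residual evaluations the line search performs. A run might look like the line below; ./myapp is a placeholder name, not from this thread.

  ./myapp -snes_monitor -snes_linesearch_monitor -ksp_monitor -snes_view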
URL: From fande.kong at inl.gov Mon Oct 24 15:17:41 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 24 Oct 2016 14:17:41 -0600 Subject: [petsc-users] question In-Reply-To: <1AFF7C28-69A8-4F91-B1C0-ED78E3C23D73@gmail.com> References: <02920882-D428-47FB-A04A-CDBFFC7BDF74@gmail.com> <871sz5o6wo.fsf@jedbrown.org> <59F98D6C-77DC-45DA-9DF8-5E40D6FF4DE5@gmail.com> <87y41dmrr0.fsf@jedbrown.org> <1AFF7C28-69A8-4F91-B1C0-ED78E3C23D73@gmail.com> Message-ID: Using -snes_linesearch_type basic to turn off the line search, you will see that the number of function evaluations is the same as the number of Newton iterations. Fande, On Mon, Oct 24, 2016 at 2:13 PM, Gideon Simpson wrote: > I just mean that if I were working a Newton iteration by hand, i.e., > > x_{n+1} = x_n - J^{-1} F(x_n), > > I?d be able to count the number of Newton iterations. I?m trying to see > how that count would relate to the numbers reported by snes_view. I?m > guessing that -snes_monitor is giving a more consistent count of this? > > > -gideon > > On Oct 24, 2016, at 4:11 PM, Jed Brown wrote: > > Gideon Simpson writes: > > Ok, so if I?m doing the default Newton Line Search, how would I interpret > the 5 and the 20, vis a vis what I would be doing with pencil and paper? > > > I don't know what you're doing with pencil and paper. It's just > counting the number of residual evaluations and solver iterations > (Jacobian and preconditioner application). Use -snes_monitor > -snes_linesearch_monitor -ksp_monitor for the details. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Oct 24 15:32:11 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Oct 2016 15:32:11 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <89fa6a18-5869-8b91-10b7-99b1c272e654@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> Message-ID: <0AE27A19-1A1D-46F1-A642-93BB47F3D181@mcs.anl.gov> Valgrind doesn't report any problems? > On Oct 24, 2016, at 12:09 PM, Anton Popov wrote: > > > > On 10/24/2016 05:47 PM, Hong wrote: >> Barry, >> Your change indeed fixed the error of his testing code. >> As Satish tested, on your branch, ex16 runs smooth. >> >> I do not understand why on maint or master branch, ex16 creases inside superlu_dist, but not with mumps. >> > > I also confirm that ex16 runs fine with latest fix, but unfortunately not my code. > > This is something to be expected, since my code preallocates once in the beginning. So there is no way it can be affected by multiple preallocations. Subsequently I only do matrix assembly, that makes sure structure doesn't change (set to get error otherwise). > > Summary: we don't have a simple test code to debug superlu issue anymore. > > Anton > >> Hong >> >> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: >> On Mon, 24 Oct 2016, Barry Smith wrote: >> >> > >> > > [Or perhaps Hong is using a different test code and is observing bugs >> > > with superlu_dist interface..] >> > >> > She states that her test does a NEW MatCreate() for each matrix load (I cut and pasted it in the email I just sent). The bug I fixed was only related to using the SAME matrix from one MatLoad() in another MatLoad(). 
>> >> Ah - ok.. Sorry - wasn't thinking clearly :( >> >> Satish >> > From jroman at dsic.upv.es Tue Oct 25 04:25:25 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 25 Oct 2016 11:25:25 +0200 Subject: [petsc-users] BVNormColumn In-Reply-To: <4807F42C-75A7-4DE3-A605-A4BDE9CDF868@dsic.upv.es> References: <4807F42C-75A7-4DE3-A605-A4BDE9CDF868@dsic.upv.es> Message-ID: <647546F4-0365-4E11-89E2-2241C6F32939@dsic.upv.es> > El 19 oct 2016, a las 9:54, Jose E. Roman escribi?: > >> >> El 19 oct 2016, a las 0:26, Bikash Kanungo escribi?: >> >> Hi Jose, >> >> Thanks for the pointers. Here's what I observed on probing it further: >> >> ? The ||B - B^H|| norm was 1e-18. So I explicitly made it Hermitian by setting B = 0.5(B+B^H). However, this didn't help. >> ? Next, I checked for the conditioning of B by computing the ratio of the highest and lowest eigenvalues. The conditioning of the order 1e-9. >> ? I monitored the imaginary the imaginary part of VecDot(y,x, dotXY) where y = B*x and noted that only when the imaginary part is more than 1e-16 in magnitude, the error of "The inner product is not well defined" is flagged. For the first few iterations of orhtogonalization (i.e., the one where orthogonization is successful), the values of VecDot(y,x, dotXY) are all found to be lower than 1e-16. I guess this small imaginary part might be the cause of the error. >> Let me know if there is a way to bypass the abort by changing the tolerance for imaginary part. >> >> >> >> Regards, >> Bikash >> > > There is something wrong: the condition number is greater than 1 by definition, so it cannot be 1e-9. Anyway, maybe what happens is that your matrix has a very small norm. The SLEPc code needs a fix for the case when the norm of B or the norm of the vector x is very small. Please send the matrix to my personal email and I will make some tests. > > Jose I tested with your matrix and vector with two different machines, with different compilers, and in both cases the computation did not fail. The imaginary part is below the machine precision, as expected. I don't know why you are getting larger roundoff error. Anyway, the check that we currently have in SLEPc is too strict. You can try relaxing it, by editing function BV_SafeSqrt (in $SLEPC_DIR/include/slepc/private/bvimpl.h), for instance with this: if (PetscAbsReal(PetscImaginaryPart(alpha))>PETSC_MACHINE_EPSILON && PetscAbsReal(PetscImaginaryPart(alpha))/absal>100*PETSC_MACHINE_EPSILON) SETERRQ1(PetscObjectComm((PetscObject)bv),1,"The inner product is not well defined: nonzero imaginary part %g",PetscImaginaryPart(alpha)); Let us know if this works for you. Thanks. Jose From C.Klaij at marin.nl Tue Oct 25 06:29:26 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Tue, 25 Oct 2016 11:29:26 +0000 Subject: [petsc-users] error with wrong tarball in path/to/package Message-ID: <1477394966280.53606@marin.nl> Here is a small complaint about the error message "unable to download" that is given when using --download-PACKAGENAME=/PATH/TO/package.tar.gz with the wrong tarball. 
For example, for my previous install, I was using petsc-3.5.3 with: --download-ml=/path/to/ml-6.2-win.tar.gz Using the same file with 3.7.4 gives this error message ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Unable to download ML Failed to download ML ******************************************************************************* My guess from ml.py is that I should now download: https://bitbucket.org/petsc/pkg-ml/get/v6.2-p4.tar.gz and that you are somehow checking that the file specified in the path matches this file (name, hash, ...)? If so, "unable to download" is a bit confusing, I wasted some time looking at the file system and the file:// protocol, and annoyed the sysadmins... May I suggest to replace the message with "wrong version" or something? Chris dr. ir. Christiaan Klaij | CFD Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl MARIN news: http://www.marin.nl/web/News/News-items/Workshop-Optimaliseren-is-ook-innoveren-15-november.htm From mono at dtu.dk Tue Oct 25 06:39:23 2016 From: mono at dtu.dk (=?iso-8859-1?Q?Morten_Nobel-J=F8rgensen?=) Date: Tue, 25 Oct 2016 11:39:23 +0000 Subject: [petsc-users] Element to local dof map using dmplex In-Reply-To: References: <6B03D347796DED499A2696FC095CE81A05B5E20A@ait-pex02mbx04.win.dtu.dk>, Message-ID: <6B03D347796DED499A2696FC095CE81A05B612CD@ait-pex02mbx04.win.dtu.dk> Dear Matt Did you (or anyone else) find time to look at our issue? We are really looking forward to your answer :) Kind regards, Morten ________________________________ From: Matthew Knepley [knepley at gmail.com] Sent: Wednesday, October 12, 2016 3:41 PM To: Morten Nobel-J?rgensen Cc: petsc-users at mcs.anl.gov Subject: Re: Element to local dof map using dmplex On Wed, Oct 12, 2016 at 6:40 AM, Morten Nobel-J?rgensen > wrote: Dear PETSc developers / Matt Thanks for your suggestions regarding our use of dmplex in a FEM context. However, Matt's advise on using the PetscFE is not sufficient for our needs (our end goal is a topology optimization framework - not just FEM) and we must honestly admit that we do not see how we can use the MATIS and the MatSetValuesClosure or DMPlexMatSetClosure to solve our current issues as Stefano has suggested. We have therefore created a more representative, yet heavily oversimplified, code example that demonstrates our problem. That is, the dof handling is only correct on a single process and goes wrong on np>1. We hope very much that you can help us to overcome our problem. Okay, I will look at it and try to rework it to fix your problem. I am in London this week, so it might take me until next week. Thanks, Matt Thank you for an excellent toolkit Morten and Niels -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
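Back to the --download-ml question above: if the guess about the expected tarball is right, one workaround is simply to fetch the file this PETSc version expects and point configure at the local copy, along the lines of the sketch below (the paths are placeholders; the URL is the one quoted above).

  wget https://bitbucket.org/petsc/pkg-ml/get/v6.2-p4.tar.gz -O /path/to/pkg-ml-v6.2-p4.tar.gz
  ./configure --download-ml=/path/to/pkg-ml-v6.2-p4.tar.gz [other configure options]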
URL: From popov at uni-mainz.de Tue Oct 25 06:58:08 2016 From: popov at uni-mainz.de (Anton Popov) Date: Tue, 25 Oct 2016 13:58:08 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: <0AE27A19-1A1D-46F1-A642-93BB47F3D181@mcs.anl.gov> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <0AE27A19-1A1D-46F1-A642-93BB47F3D181@mcs.anl.gov> Message-ID: On 10/24/2016 10:32 PM, Barry Smith wrote: > Valgrind doesn't report any problems? > Valgrind hangs and never returns (waited hours for a 5 sec run) after entering factorization for the second time. >> On Oct 24, 2016, at 12:09 PM, Anton Popov wrote: >> >> >> >> On 10/24/2016 05:47 PM, Hong wrote: >>> Barry, >>> Your change indeed fixed the error of his testing code. >>> As Satish tested, on your branch, ex16 runs smooth. >>> >>> I do not understand why on maint or master branch, ex16 creases inside superlu_dist, but not with mumps. >>> >> I also confirm that ex16 runs fine with latest fix, but unfortunately not my code. >> >> This is something to be expected, since my code preallocates once in the beginning. So there is no way it can be affected by multiple preallocations. Subsequently I only do matrix assembly, that makes sure structure doesn't change (set to get error otherwise). >> >> Summary: we don't have a simple test code to debug superlu issue anymore. >> >> Anton >> >>> Hong >>> >>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: >>> On Mon, 24 Oct 2016, Barry Smith wrote: >>> >>>>> [Or perhaps Hong is using a different test code and is observing bugs >>>>> with superlu_dist interface..] >>>> She states that her test does a NEW MatCreate() for each matrix load (I cut and pasted it in the email I just sent). The bug I fixed was only related to using the SAME matrix from one MatLoad() in another MatLoad(). >>> Ah - ok.. Sorry - wasn't thinking clearly :( >>> >>> Satish >>> From popov at uni-mainz.de Tue Oct 25 07:06:13 2016 From: popov at uni-mainz.de (Anton Popov) Date: Tue, 25 Oct 2016 14:06:13 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 failure of repeated calls to MatLoad() or MatMPIAIJSetPreallocation() with the same matrix In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <0AE27A19-1A1D-46F1-A642-93BB47F3D181@mcs.anl.gov> Message-ID: <91661c37-17ed-6b18-c226-468c27223f6b@uni-mainz.de> On 10/25/2016 01:58 PM, Anton Popov wrote: > > > On 10/24/2016 10:32 PM, Barry Smith wrote: >> Valgrind doesn't report any problems? >> > > Valgrind hangs and never returns (waited hours for a 5 sec run) after > entering factorization for the second time. Before it happens it prints this (attached) Anton > >>> On Oct 24, 2016, at 12:09 PM, Anton Popov wrote: >>> >>> >>> >>> On 10/24/2016 05:47 PM, Hong wrote: >>>> Barry, >>>> Your change indeed fixed the error of his testing code. >>>> As Satish tested, on your branch, ex16 runs smooth. >>>> >>>> I do not understand why on maint or master branch, ex16 creases >>>> inside superlu_dist, but not with mumps. 
>>>> >>> I also confirm that ex16 runs fine with latest fix, but >>> unfortunately not my code. >>> >>> This is something to be expected, since my code preallocates once in >>> the beginning. So there is no way it can be affected by multiple >>> preallocations. Subsequently I only do matrix assembly, that makes >>> sure structure doesn't change (set to get error otherwise). >>> >>> Summary: we don't have a simple test code to debug superlu issue >>> anymore. >>> >>> Anton >>> >>>> Hong >>>> >>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay >>>> wrote: >>>> On Mon, 24 Oct 2016, Barry Smith wrote: >>>> >>>>>> [Or perhaps Hong is using a different test code and is observing >>>>>> bugs >>>>>> with superlu_dist interface..] >>>>> She states that her test does a NEW MatCreate() for each >>>>> matrix load (I cut and pasted it in the email I just sent). The >>>>> bug I fixed was only related to using the SAME matrix from one >>>>> MatLoad() in another MatLoad(). >>>> Ah - ok.. Sorry - wasn't thinking clearly :( >>>> >>>> Satish >>>> > -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind.log Type: text/x-log Size: 7684 bytes Desc: not available URL: From knepley at gmail.com Tue Oct 25 07:40:54 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 07:40:54 -0500 Subject: [petsc-users] Element to local dof map using dmplex In-Reply-To: <6B03D347796DED499A2696FC095CE81A05B612CD@ait-pex02mbx04.win.dtu.dk> References: <6B03D347796DED499A2696FC095CE81A05B5E20A@ait-pex02mbx04.win.dtu.dk> <6B03D347796DED499A2696FC095CE81A05B612CD@ait-pex02mbx04.win.dtu.dk> Message-ID: On Tue, Oct 25, 2016 at 6:39 AM, Morten Nobel-J?rgensen wrote: > Dear Matt > > Did you (or anyone else) find time to look at our issue? > > We are really looking forward to your answer :) > Yes, I had a little difficulty understanding what was going on, but now I think I see. I am attaching my modified ex19.cc. Please look at the sections marked with 'MGK'. The largest change is that I think you can dispense with your matrix data structure, and just call DMPlexVecGetValuesClosure (for coordinates) and DMPlexMatSetValuesClosure (for element matrices). I did not understand what you needed to modify for ExodusII. Thanks, Matt > Kind regards, > Morten > ------------------------------ > *From:* Matthew Knepley [knepley at gmail.com] > *Sent:* Wednesday, October 12, 2016 3:41 PM > *To:* Morten Nobel-J?rgensen > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: Element to local dof map using dmplex > > On Wed, Oct 12, 2016 at 6:40 AM, Morten Nobel-J?rgensen > wrote: > >> Dear PETSc developers / Matt >> >> Thanks for your suggestions regarding our use of dmplex in a FEM context. >> However, Matt's advise on using the PetscFE is not sufficient for our >> needs (our end goal is a topology optimization framework - not just FEM) >> and we must honestly admit that we do not see how we can use the MATIS and >> the MatSetValuesClosure or DMPlexMatSetClosure to solve our current issues >> as Stefano has suggested. >> >> We have therefore created a more representative, yet heavily >> oversimplified, code example that demonstrates our problem. That is, the >> dof handling is only correct on a single process and goes wrong on np>1. >> >> We hope very much that you can help us to overcome our problem. >> > > Okay, I will look at it and try to rework it to fix your problem. > > I am in London this week, so it might take me until next week. 
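To make the suggestion above concrete without the attachment, here is a minimal assembly sketch built only on the public DMPlex closure routines Matt points to (DMPlexVecGetClosure for the cell coordinates, DMPlexMatSetClosure for the element matrix). It is not the attached ex19.cc: the element kernel is omitted, and the fixed element-matrix size is an assumption to adjust for the actual discretization.

#include <petscdmplex.h>

/* Assemble a matrix by looping over local cells and letting the DM's
   sections translate each cell closure to global dofs. */
PetscErrorCode AssembleOperator(DM dm, Mat A)
{
  DM             cdm;
  Vec            coords;
  PetscInt       cStart, cEnd, c;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMGetCoordinateDM(dm, &cdm);CHKERRQ(ierr);
  ierr = DMGetCoordinatesLocal(dm, &coords);CHKERRQ(ierr);
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* cells */
  for (c = cStart; c < cEnd; ++c) {
    PetscScalar *xe = NULL;          /* coordinates of the cell closure          */
    PetscScalar  Ke[24*24];          /* element matrix; assumes <= 24 dofs/cell  */
    PetscInt     nxe;

    ierr = DMPlexVecGetClosure(cdm, NULL, coords, c, &nxe, &xe);CHKERRQ(ierr);
    ierr = PetscMemzero(Ke, sizeof(Ke));CHKERRQ(ierr);
    /* fill Ke from the nxe coordinate values in xe (element kernel omitted) */
    ierr = DMPlexVecRestoreClosure(cdm, NULL, coords, c, &nxe, &xe);CHKERRQ(ierr);
    /* NULL sections: use the DM's default local/global sections, which supply
       the element-to-dof map, including dofs shared with other processes */
    ierr = DMPlexMatSetClosure(dm, NULL, NULL, A, c, Ke, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the closure call does the local-to-global translation through the sections, the same loop is correct on one process and on several, which is the part that is easy to get wrong with a hand-built element-to-dof map.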
> > Thanks, > > Matt > > >> Thank you for an excellent toolkit >> Morten and Niels >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex19.cc Type: application/octet-stream Size: 28428 bytes Desc: not available URL: From popov at uni-mainz.de Tue Oct 25 08:20:39 2016 From: popov at uni-mainz.de (Anton Popov) Date: Tue, 25 Oct 2016 15:20:39 +0200 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> Message-ID: <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> Hong, I get all the problems gone and valgrind-clean output if I specify this: -mat_superlu_dist_fact SamePattern_SameRowPerm What does SamePattern_SameRowPerm actually mean? Row permutations are for large diagonal, column permutations are for sparsity, right? Will it skip subsequent matrix permutations for large diagonal even if matrix values change significantly? Surprisingly everything works even with: -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact TRUE Thanks, Anton On 10/24/2016 09:06 PM, Hong wrote: > Anton: > >> If replacing superlu_dist with mumps, does your code work? > yes > > You may use mumps in your code, or tests different options for > superlu_dist: > > -mat_superlu_dist_equil: Equilibrate matrix (None) > -mat_superlu_dist_rowperm Row permutation (choose one > of) LargeDiag NATURAL (None) > -mat_superlu_dist_colperm Column permutation > (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS > (None) > -mat_superlu_dist_replacetinypivot: Replace tiny pivots (None) > -mat_superlu_dist_parsymbfact: Parallel symbolic > factorization (None) > -mat_superlu_dist_fact Sparsity pattern for repeated > matrix factorization (choose one of) SamePattern > SamePattern_SameRowPerm (None) > > The options inside <> are defaults. You may try others. This might > help narrow down the bug. > > Hong > > >> Hong >> >> On 10/24/2016 05:47 PM, Hong wrote: >>> Barry, >>> Your change indeed fixed the error of his testing code. >>> As Satish tested, on your branch, ex16 runs smooth. >>> >>> I do not understand why on maint or master branch, ex16 >>> creases inside superlu_dist, but not with mumps. >>> >> >> I also confirm that ex16 runs fine with latest fix, but >> unfortunately not my code. >> >> This is something to be expected, since my code preallocates >> once in the beginning. So there is no way it can be affected >> by multiple preallocations. Subsequently I only do matrix >> assembly, that makes sure structure doesn't change (set to >> get error otherwise). >> >> Summary: we don't have a simple test code to debug superlu >> issue anymore. 
>> >> Anton >> >>> Hong >>> >>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay >>> > wrote: >>> >>> On Mon, 24 Oct 2016, Barry Smith wrote: >>> >>> > >>> > > [Or perhaps Hong is using a different test code and is >>> observing bugs >>> > > with superlu_dist interface..] >>> > >>> > She states that her test does a NEW MatCreate() for >>> each matrix load (I cut and pasted it in the email I >>> just sent). The bug I fixed was only related to using >>> the SAME matrix from one MatLoad() in another MatLoad(). >>> >>> Ah - ok.. Sorry - wasn't thinking clearly :( >>> >>> Satish >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Oct 25 09:01:03 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Oct 2016 09:01:03 -0500 Subject: [petsc-users] error with wrong tarball in path/to/package In-Reply-To: <1477394966280.53606@marin.nl> References: <1477394966280.53606@marin.nl> Message-ID: Always look in configure.log to see the exact error. No - configure does not do checksums - but it expcects the package to be in a certain format [and this can change between petsc versions]. So if you are using a url from petsc-3.5 - with 3.7 -- it might not work.. So for any version of petsc - its always best to use the default externalpacakge URLs [for some externalpacakges - different versions might work - but usually thats not tested]. configure attempts to determine the exact error - and attempts to print appropriate message - but thats not always possible to figureout - so its best to check configure.log to see the exact issue.. Note: to print the 'wrong version message' - it needs to know & keep track of the previous versions [and formats - if any] - and thats not easy.. All it can do is check for - current version/format is found or not.. Satish On Tue, 25 Oct 2016, Klaij, Christiaan wrote: > > Here is a small complaint about the error message "unable to > download" that is given when using > --download-PACKAGENAME=/PATH/TO/package.tar.gz with the wrong > tarball. For example, for my previous install, I was using > petsc-3.5.3 with: > > --download-ml=/path/to/ml-6.2-win.tar.gz > > Using the same file with 3.7.4 gives this error message > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Unable to download ML > Failed to download ML > ******************************************************************************* > > My guess from ml.py is that I should now download: > > https://bitbucket.org/petsc/pkg-ml/get/v6.2-p4.tar.gz > > and that you are somehow checking that the file specified in the > path matches this file (name, hash, ...)? > > If so, "unable to download" is a bit confusing, I wasted some > time looking at the file system and the file:// protocol, and > annoyed the sysadmins... May I suggest to replace the message > with "wrong version" or something? > > Chris > > > dr. ir. 
Christiaan Klaij | CFD Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > MARIN news: http://www.marin.nl/web/News/News-items/Workshop-Optimaliseren-is-ook-innoveren-15-november.htm > > From C.Klaij at marin.nl Tue Oct 25 09:41:46 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Tue, 25 Oct 2016 14:41:46 +0000 Subject: [petsc-users] error with wrong tarball in path/to/package In-Reply-To: References: <1477394966280.53606@marin.nl>, Message-ID: <1477406505997.39412@marin.nl> Satish, Fair enough, thanks for explaining. As far as I can tell the configure log (attached) gives the same error. If the current version/format is not found, why not just say so in the error message? Saying "unable to download" suggests something's wrong with the internet connection, or file path. Chris ________________________________________ From: Satish Balay Sent: Tuesday, October 25, 2016 4:01 PM To: Klaij, Christiaan Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] error with wrong tarball in path/to/package Always look in configure.log to see the exact error. No - configure does not do checksums - but it expcects the package to be in a certain format [and this can change between petsc versions]. So if you are using a url from petsc-3.5 - with 3.7 -- it might not work.. So for any version of petsc - its always best to use the default externalpacakge URLs [for some externalpacakges - different versions might work - but usually thats not tested]. configure attempts to determine the exact error - and attempts to print appropriate message - but thats not always possible to figureout - so its best to check configure.log to see the exact issue.. Note: to print the 'wrong version message' - it needs to know & keep track of the previous versions [and formats - if any] - and thats not easy.. All it can do is check for - current version/format is found or not.. Satish On Tue, 25 Oct 2016, Klaij, Christiaan wrote: > > Here is a small complaint about the error message "unable to > download" that is given when using > --download-PACKAGENAME=/PATH/TO/package.tar.gz with the wrong > tarball. For example, for my previous install, I was using > petsc-3.5.3 with: > > --download-ml=/path/to/ml-6.2-win.tar.gz > > Using the same file with 3.7.4 gives this error message > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Unable to download ML > Failed to download ML > ******************************************************************************* > > My guess from ml.py is that I should now download: > > https://bitbucket.org/petsc/pkg-ml/get/v6.2-p4.tar.gz > > and that you are somehow checking that the file specified in the > path matches this file (name, hash, ...)? > > If so, "unable to download" is a bit confusing, I wasted some > time looking at the file system and the file:// protocol, and > annoyed the sysadmins... May I suggest to replace the message > with "wrong version" or something? > > Chris > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > MARIN news: http://www.marin.nl/web/News/News-items/Workshop-Optimaliseren-is-ook-innoveren-15-november.htm > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 2151274 bytes Desc: configure.log URL: From balay at mcs.anl.gov Tue Oct 25 09:51:17 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Oct 2016 09:51:17 -0500 Subject: [petsc-users] error with wrong tarball in path/to/package In-Reply-To: <1477406505997.39412@marin.nl> References: <1477394966280.53606@marin.nl>, <1477406505997.39412@marin.nl> Message-ID: >>> Looking for ML at git.ml, hg.ml or a directory starting with petsc-pkg-ml Could not locate an existing copy of ML: ['ml-6.2'] <<< So configure was looking for something with 'petsc-pkg-ml' - and it did not find it. [there was 'ml-6.2' - but configure doesn't know what it is..] Yeah the message could be more fine-grained - will chek. Satish On Tue, 25 Oct 2016, Klaij, Christiaan wrote: > Satish, > > Fair enough, thanks for explaining. > > As far as I can tell the configure log (attached) gives the same error. > > If the current version/format is not found, why not just say so > in the error message? Saying "unable to download" suggests > something's wrong with the internet connection, or file path. > > Chris > ________________________________________ > From: Satish Balay > Sent: Tuesday, October 25, 2016 4:01 PM > To: Klaij, Christiaan > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] error with wrong tarball in path/to/package > > Always look in configure.log to see the exact error. > > No - configure does not do checksums - but it expcects the package to > be in a certain format [and this can change between petsc versions]. > So if you are using a url from petsc-3.5 - with 3.7 -- it might not > work.. > > So for any version of petsc - its always best to use the default > externalpacakge URLs [for some externalpacakges - different versions > might work - but usually thats not tested]. > > configure attempts to determine the exact error - and attempts to > print appropriate message - but thats not always possible to figureout > - so its best to check configure.log to see the exact issue.. > > Note: to print the 'wrong version message' - it needs to know & keep > track of the previous versions [and formats - if any] - and thats not > easy.. All it can do is check for - current version/format is found or > not.. > > Satish > > On Tue, 25 Oct 2016, Klaij, Christiaan wrote: > > > > > Here is a small complaint about the error message "unable to > > download" that is given when using > > --download-PACKAGENAME=/PATH/TO/package.tar.gz with the wrong > > tarball. For example, for my previous install, I was using > > petsc-3.5.3 with: > > > > --download-ml=/path/to/ml-6.2-win.tar.gz > > > > Using the same file with 3.7.4 gives this error message > > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > > ------------------------------------------------------------------------------- > > Unable to download ML > > Failed to download ML > > ******************************************************************************* > > > > My guess from ml.py is that I should now download: > > > > https://bitbucket.org/petsc/pkg-ml/get/v6.2-p4.tar.gz > > > > and that you are somehow checking that the file specified in the > > path matches this file (name, hash, ...)? > > > > If so, "unable to download" is a bit confusing, I wasted some > > time looking at the file system and the file:// protocol, and > > annoyed the sysadmins... 
May I suggest to replace the message > > with "wrong version" or something? > > > > Chris > > > > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > > > MARIN news: http://www.marin.nl/web/News/News-items/Workshop-Optimaliseren-is-ook-innoveren-15-november.htm > > > > > > From hzhang at mcs.anl.gov Tue Oct 25 10:38:03 2016 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 25 Oct 2016 10:38:03 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> Message-ID: Anton, I guess, when you reuse matrix and its symbolic factor with updated numerical values, superlu_dist requires this option. I'm cc'ing Sherry to confirm it. I'll check petsc/superlu-dist interface to set this flag for this case. Hong On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov wrote: > Hong, > > I get all the problems gone and valgrind-clean output if I specify this: > > -mat_superlu_dist_fact SamePattern_SameRowPerm > What does SamePattern_SameRowPerm actually mean? > Row permutations are for large diagonal, column permutations are for > sparsity, right? > Will it skip subsequent matrix permutations for large diagonal even if > matrix values change significantly? > > Surprisingly everything works even with: > > -mat_superlu_dist_colperm PARMETIS > -mat_superlu_dist_parsymbfact TRUE > > Thanks, > Anton > > On 10/24/2016 09:06 PM, Hong wrote: > > Anton: >> >> If replacing superlu_dist with mumps, does your code work? >> >> yes >> > > You may use mumps in your code, or tests different options for > superlu_dist: > > -mat_superlu_dist_equil: Equilibrate matrix (None) > -mat_superlu_dist_rowperm Row permutation (choose one of) > LargeDiag NATURAL (None) > -mat_superlu_dist_colperm Column permutation (choose > one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None) > -mat_superlu_dist_replacetinypivot: Replace tiny pivots (None) > -mat_superlu_dist_parsymbfact: Parallel symbolic factorization > (None) > -mat_superlu_dist_fact Sparsity pattern for repeated > matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm > (None) > > The options inside <> are defaults. You may try others. This might help > narrow down the bug. > > Hong > >> >> Hong >>> >>> On 10/24/2016 05:47 PM, Hong wrote: >>> >>> Barry, >>> Your change indeed fixed the error of his testing code. >>> As Satish tested, on your branch, ex16 runs smooth. >>> >>> I do not understand why on maint or master branch, ex16 creases inside >>> superlu_dist, but not with mumps. >>> >>> >>> I also confirm that ex16 runs fine with latest fix, but unfortunately >>> not my code. >>> >>> This is something to be expected, since my code preallocates once in the >>> beginning. So there is no way it can be affected by multiple >>> preallocations. Subsequently I only do matrix assembly, that makes sure >>> structure doesn't change (set to get error otherwise). >>> >>> Summary: we don't have a simple test code to debug superlu issue anymore. 
>>> >>> Anton >>> >>> Hong >>> >>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote: >>> >>>> On Mon, 24 Oct 2016, Barry Smith wrote: >>>> >>>> > >>>> > > [Or perhaps Hong is using a different test code and is observing >>>> bugs >>>> > > with superlu_dist interface..] >>>> > >>>> > She states that her test does a NEW MatCreate() for each matrix >>>> load (I cut and pasted it in the email I just sent). The bug I fixed was >>>> only related to using the SAME matrix from one MatLoad() in another >>>> MatLoad(). >>>> >>>> Ah - ok.. Sorry - wasn't thinking clearly :( >>>> >>>> Satish >>>> >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abdullahasivas at gmail.com Tue Oct 25 10:54:13 2016 From: abdullahasivas at gmail.com (Abdullah Ali Sivas) Date: Tue, 25 Oct 2016 11:54:13 -0400 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem Message-ID: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> Hello, I want to use PETSc with mfem and I know that mfem people will figure out a way to do it in few months. But for now as a temporary solution I just thought of converting hypre PARCSR matrices (that is what mfem uses as linear solver package) into PETSc MPIAIJ matrices and I have a semi-working code with some bugs. Also my code is dauntingly slow and seems like not scaling. I have used /MatHYPRE_IJMatrixCopy /from myhp.c of PETSc and /hypre_ParCSRMatrixPrintIJ/ from par_csr_matrix.c of hypre as starting points. Before starting I checked whether there was anything done similar to this, I could not find anything. My question is, are you aware of such a conversion code (i.e. something like /hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, //Mat *A/)? Thanks in advance, Abdullah Ali Sivas -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 25 11:15:13 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 11:15:13 -0500 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas < abdullahasivas at gmail.com> wrote: > Hello, > > I want to use PETSc with mfem and I know that mfem people will figure out > a way to do it in few months. But for now as a temporary solution I just > thought of converting hypre PARCSR matrices (that is what mfem uses as > linear solver package) into PETSc MPIAIJ matrices and I have a semi-working > code with some bugs. Also my code is dauntingly slow and seems like not > scaling. I have used *MatHYPRE_IJMatrixCopy *from myhp.c of PETSc and > *hypre_ParCSRMatrixPrintIJ* from par_csr_matrix.c of hypre as starting > points. Before starting I checked whether there was anything done similar > to this, I could not find anything. > > My question is, are you aware of such a conversion code (i.e. something > like *hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, **Mat *A*)? > No, but maybe Satish knows. Slow running times most likely come from lack of preallocation for the target matrix. Thanks, Matt > Thanks in advance, > Abdullah Ali Sivas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hzhang at mcs.anl.gov Tue Oct 25 11:18:34 2016 From: hzhang at mcs.anl.gov (Hong) Date: Tue, 25 Oct 2016 11:18:34 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> Message-ID: Sherry, We set '-mat_superlu_dist_fact SamePattern' as default in petsc/superlu_dist on 12/6/15 (see attached email below). However, Anton must set 'SamePattern_SameRowPerm' to avoid crash in his code. Checking http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html I see detailed description on using SamePattern_SameRowPerm, which requires more from user than SamePattern. I guess these flags are used for efficiency. The library sets a default, then have users to switch for their own applications. The default setting should not cause crash. If crash occurs, give a meaningful error message would be help. Do you have suggestion how should we set default in petsc for this flag? Hong ------------------- Hong 12/7/15 to Danyang, petsc-maint, PETSc, Xiaoye Danyang : Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is how I figured it out. 1. Reading ex52f.F, I see '-superlu_default' = '-pc_factor_mat_solver_package superlu_dist', the later enables runtime options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the tests below. ... 5. Using a_flow_check_1.bin, I am able to reproduce the error you reported: all packages give correct results except superlu_dist: ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package superlu_dist Norm of error 2.5970E-12 iterations 1 -->Test for matrix 168 Norm of error 1.3936E-01 iterations 34 -->Test for matrix 169 I guess the error might come from reuse of matrix factor. Replacing default -mat_superlu_dist_fact with -mat_superlu_dist_fact SamePattern, I get ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact SamePattern Norm of error 2.5970E-12 iterations 1 -->Test for matrix 168 ... Sherry may tell you why SamePattern_SameRowPerm cause the difference here. Best on the above experiments, I would set following as default '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface. '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface. Hong On Tue, Oct 25, 2016 at 10:38 AM, Hong wrote: > Anton, > I guess, when you reuse matrix and its symbolic factor with updated > numerical values, superlu_dist requires this option. I'm cc'ing Sherry to > confirm it. > > I'll check petsc/superlu-dist interface to set this flag for this case. > > Hong > > > On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov wrote: > >> Hong, >> >> I get all the problems gone and valgrind-clean output if I specify this: >> >> -mat_superlu_dist_fact SamePattern_SameRowPerm >> What does SamePattern_SameRowPerm actually mean? >> Row permutations are for large diagonal, column permutations are for >> sparsity, right? 
>> Will it skip subsequent matrix permutations for large diagonal even if >> matrix values change significantly? >> >> Surprisingly everything works even with: >> >> -mat_superlu_dist_colperm PARMETIS >> -mat_superlu_dist_parsymbfact TRUE >> >> Thanks, >> Anton >> >> On 10/24/2016 09:06 PM, Hong wrote: >> >> Anton: >>> >>> If replacing superlu_dist with mumps, does your code work? >>> >>> yes >>> >> >> You may use mumps in your code, or tests different options for >> superlu_dist: >> >> -mat_superlu_dist_equil: Equilibrate matrix (None) >> -mat_superlu_dist_rowperm Row permutation (choose one of) >> LargeDiag NATURAL (None) >> -mat_superlu_dist_colperm Column permutation (choose >> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None) >> -mat_superlu_dist_replacetinypivot: Replace tiny pivots (None) >> -mat_superlu_dist_parsymbfact: Parallel symbolic factorization >> (None) >> -mat_superlu_dist_fact Sparsity pattern for repeated >> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm >> (None) >> >> The options inside <> are defaults. You may try others. This might help >> narrow down the bug. >> >> Hong >> >>> >>> Hong >>>> >>>> On 10/24/2016 05:47 PM, Hong wrote: >>>> >>>> Barry, >>>> Your change indeed fixed the error of his testing code. >>>> As Satish tested, on your branch, ex16 runs smooth. >>>> >>>> I do not understand why on maint or master branch, ex16 creases inside >>>> superlu_dist, but not with mumps. >>>> >>>> >>>> I also confirm that ex16 runs fine with latest fix, but unfortunately >>>> not my code. >>>> >>>> This is something to be expected, since my code preallocates once in >>>> the beginning. So there is no way it can be affected by multiple >>>> preallocations. Subsequently I only do matrix assembly, that makes sure >>>> structure doesn't change (set to get error otherwise). >>>> >>>> Summary: we don't have a simple test code to debug superlu issue >>>> anymore. >>>> >>>> Anton >>>> >>>> Hong >>>> >>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay >>>> wrote: >>>> >>>>> On Mon, 24 Oct 2016, Barry Smith wrote: >>>>> >>>>> > >>>>> > > [Or perhaps Hong is using a different test code and is observing >>>>> bugs >>>>> > > with superlu_dist interface..] >>>>> > >>>>> > She states that her test does a NEW MatCreate() for each matrix >>>>> load (I cut and pasted it in the email I just sent). The bug I fixed was >>>>> only related to using the SAME matrix from one MatLoad() in another >>>>> MatLoad(). >>>>> >>>>> Ah - ok.. Sorry - wasn't thinking clearly :( >>>>> >>>>> Satish >>>>> >>>> >>>> >>>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abdullahasivas at gmail.com Tue Oct 25 11:30:20 2016 From: abdullahasivas at gmail.com (Abdullah Ali Sivas) Date: Tue, 25 Oct 2016 12:30:20 -0400 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> Message-ID: I will check that. I am preallocating but it may be that I am not allocating big enough. I still have to figure out nuance differences between these formats to solve the bugs. I appreciate your answer and hope Satish knows it. Thank you, Abdullah Ali Sivas On 2016-10-25 12:15 PM, Matthew Knepley wrote: > On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas > > wrote: > > Hello, > > I want to use PETSc with mfem and I know that mfem people will > figure out a way to do it in few months. 
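For completeness, the workaround Anton found can also be set from application code instead of the command line; a sketch using the PETSc 3.7 calls (the routine name below is made up for the sketch, the option and package names are the ones from this thread):

#include <petscksp.h>

PetscErrorCode UseSuperLUDistSameRowPerm(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  /* same effect as -mat_superlu_dist_fact SamePattern_SameRowPerm */
  ierr = PetscOptionsSetValue(NULL, "-mat_superlu_dist_fact",
                              "SamePattern_SameRowPerm");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Until the default question raised above is settled, setting the flag in one place like this avoids having to remember the command-line option for every run.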
But for now as a > temporary solution I just thought of converting hypre PARCSR > matrices (that is what mfem uses as linear solver package) into > PETSc MPIAIJ matrices and I have a semi-working code with some > bugs. Also my code is dauntingly slow and seems like not scaling. > I have used /MatHYPRE_IJMatrixCopy /from myhp.c of PETSc and > /hypre_ParCSRMatrixPrintIJ/ from par_csr_matrix.c of hypre as > starting points. Before starting I checked whether there was > anything done similar to this, I could not find anything. > > My question is, are you aware of such a conversion code (i.e. > something like /hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix > *matrix, //Mat *A/)? > > No, but maybe Satish knows. Slow running times most likely come from > lack of preallocation for the target matrix. > > Thanks, > > Matt > > Thanks in advance, > Abdullah Ali Sivas > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From juan at tf.uni-kiel.de Tue Oct 25 11:53:27 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Tue, 25 Oct 2016 18:53:27 +0200 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> Message-ID: We tried a similar approach to get MFEM objects to PETSc and the real problem is that all other convenient functions like creating gridfunctions and projections, you have to convert them every time which is basically nothing more than a bad workaround. Stefano Zampini replied to the issue on github and was stating that there is some intention to get MFEM working with PETSc but there is no specific timeframe. Regards Julian Andrej On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas wrote: > I will check that. I am preallocating but it may be that I am not allocating > big enough. I still have to figure out nuance differences between these > formats to solve the bugs. I appreciate your answer and hope Satish knows > it. > > Thank you, > Abdullah Ali Sivas > > > On 2016-10-25 12:15 PM, Matthew Knepley wrote: > > On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas > wrote: >> >> Hello, >> >> I want to use PETSc with mfem and I know that mfem people will figure out >> a way to do it in few months. But for now as a temporary solution I just >> thought of converting hypre PARCSR matrices (that is what mfem uses as >> linear solver package) into PETSc MPIAIJ matrices and I have a semi-working >> code with some bugs. Also my code is dauntingly slow and seems like not >> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and >> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting points. >> Before starting I checked whether there was anything done similar to this, I >> could not find anything. >> >> My question is, are you aware of such a conversion code (i.e. something >> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? > > No, but maybe Satish knows. Slow running times most likely come from lack of > preallocation for the target matrix. > > Thanks, > > Matt >> >> Thanks in advance, >> Abdullah Ali Sivas > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. 
> -- Norbert Wiener > > From stefano.zampini at gmail.com Tue Oct 25 12:19:38 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 25 Oct 2016 20:19:38 +0300 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> Message-ID: <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. I could add this code to PETSc, maybe in the contrib folder. Barry, what do you think? > We tried a similar approach to get MFEM objects to PETSc and the real > problem is that all other convenient functions like creating > gridfunctions and projections, you have to convert them every time > which is basically nothing more than a bad workaround. So far, my interface covers matrices and Krylov solvers (PCFieldSplit and PCBDDC are explicitly supported). Can you tell me how would you like to use these objects with PETSc? What you would like to achieve? So far, my work on the PETSc interface to MFEM originated from a wishlist for solvers, but I could expand it. > Stefano Zampini > replied to the issue on github and was stating that there is some > intention to get MFEM working with PETSc but there is no specific > timeframe. > There?s currently an open pull request for PETSc solvers inside the private MFEM repo. However, I don?t know when the code, if merged, will be released. > Regards > Julian Andrej > > On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas > wrote: >> I will check that. I am preallocating but it may be that I am not allocating >> big enough. I still have to figure out nuance differences between these >> formats to solve the bugs. I appreciate your answer and hope Satish knows >> it. >> >> Thank you, >> Abdullah Ali Sivas >> >> >> On 2016-10-25 12:15 PM, Matthew Knepley wrote: >> >> On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas >> wrote: >>> >>> Hello, >>> >>> I want to use PETSc with mfem and I know that mfem people will figure out >>> a way to do it in few months. But for now as a temporary solution I just >>> thought of converting hypre PARCSR matrices (that is what mfem uses as >>> linear solver package) into PETSc MPIAIJ matrices and I have a semi-working >>> code with some bugs. Also my code is dauntingly slow and seems like not >>> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and >>> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting points. >>> Before starting I checked whether there was anything done similar to this, I >>> could not find anything. >>> >>> My question is, are you aware of such a conversion code (i.e. something >>> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? >> >> No, but maybe Satish knows. Slow running times most likely come from lack of >> preallocation for the target matrix. >> >> Thanks, >> >> Matt >>> >>> Thanks in advance, >>> Abdullah Ali Sivas >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. 
>> -- Norbert Wiener >> >> From knepley at gmail.com Tue Oct 25 12:31:52 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 12:31:52 -0500 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini wrote: > I have a working conversion from HypreParCSR to PETSc MPIAIJ format. > I could add this code to PETSc, maybe in the contrib folder. Barry, what > do you think? > No, no one looks there. Add it to src/mat/utils and make an interface function like MatCreateFromHypreParCSR(). Thanks, Matt > > We tried a similar approach to get MFEM objects to PETSc and the real > > problem is that all other convenient functions like creating > > gridfunctions and projections, you have to convert them every time > > which is basically nothing more than a bad workaround. > > So far, my interface covers matrices and Krylov solvers (PCFieldSplit and > PCBDDC are explicitly supported). > > Can you tell me how would you like to use these objects with PETSc? What > you would like to achieve? > > So far, my work on the PETSc interface to MFEM originated from a wishlist > for solvers, but I could expand it. > > > > Stefano Zampini > > replied to the issue on github and was stating that there is some > > intention to get MFEM working with PETSc but there is no specific > > timeframe. > > > > There?s currently an open pull request for PETSc solvers inside the > private MFEM repo. > However, I don?t know when the code, if merged, will be released. > > > > Regards > > Julian Andrej > > > > On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas > > wrote: > >> I will check that. I am preallocating but it may be that I am not > allocating > >> big enough. I still have to figure out nuance differences between these > >> formats to solve the bugs. I appreciate your answer and hope Satish > knows > >> it. > >> > >> Thank you, > >> Abdullah Ali Sivas > >> > >> > >> On 2016-10-25 12:15 PM, Matthew Knepley wrote: > >> > >> On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas > >> wrote: > >>> > >>> Hello, > >>> > >>> I want to use PETSc with mfem and I know that mfem people will figure > out > >>> a way to do it in few months. But for now as a temporary solution I > just > >>> thought of converting hypre PARCSR matrices (that is what mfem uses as > >>> linear solver package) into PETSc MPIAIJ matrices and I have a > semi-working > >>> code with some bugs. Also my code is dauntingly slow and seems like not > >>> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and > >>> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting > points. > >>> Before starting I checked whether there was anything done similar to > this, I > >>> could not find anything. > >>> > >>> My question is, are you aware of such a conversion code (i.e. something > >>> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? > >> > >> No, but maybe Satish knows. Slow running times most likely come from > lack of > >> preallocation for the target matrix. > >> > >> Thanks, > >> > >> Matt > >>> > >>> Thanks in advance, > >>> Abdullah Ali Sivas > >> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > experiments > >> is infinitely more interesting than any results to which their > experiments > >> lead. 
> >> -- Norbert Wiener > >> > >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 25 12:50:10 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 25 Oct 2016 11:50:10 -0600 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: <87oa28xqr1.fsf@jedbrown.org> Matthew Knepley writes: > On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini > wrote: > >> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. >> I could add this code to PETSc, maybe in the contrib folder. Barry, what >> do you think? >> > > No, no one looks there. Add it to src/mat/utils and make an interface > function like MatCreateFromHypreParCSR(). Note that mhyp.c contains code to convert AIJ matrices to ParCSR. If we were to create a MatHypreParCSR implementation, we could use those functions for MatConvert_{Seq,MPI}AIJ_HypreParCSR and use your function for the reverse. That would be consistent with how external matrix formats are normally represented and may enable some new capability to mix PETSc and Hypre components in the future. Here, I'm envisioning PetscErrorCode MatCreateHypreParCSR(hyper_ParCSRMatrix *A,Mat *B); This way, if a user chooses -pc_type hypre, there would be no copies for going through PETSc. Similarly, if we implement MatSetValues_HypreParCSR, a pure PETSc application could use Hypre preconditioners with no copies. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From abdullahasivas at gmail.com Tue Oct 25 12:53:18 2016 From: abdullahasivas at gmail.com (Abdullah Ali Sivas) Date: Tue, 25 Oct 2016 13:53:18 -0400 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <87oa28xqr1.fsf@jedbrown.org> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> <87oa28xqr1.fsf@jedbrown.org> Message-ID: @Stefano Zampini: I am planning to do two things. One is to directly use Krylov solvers and preconditioners available to PETSc to try out some stuff and make some matrix manipulations like symmetric diagonal scaling or getting a submatrix. If that works I will implement few things (preconditioners or other krylov solvers) using PETSc and try them out. So what you did is like a blessing for me (basically because I was working on the very same thing for days now) and thank you for that. @Mark and Jed These are great ideas and I believe a lot of users like me will be grateful if these are available. On 2016-10-25 01:50 PM, Jed Brown wrote: > Matthew Knepley writes: > >> On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini >> wrote: >>> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. >>> I could add this code to PETSc, maybe in the contrib folder. Barry, what >>> do you think? >>> >> No, no one looks there. Add it to src/mat/utils and make an interface >> function like MatCreateFromHypreParCSR(). > Note that mhyp.c contains code to convert AIJ matrices to ParCSR. 
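As a concrete illustration of the conversion being discussed in this thread, a minimal sketch of a MatCreateHypreParCSR-style routine might look like the following. This is not the actual PETSc, hypre, or mfem code: the function name is made up, and the hypre_* accessor macros and the _hypre_parcsr_mv.h header are assumptions about hypre's internal API. The part to pay attention to is the preallocation pass, which is what the earlier remark about slow assembly refers to.

#include <petscmat.h>
#include <_hypre_parcsr_mv.h>   /* hypre internal header; an assumption for this sketch */

/* Sketch only: copy a hypre ParCSR matrix into a preallocated MPIAIJ matrix. */
PetscErrorCode MyParCSRToMPIAIJ(MPI_Comm comm, hypre_ParCSRMatrix *H, Mat *B)
{
  hypre_CSRMatrix *diag   = hypre_ParCSRMatrixDiag(H);
  hypre_CSRMatrix *offd   = hypre_ParCSRMatrixOffd(H);
  HYPRE_Int       *di     = hypre_CSRMatrixI(diag), *dj = hypre_CSRMatrixJ(diag);
  HYPRE_Int       *oi     = hypre_CSRMatrixI(offd), *oj = hypre_CSRMatrixJ(offd);
  HYPRE_Real      *da     = hypre_CSRMatrixData(diag), *oa = hypre_CSRMatrixData(offd);
  HYPRE_Int       *colmap = hypre_ParCSRMatrixColMapOffd(H); /* local offd column -> global column */
  PetscInt         m      = hypre_CSRMatrixNumRows(diag);
  PetscInt         rstart = hypre_ParCSRMatrixFirstRowIndex(H);
  PetscInt         i, j, *dnnz, *onnz;
  PetscErrorCode   ierr;

  PetscFunctionBeginUser;
  /* Preallocation: without dnnz/onnz every MatSetValues call may trigger a reallocation. */
  ierr = PetscMalloc2(m, &dnnz, m, &onnz);CHKERRQ(ierr);
  for (i = 0; i < m; i++) { dnnz[i] = di[i+1] - di[i]; onnz[i] = oi[i+1] - oi[i]; }
  ierr = MatCreate(comm, B);CHKERRQ(ierr);
  ierr = MatSetSizes(*B, m, hypre_CSRMatrixNumCols(diag), PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(*B, MATMPIAIJ);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(*B, 0, dnnz, 0, onnz);CHKERRQ(ierr);

  /* Copy the values row by row; MatSetValues does not require sorted columns,
     so hypre's "diagonal entry first in each row" layout is harmless here. */
  for (i = 0; i < m; i++) {
    PetscInt row = rstart + i;
    for (j = di[i]; j < di[i+1]; j++) {
      PetscInt    col = rstart + dj[j]; /* assumes a square matrix with matching row/column partition */
      PetscScalar v   = da[j];
      ierr = MatSetValues(*B, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
    }
    for (j = oi[i]; j < oi[i+1]; j++) {
      PetscInt    col = colmap[oj[j]];
      PetscScalar v   = oa[j];
      ierr = MatSetValues(*B, 1, &row, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(*B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscFree2(dnnz, onnz);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
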
If we > were to create a MatHypreParCSR implementation, we could use those > functions for MatConvert_{Seq,MPI}AIJ_HypreParCSR and use your function > for the reverse. That would be consistent with how external matrix > formats are normally represented and may enable some new capability to > mix PETSc and Hypre components in the future. Here, I'm envisioning > > PetscErrorCode MatCreateHypreParCSR(hyper_ParCSRMatrix *A,Mat *B); > > This way, if a user chooses -pc_type hypre, there would be no copies for > going through PETSc. Similarly, if we implement > MatSetValues_HypreParCSR, a pure PETSc application could use Hypre > preconditioners with no copies. From knepley at gmail.com Tue Oct 25 12:53:48 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 12:53:48 -0500 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <87oa28xqr1.fsf@jedbrown.org> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> <87oa28xqr1.fsf@jedbrown.org> Message-ID: On Tue, Oct 25, 2016 at 12:50 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini < > stefano.zampini at gmail.com > >> wrote: > > > >> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. > >> I could add this code to PETSc, maybe in the contrib folder. Barry, what > >> do you think? > >> > > > > No, no one looks there. Add it to src/mat/utils and make an interface > > function like MatCreateFromHypreParCSR(). > > Note that mhyp.c contains code to convert AIJ matrices to ParCSR. If we > were to create a MatHypreParCSR implementation, we could use those > functions for MatConvert_{Seq,MPI}AIJ_HypreParCSR and use your function > for the reverse. That would be consistent with how external matrix > formats are normally represented and may enable some new capability to > mix PETSc and Hypre components in the future. Here, I'm envisioning > > PetscErrorCode MatCreateHypreParCSR(hyper_ParCSRMatrix *A,Mat *B); > > This way, if a user chooses -pc_type hypre, there would be no copies for > going through PETSc. Similarly, if we implement > MatSetValues_HypreParCSR, a pure PETSc application could use Hypre > preconditioners with no copies. > This is a better way, but I did not suggest it because it entails the extra work of implementing the Mat methods for the new class. If Stefano has time for that, fantastic. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Oct 25 13:06:13 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 25 Oct 2016 21:06:13 +0300 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <87oa28xqr1.fsf@jedbrown.org> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> <87oa28xqr1.fsf@jedbrown.org> Message-ID: <9E7416E7-1FEF-4C60-91B5-CF328EBDBFAC@gmail.com> On Oct 25, 2016, at 8:50 PM, Jed Brown wrote: > Matthew Knepley writes: > >> On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini >> wrote: >> >>> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. >>> I could add this code to PETSc, maybe in the contrib folder. Barry, what >>> do you think? >>> >> >> No, no one looks there. 
Add it to src/mat/utils and make an interface >> function like MatCreateFromHypreParCSR(). > > Note that mhyp.c contains code to convert AIJ matrices to ParCSR. If we > were to create a MatHypreParCSR implementation, we could use those > functions for MatConvert_{Seq,MPI}AIJ_HypreParCSR and use your function > for the reverse. That would be consistent with how external matrix > formats are normally represented and may enable some new capability to > mix PETSc and Hypre components in the future. Here, I'm envisioning > > PetscErrorCode MatCreateHypreParCSR(hyper_ParCSRMatrix *A,Mat *B); > > This way, if a user chooses -pc_type hypre, there would be no copies for > going through PETSc. Similarly, if we implement > MatSetValues_HypreParCSR, a pure PETSc application could use Hypre > preconditioners with no copies. MATHYPRE could be a shell wrapping Hypre calls for the moment. However, HypreParCSR and MATAIJ are mostly equivalent formats. As far as I know, the main (only?) difference resides in the fact that the diagonal term of the diagonal part is ordered first in the CSR. For this reason, I think it should inherit from AIJ. As soon as I have time, I can start a new matrix class, but I don?t have much time to implement at the SetValues level yet. From juan at tf.uni-kiel.de Tue Oct 25 13:10:38 2016 From: juan at tf.uni-kiel.de (Julian Andrej) Date: Tue, 25 Oct 2016 20:10:38 +0200 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: We have an implementation which models a real physical application but don't have the manpower to implement different preconditioner themes (like with fieldsplit) or try out different time solving schemes (which is way too easy with TS). Specifically we create a bunch of operators (Boundary and Face integrators) which model actuator/observer domains on a predefined mesh. On Tue, Oct 25, 2016 at 7:19 PM, Stefano Zampini wrote: > I have a working conversion from HypreParCSR to PETSc MPIAIJ format. > I could add this code to PETSc, maybe in the contrib folder. Barry, what do you think? > > >> We tried a similar approach to get MFEM objects to PETSc and the real >> problem is that all other convenient functions like creating >> gridfunctions and projections, you have to convert them every time >> which is basically nothing more than a bad workaround. > > So far, my interface covers matrices and Krylov solvers (PCFieldSplit and PCBDDC are explicitly supported). > > Can you tell me how would you like to use these objects with PETSc? What you would like to achieve? > > So far, my work on the PETSc interface to MFEM originated from a wishlist for solvers, but I could expand it. > > >> Stefano Zampini >> replied to the issue on github and was stating that there is some >> intention to get MFEM working with PETSc but there is no specific >> timeframe. >> > > There?s currently an open pull request for PETSc solvers inside the private MFEM repo. > However, I don?t know when the code, if merged, will be released. > > >> Regards >> Julian Andrej >> >> On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas >> wrote: >>> I will check that. I am preallocating but it may be that I am not allocating >>> big enough. I still have to figure out nuance differences between these >>> formats to solve the bugs. I appreciate your answer and hope Satish knows >>> it. 
>>> >>> Thank you, >>> Abdullah Ali Sivas >>> >>> >>> On 2016-10-25 12:15 PM, Matthew Knepley wrote: >>> >>> On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas >>> wrote: >>>> >>>> Hello, >>>> >>>> I want to use PETSc with mfem and I know that mfem people will figure out >>>> a way to do it in few months. But for now as a temporary solution I just >>>> thought of converting hypre PARCSR matrices (that is what mfem uses as >>>> linear solver package) into PETSc MPIAIJ matrices and I have a semi-working >>>> code with some bugs. Also my code is dauntingly slow and seems like not >>>> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and >>>> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting points. >>>> Before starting I checked whether there was anything done similar to this, I >>>> could not find anything. >>>> >>>> My question is, are you aware of such a conversion code (i.e. something >>>> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? >>> >>> No, but maybe Satish knows. Slow running times most likely come from lack of >>> preallocation for the target matrix. >>> >>> Thanks, >>> >>> Matt >>>> >>>> Thanks in advance, >>>> Abdullah Ali Sivas >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments >>> is infinitely more interesting than any results to which their experiments >>> lead. >>> -- Norbert Wiener >>> >>> > From stefano.zampini at gmail.com Tue Oct 25 13:26:16 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 25 Oct 2016 21:26:16 +0300 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: On Oct 25, 2016, at 9:10 PM, Julian Andrej wrote: > We have an implementation which models a real physical application but > don't have the manpower to implement different preconditioner themes > (like with fieldsplit) or try out different time solving schemes > (which is way too easy with TS). SNES and TS are on my TODOLIST. We can discuss this. > Specifically we create a bunch of > operators (Boundary and Face integrators) which model > actuator/observer domains on a predefined mesh. > At the moment, PETSc matrices are created during the MFEM assemble call; in the MFEM spirit, they are obtained as R A P operations. mfem::BlockOperator is fully supported, and MATNEST is created on the fly. MATIS is also supported (for PCBDDC) You can mail me directly if you think this discussion is getting too technical for the PETSc mailing list. > On Tue, Oct 25, 2016 at 7:19 PM, Stefano Zampini > wrote: >> I have a working conversion from HypreParCSR to PETSc MPIAIJ format. >> I could add this code to PETSc, maybe in the contrib folder. Barry, what do you think? >> >> >>> We tried a similar approach to get MFEM objects to PETSc and the real >>> problem is that all other convenient functions like creating >>> gridfunctions and projections, you have to convert them every time >>> which is basically nothing more than a bad workaround. >> >> So far, my interface covers matrices and Krylov solvers (PCFieldSplit and PCBDDC are explicitly supported). >> >> Can you tell me how would you like to use these objects with PETSc? What you would like to achieve? >> >> So far, my work on the PETSc interface to MFEM originated from a wishlist for solvers, but I could expand it. 
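For readers following the R A P remark earlier in this message: on the PETSc side a Galerkin triple product of that form can be computed with MatPtAP (taking R as P^T). A minimal sketch, assuming A and P are already assembled Mat objects and using a made-up function name:

#include <petscmat.h>

/* Sketch: form Ac = P^T A P, i.e. the R A P product with R = P^T. */
PetscErrorCode MyFormRAP(Mat A, Mat P, Mat *Ac)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, Ac);CHKERRQ(ierr); /* 2.0 is a rough fill estimate */
  PetscFunctionReturn(0);
}

Whether mfem applies R explicitly or as P^T is an mfem detail; the sketch only shows the PETSc-side call (MatRARt is also available when R is stored explicitly).
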
>> >> >>> Stefano Zampini >>> replied to the issue on github and was stating that there is some >>> intention to get MFEM working with PETSc but there is no specific >>> timeframe. >>> >> >> There?s currently an open pull request for PETSc solvers inside the private MFEM repo. >> However, I don?t know when the code, if merged, will be released. >> >> >>> Regards >>> Julian Andrej >>> >>> On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas >>> wrote: >>>> I will check that. I am preallocating but it may be that I am not allocating >>>> big enough. I still have to figure out nuance differences between these >>>> formats to solve the bugs. I appreciate your answer and hope Satish knows >>>> it. >>>> >>>> Thank you, >>>> Abdullah Ali Sivas >>>> >>>> >>>> On 2016-10-25 12:15 PM, Matthew Knepley wrote: >>>> >>>> On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas >>>> wrote: >>>>> >>>>> Hello, >>>>> >>>>> I want to use PETSc with mfem and I know that mfem people will figure out >>>>> a way to do it in few months. But for now as a temporary solution I just >>>>> thought of converting hypre PARCSR matrices (that is what mfem uses as >>>>> linear solver package) into PETSc MPIAIJ matrices and I have a semi-working >>>>> code with some bugs. Also my code is dauntingly slow and seems like not >>>>> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and >>>>> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting points. >>>>> Before starting I checked whether there was anything done similar to this, I >>>>> could not find anything. >>>>> >>>>> My question is, are you aware of such a conversion code (i.e. something >>>>> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? >>>> >>>> No, but maybe Satish knows. Slow running times most likely come from lack of >>>> preallocation for the target matrix. >>>> >>>> Thanks, >>>> >>>> Matt >>>>> >>>>> Thanks in advance, >>>>> Abdullah Ali Sivas >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments >>>> is infinitely more interesting than any results to which their experiments >>>> lead. >>>> -- Norbert Wiener >>>> >>>> >> From bsmith at mcs.anl.gov Tue Oct 25 13:36:36 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Oct 2016 13:36:36 -0500 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: <533EA1C6-1613-4FAD-AD68-E235B120113E@mcs.anl.gov> > On Oct 25, 2016, at 12:31 PM, Matthew Knepley wrote: > > On Tue, Oct 25, 2016 at 12:19 PM, Stefano Zampini wrote: > I have a working conversion from HypreParCSR to PETSc MPIAIJ format. > I could add this code to PETSc, maybe in the contrib folder. Barry, what do you think? > > No, no one looks there. Add it to src/mat/utils and make an interface function like MatCreateFromHypreParCSR(). I agree with Matt on this. > > Thanks, > > Matt > > > We tried a similar approach to get MFEM objects to PETSc and the real > > problem is that all other convenient functions like creating > > gridfunctions and projections, you have to convert them every time > > which is basically nothing more than a bad workaround. > > So far, my interface covers matrices and Krylov solvers (PCFieldSplit and PCBDDC are explicitly supported). > > Can you tell me how would you like to use these objects with PETSc? What you would like to achieve? 
> > So far, my work on the PETSc interface to MFEM originated from a wishlist for solvers, but I could expand it. > > > > Stefano Zampini > > replied to the issue on github and was stating that there is some > > intention to get MFEM working with PETSc but there is no specific > > timeframe. > > > > There?s currently an open pull request for PETSc solvers inside the private MFEM repo. > However, I don?t know when the code, if merged, will be released. > > > > Regards > > Julian Andrej > > > > On Tue, Oct 25, 2016 at 6:30 PM, Abdullah Ali Sivas > > wrote: > >> I will check that. I am preallocating but it may be that I am not allocating > >> big enough. I still have to figure out nuance differences between these > >> formats to solve the bugs. I appreciate your answer and hope Satish knows > >> it. > >> > >> Thank you, > >> Abdullah Ali Sivas > >> > >> > >> On 2016-10-25 12:15 PM, Matthew Knepley wrote: > >> > >> On Tue, Oct 25, 2016 at 10:54 AM, Abdullah Ali Sivas > >> wrote: > >>> > >>> Hello, > >>> > >>> I want to use PETSc with mfem and I know that mfem people will figure out > >>> a way to do it in few months. But for now as a temporary solution I just > >>> thought of converting hypre PARCSR matrices (that is what mfem uses as > >>> linear solver package) into PETSc MPIAIJ matrices and I have a semi-working > >>> code with some bugs. Also my code is dauntingly slow and seems like not > >>> scaling. I have used MatHYPRE_IJMatrixCopy from myhp.c of PETSc and > >>> hypre_ParCSRMatrixPrintIJ from par_csr_matrix.c of hypre as starting points. > >>> Before starting I checked whether there was anything done similar to this, I > >>> could not find anything. > >>> > >>> My question is, are you aware of such a conversion code (i.e. something > >>> like hypre_ParCSRtoPETScMPIAIJ( hypre_ParCSRMatrix *matrix, Mat *A)? > >> > >> No, but maybe Satish knows. Slow running times most likely come from lack of > >> preallocation for the target matrix. > >> > >> Thanks, > >> > >> Matt > >>> > >>> Thanks in advance, > >>> Abdullah Ali Sivas > >> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their experiments > >> is infinitely more interesting than any results to which their experiments > >> lead. > >> -- Norbert Wiener > >> > >> > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From jed at jedbrown.org Tue Oct 25 13:43:08 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 25 Oct 2016 12:43:08 -0600 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <9E7416E7-1FEF-4C60-91B5-CF328EBDBFAC@gmail.com> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> <87oa28xqr1.fsf@jedbrown.org> <9E7416E7-1FEF-4C60-91B5-CF328EBDBFAC@gmail.com> Message-ID: <87lgxcxoar.fsf@jedbrown.org> Stefano Zampini writes: > MATHYPRE could be a shell wrapping Hypre calls for the moment. > However, HypreParCSR and MATAIJ are mostly equivalent formats. As far as I know, the main (only?) difference resides in the fact that the diagonal term of the diagonal part is ordered first in the CSR. > For this reason, I think it should inherit from AIJ. This is more delicate. Derived classes need to *exactly* retain the AIJ structure because all unimplemented methods use the parent implementations. If the rows are not sorted, MatSetValues, MatGetRow, and the like cease to work. 
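To make that sorted-rows assumption concrete, here is a small sketch of the kind of caller code that silently breaks if an AIJ-derived class stops returning ascending column indices; the helper is hypothetical, not a PETSc routine.

#include <petscmat.h>

/* Hypothetical helper: look up entry (row, col) in a locally owned row of an
   assembled matrix, using a binary search that is only correct when MatGetRow
   returns the column indices in ascending order. */
static PetscErrorCode MyGetEntry(Mat A, PetscInt row, PetscInt col, PetscScalar *v)
{
  PetscInt           ncols, lo, hi;
  const PetscInt    *cols;
  const PetscScalar *vals;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = MatGetRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
  lo = 0; hi = ncols;
  while (lo < hi) {
    PetscInt mid = lo + (hi - lo) / 2;
    if (cols[mid] < col) lo = mid + 1; else hi = mid;
  }
  *v = (lo < ncols && cols[lo] == col) ? vals[lo] : 0.0;
  ierr = MatRestoreRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
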
You can still make MatHypreParCSR respond to MatMPIAIJSetPreallocation, but I don't think it can be derived from AIJ unless you audit *all* reachable AIJ code to remove the assumption of sorted rows *and* document the API change for all users that could observe the lack of sorting. (I don't think that's worthwhile.) Note that the existing AIJ derived implementations merely augment the AIJ structure rather than modifying it. > As soon as I have time, I can start a new matrix class, but I don?t have much time to implement at the SetValues level yet. That's not urgent, but if you write it as a Mat implementation instead of some utility functions, it would be easy to add later and would not disrupt existing users. There is no requirement that all Mat implementations include all the methods that "make sense"; it can be fleshed out later according to demand. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From knepley at gmail.com Tue Oct 25 13:45:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 13:45:47 -0500 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <87lgxcxoar.fsf@jedbrown.org> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> <87oa28xqr1.fsf@jedbrown.org> <9E7416E7-1FEF-4C60-91B5-CF328EBDBFAC@gmail.com> <87lgxcxoar.fsf@jedbrown.org> Message-ID: On Tue, Oct 25, 2016 at 1:43 PM, Jed Brown wrote: > Stefano Zampini writes: > > MATHYPRE could be a shell wrapping Hypre calls for the moment. > > However, HypreParCSR and MATAIJ are mostly equivalent formats. As far as > I know, the main (only?) difference resides in the fact that the diagonal > term of the diagonal part is ordered first in the CSR. > > For this reason, I think it should inherit from AIJ. > > This is more delicate. Derived classes need to *exactly* retain the AIJ > structure because all unimplemented methods use the parent > implementations. If the rows are not sorted, MatSetValues, MatGetRow, > and the like cease to work. You can still make MatHypreParCSR respond > to MatMPIAIJSetPreallocation, but I don't think it can be derived from > AIJ unless you audit *all* reachable AIJ code to remove the assumption > of sorted rows *and* document the API change for all users that could > observe the lack of sorting. (I don't think that's worthwhile.) > > Note that the existing AIJ derived implementations merely augment the > AIJ structure rather than modifying it. > Inheritance is almost never useful except in contrived textbook examples. It was a tremendous pain to make work in PETSc, and I think if we did it again, I would just go back and make subobjects that packaged up lower level behavior instead of inheriting. Matt > As soon as I have time, I can start a new matrix class, but I don?t have > much time to implement at the SetValues level yet. > > That's not urgent, but if you write it as a Mat implementation instead > of some utility functions, it would be easy to add later and would not > disrupt existing users. There is no requirement that all Mat > implementations include all the methods that "make sense"; it can be > fleshed out later according to demand. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kolev1 at llnl.gov Tue Oct 25 14:14:59 2016 From: kolev1 at llnl.gov (Tzanio Kolev) Date: Tue, 25 Oct 2016 12:14:59 -0700 Subject: [petsc-users] Using PETSc solvers and preconditioners with mfem In-Reply-To: <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> References: <74402424-3960-898b-7d36-4dd061c06c69@gmail.com> <0155CFFD-C904-4FCA-8951-3DCB5C65C6C7@gmail.com> Message-ID: <11B30383-3A31-40D8-A81B-07FF1B73D29A@llnl.gov> >> Stefano Zampini >> replied to the issue on github and was stating that there is some intention to get MFEM working with PETSc but there is no specific timeframe. > > There?s currently an open pull request for PETSc solvers inside the private MFEM repo. > However, I don?t know when the code, if merged, will be released. We do plan to merge this as part of mfem?s next official release (v3.3). Optimistically, that's targeted for the end of 2016 :) Tzanio From olivier.mesnard8 at gmail.com Tue Oct 25 16:38:28 2016 From: olivier.mesnard8 at gmail.com (Olivier Mesnard) Date: Tue, 25 Oct 2016 17:38:28 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace Message-ID: Hi all, We develop a CFD code using the PETSc library that solves the Navier-Stokes equations using the fractional-step method from Perot (1993). At each time-step, we solve two systems: one for the velocity field, the other, a Poisson system, for the pressure field. One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 grid using 1 or 2 procs. For the Poisson system, we usually use CG preconditioned with GAMG. So far, we have been using PETSc-3.5.4, and we would like to update the code with the latest release: 3.7.4. As suggested in the changelog of 3.6, we replaced the routine `KSPSetNullSpace()` with `MatSetNullSpace()`. Here is the list of options we use to configure the two solvers: * Velocity solver: prefix `-velocity_` -velocity_ksp_type bcgs -velocity_ksp_rtol 1.0E-08 -velocity_ksp_atol 0.0 -velocity_ksp_max_it 10000 -velocity_pc_type jacobi -velocity_ksp_view -velocity_ksp_monitor_true_residual -velocity_ksp_converged_reason * Poisson solver: prefix `-poisson_` -poisson_ksp_type cg -poisson_ksp_rtol 1.0E-08 -poisson_ksp_atol 0.0 -poisson_ksp_max_it 20000 -poisson_pc_type gamg -poisson_pc_gamg_type agg -poisson_pc_gamg_agg_nsmooths 1 -poissonksp_view -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason With 3.5.4, the case runs normally on 1 or 2 procs. With 3.7.4, the case runs normally on 1 proc but not on 2. Why? The Poisson solver diverges because of an indefinite preconditioner (only with 2 procs). We also saw that the routine `MatSetNullSpace()` was already available in 3.5.4. With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 procs). Thus, we were wondering if we needed to update something else for the KSP, and not just modifying the name of the routine? I have attached the output files from the different cases: * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) * `run-petsc-3.5.4-n2.log` * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) * `run-petsc-3.5.4-nsp-n2.log` * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) * `run-petsc-3.7.4-n2.log` Thank you for your help, Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
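For context on the option prefixes listed above, the two solver objects are typically wired up along these lines (a generic sketch, not the actual PetIBM code); the Mat passed to KSPSetOperators for the Poisson solve is the one that needs the null space attached when moving to MatSetNullSpace.

#include <petscksp.h>

/* Sketch: a KSP whose options are controlled by the "poisson_" prefix. */
static PetscErrorCode MySetupPoissonSolver(MPI_Comm comm, Mat A, KSP *ksp)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = KSPCreate(comm, ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(*ksp, "poisson_");CHKERRQ(ierr); /* picks up -poisson_ksp_type, -poisson_pc_type, ... */
  ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
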
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.5.4-n1.log Type: text/x-log Size: 9513 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.5.4-n2.log Type: text/x-log Size: 9535 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.5.4-nsp-n1.log Type: text/x-log Size: 10348 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.5.4-nsp-n2.log Type: text/x-log Size: 11348 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.7.4-n1.log Type: text/x-log Size: 12085 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run-petsc-3.7.4-n2.log Type: text/x-log Size: 11424 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Oct 25 16:51:28 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Oct 2016 16:51:28 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: Message-ID: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> Olivier, In theory you do not need to change anything else. Are you using a different matrix object for the velocity_ksp object than the poisson_ksp object? The code change in PETSc is very little but we have a report from another CFD user who also had problems with the change so there may be some subtle bug that we can't figure out causing things to not behave properly. First run the 3.7.4 code with -poisson_ksp_view and verify that when it prints the matrix information it prints something like has attached null space if it does not print that it means that somehow the matrix is not properly getting the matrix attached. Though older versions had MatSetNullSpace() they didn't necessarily associate it with the KSP so it was not expected to work as a replacement for KSPSetNullSpace() with older versions. Because our other user had great difficulty trying to debug the issue feel free to send us at petsc-maint at mcs.anl.gov your code with instructions on building and running and we can try to track down the problem. Better than hours and hours spent with fruitless email. We will, of course, not distribute the code and will delete in when we are finished with it. Barry > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard wrote: > > Hi all, > > We develop a CFD code using the PETSc library that solves the Navier-Stokes equations using the fractional-step method from Perot (1993). > At each time-step, we solve two systems: one for the velocity field, the other, a Poisson system, for the pressure field. > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 grid using 1 or 2 procs. > For the Poisson system, we usually use CG preconditioned with GAMG. > > So far, we have been using PETSc-3.5.4, and we would like to update the code with the latest release: 3.7.4. > > As suggested in the changelog of 3.6, we replaced the routine `KSPSetNullSpace()` with `MatSetNullSpace()`. 
> > Here is the list of options we use to configure the two solvers: > * Velocity solver: prefix `-velocity_` > -velocity_ksp_type bcgs > -velocity_ksp_rtol 1.0E-08 > -velocity_ksp_atol 0.0 > -velocity_ksp_max_it 10000 > -velocity_pc_type jacobi > -velocity_ksp_view > -velocity_ksp_monitor_true_residual > -velocity_ksp_converged_reason > * Poisson solver: prefix `-poisson_` > -poisson_ksp_type cg > -poisson_ksp_rtol 1.0E-08 > -poisson_ksp_atol 0.0 > -poisson_ksp_max_it 20000 > -poisson_pc_type gamg > -poisson_pc_gamg_type agg > -poisson_pc_gamg_agg_nsmooths 1 > -poissonksp_view > -poisson_ksp_monitor_true_residual > -poisson_ksp_converged_reason > > With 3.5.4, the case runs normally on 1 or 2 procs. > With 3.7.4, the case runs normally on 1 proc but not on 2. > Why? The Poisson solver diverges because of an indefinite preconditioner (only with 2 procs). > > We also saw that the routine `MatSetNullSpace()` was already available in 3.5.4. > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 procs). > > Thus, we were wondering if we needed to update something else for the KSP, and not just modifying the name of the routine? > > I have attached the output files from the different cases: > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > * `run-petsc-3.5.4-n2.log` > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > * `run-petsc-3.5.4-nsp-n2.log` > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > * `run-petsc-3.7.4-n2.log` > > Thank you for your help, > Olivier > From olivier.mesnard8 at gmail.com Tue Oct 25 17:39:08 2016 From: olivier.mesnard8 at gmail.com (Olivier Mesnard) Date: Tue, 25 Oct 2016 18:39:08 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> Message-ID: On 25 October 2016 at 17:51, Barry Smith wrote: > > Olivier, > > In theory you do not need to change anything else. Are you using a > different matrix object for the velocity_ksp object than the poisson_ksp > object? > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. > The code change in PETSc is very little but we have a report from > another CFD user who also had problems with the change so there may be some > subtle bug that we can't figure out causing things to not behave properly. > > First run the 3.7.4 code with -poisson_ksp_view and verify that when it > prints the matrix information it prints something like has attached null > space if it does not print that it means that somehow the matrix is not > properly getting the matrix attached. > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that the nullspace is not attached to the KSP (as it was with 3.5.4)?; however the print statement is now under the Mat info (which is expected when moving from KSPSetNullSpace to MatSetNullSpace?). Though older versions had MatSetNullSpace() they didn't necessarily > associate it with the KSP so it was not expected to work as a replacement > for KSPSetNullSpace() with older versions. > > Because our other user had great difficulty trying to debug the issue > feel free to send us at petsc-maint at mcs.anl.gov your code with > instructions on building and running and we can try to track down the > problem. Better than hours and hours spent with fruitless email. 
We will, > of course, not distribute the code and will delete in when we are finished > with it. > > ?The code is open-source and hosted on GitHub ( https://github.com/barbagroup/PetIBM)?. I just pushed the branches `feature-compatible-petsc-3.7` and `revert-compatible-petsc-3.5` that I used to observe this problem. PETSc (both 3.5.4 and 3.7.4) was configured as follow: export PETSC_ARCH="linux-gnu-dbg" ./configure --PETSC_ARCH=$PETSC_ARCH \ --with-cc=gcc \ --with-cxx=g++ \ --with-fc=gfortran \ --COPTFLAGS="-O0" \ --CXXOPTFLAGS="-O0" \ --FOPTFLAGS="-O0" \ --with-debugging=1 \ --download-fblaslapack \ --download-mpich \ --download-hypre \ --download-yaml \ --with-x=1 Our code was built using the following commands:? mkdir petibm-build cd petibm-build ?export PETSC_DIR= export PETSC_ARCH="linux-gnu-dbg" export PETIBM_DIR= $PETIBM_DIR/configure --prefix=$PWD \ CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ CXXFLAGS="-g -O0 -std=c++11"? make all make install ?Then cd examples make examples? ?The example of the lid-driven cavity I was talking about can be found in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? To run it: mpiexec -n N /bin/petibm2d -directory Let me know if you need more info. Thank you. Barry > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard > wrote: > > > > Hi all, > > > > We develop a CFD code using the PETSc library that solves the > Navier-Stokes equations using the fractional-step method from Perot (1993). > > At each time-step, we solve two systems: one for the velocity field, the > other, a Poisson system, for the pressure field. > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 > grid using 1 or 2 procs. > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > So far, we have been using PETSc-3.5.4, and we would like to update the > code with the latest release: 3.7.4. > > > > As suggested in the changelog of 3.6, we replaced the routine > `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > Here is the list of options we use to configure the two solvers: > > * Velocity solver: prefix `-velocity_` > > -velocity_ksp_type bcgs > > -velocity_ksp_rtol 1.0E-08 > > -velocity_ksp_atol 0.0 > > -velocity_ksp_max_it 10000 > > -velocity_pc_type jacobi > > -velocity_ksp_view > > -velocity_ksp_monitor_true_residual > > -velocity_ksp_converged_reason > > * Poisson solver: prefix `-poisson_` > > -poisson_ksp_type cg > > -poisson_ksp_rtol 1.0E-08 > > -poisson_ksp_atol 0.0 > > -poisson_ksp_max_it 20000 > > -poisson_pc_type gamg > > -poisson_pc_gamg_type agg > > -poisson_pc_gamg_agg_nsmooths 1 > > -poissonksp_view > > -poisson_ksp_monitor_true_residual > > -poisson_ksp_converged_reason > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > Why? The Poisson solver diverges because of an indefinite preconditioner > (only with 2 procs). > > > > We also saw that the routine `MatSetNullSpace()` was already available > in 3.5.4. > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led > to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 > procs). > > > > Thus, we were wondering if we needed to update something else for the > KSP, and not just modifying the name of the routine? 
> > > > I have attached the output files from the different cases: > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-n2.log` > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-nsp-n2.log` > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.7.4-n2.log` > > > > Thank you for your help, > > Olivier > > 3.5.4-nsp-n1.log> 4-n1.log> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Oct 25 18:22:08 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Oct 2016 18:22:08 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> Message-ID: <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard wrote: > > On 25 October 2016 at 17:51, Barry Smith wrote: > > Olivier, > > In theory you do not need to change anything else. Are you using a different matrix object for the velocity_ksp object than the poisson_ksp object? > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. > > The code change in PETSc is very little but we have a report from another CFD user who also had problems with the change so there may be some subtle bug that we can't figure out causing things to not behave properly. > > First run the 3.7.4 code with -poisson_ksp_view and verify that when it prints the matrix information it prints something like has attached null space if it does not print that it means that somehow the matrix is not properly getting the matrix attached. > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that the nullspace is not attached to the KSP (as it was with 3.5.4)?; however the print statement is now under the Mat info (which is expected when moving from KSPSetNullSpace to MatSetNullSpace?). Good, this is how it should be. > > Though older versions had MatSetNullSpace() they didn't necessarily associate it with the KSP so it was not expected to work as a replacement for KSPSetNullSpace() with older versions. > > Because our other user had great difficulty trying to debug the issue feel free to send us at petsc-maint at mcs.anl.gov your code with instructions on building and running and we can try to track down the problem. Better than hours and hours spent with fruitless email. We will, of course, not distribute the code and will delete in when we are finished with it. > > ?The code is open-source and hosted on GitHub (https://github.com/barbagroup/PetIBM)?. > I just pushed the branches `feature-compatible-petsc-3.7` and `revert-compatible-petsc-3.5` that I used to observe this problem. > Thanks, I'll get back to you if I discover anything > PETSc (both 3.5.4 and 3.7.4) was configured as follow: > export PETSC_ARCH="linux-gnu-dbg" > ./configure --PETSC_ARCH=$PETSC_ARCH \ > --with-cc=gcc \ > --with-cxx=g++ \ > --with-fc=gfortran \ > --COPTFLAGS="-O0" \ > --CXXOPTFLAGS="-O0" \ > --FOPTFLAGS="-O0" \ > --with-debugging=1 \ > --download-fblaslapack \ > --download-mpich \ > --download-hypre \ > --download-yaml \ > --with-x=1 > > Our code was built using the following commands:? > mkdir petibm-build > cd petibm-build > ?export PETSC_DIR= > export PETSC_ARCH="linux-gnu-dbg" > export PETIBM_DIR= > $PETIBM_DIR/configure --prefix=$PWD \ > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ > CXXFLAGS="-g -O0 -std=c++11"? 
> make all > make install > > ?Then > cd examples > make examples? > > ?The example of the lid-driven cavity I was talking about can be found in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? > > To run it: > mpiexec -n N /bin/petibm2d -directory > > Let me know if you need more info. Thank you. > > Barry > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard wrote: > > > > Hi all, > > > > We develop a CFD code using the PETSc library that solves the Navier-Stokes equations using the fractional-step method from Perot (1993). > > At each time-step, we solve two systems: one for the velocity field, the other, a Poisson system, for the pressure field. > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 grid using 1 or 2 procs. > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > So far, we have been using PETSc-3.5.4, and we would like to update the code with the latest release: 3.7.4. > > > > As suggested in the changelog of 3.6, we replaced the routine `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > Here is the list of options we use to configure the two solvers: > > * Velocity solver: prefix `-velocity_` > > -velocity_ksp_type bcgs > > -velocity_ksp_rtol 1.0E-08 > > -velocity_ksp_atol 0.0 > > -velocity_ksp_max_it 10000 > > -velocity_pc_type jacobi > > -velocity_ksp_view > > -velocity_ksp_monitor_true_residual > > -velocity_ksp_converged_reason > > * Poisson solver: prefix `-poisson_` > > -poisson_ksp_type cg > > -poisson_ksp_rtol 1.0E-08 > > -poisson_ksp_atol 0.0 > > -poisson_ksp_max_it 20000 > > -poisson_pc_type gamg > > -poisson_pc_gamg_type agg > > -poisson_pc_gamg_agg_nsmooths 1 > > -poissonksp_view > > -poisson_ksp_monitor_true_residual > > -poisson_ksp_converged_reason > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > Why? The Poisson solver diverges because of an indefinite preconditioner (only with 2 procs). > > > > We also saw that the routine `MatSetNullSpace()` was already available in 3.5.4. > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 procs). > > > > Thus, we were wondering if we needed to update something else for the KSP, and not just modifying the name of the routine? > > > > I have attached the output files from the different cases: > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-n2.log` > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-nsp-n2.log` > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.7.4-n2.log` > > > > Thank you for your help, > > Olivier > > > > From knepley at gmail.com Tue Oct 25 18:59:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 18:59:07 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> Message-ID: On Tue, Oct 25, 2016 at 6:22 PM, Barry Smith wrote: > > > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard > wrote: > > > > On 25 October 2016 at 17:51, Barry Smith wrote: > > > > Olivier, > > > > In theory you do not need to change anything else. Are you using a > different matrix object for the velocity_ksp object than the poisson_ksp > object? 
> > > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. > > > > The code change in PETSc is very little but we have a report from > another CFD user who also had problems with the change so there may be some > subtle bug that we can't figure out causing things to not behave properly. > > > > First run the 3.7.4 code with -poisson_ksp_view and verify that when > it prints the matrix information it prints something like has attached null > space if it does not print that it means that somehow the matrix is not > properly getting the matrix attached. > > > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that > the nullspace is not attached to the KSP (as it was with 3.5.4)?; however > the print statement is now under the Mat info (which is expected when > moving from KSPSetNullSpace to MatSetNullSpace?). > > Good, this is how it should be. > > > > Though older versions had MatSetNullSpace() they didn't necessarily > associate it with the KSP so it was not expected to work as a replacement > for KSPSetNullSpace() with older versions. > > > > Because our other user had great difficulty trying to debug the > issue feel free to send us at petsc-maint at mcs.anl.gov your code with > instructions on building and running and we can try to track down the > problem. Better than hours and hours spent with fruitless email. We will, > of course, not distribute the code and will delete in when we are finished > with it. > > > > ?The code is open-source and hosted on GitHub (https://github.com/ > barbagroup/PetIBM)?. > > I just pushed the branches `feature-compatible-petsc-3.7` and > `revert-compatible-petsc-3.5` that I used to observe this problem. > > > Thanks, I'll get back to you if I discover anything Obviously GAMG is behaving quite differently (1 vs 2 levels and a much sparser coarse problem in 3.7). Could you try one thing for me before we start running it? Run with -poisson_mg_coarse_sub_pc_type svd and see what happens on 2 procs for 3.7? Thanks, Matt > > > PETSc (both 3.5.4 and 3.7.4) was configured as follow: > > export PETSC_ARCH="linux-gnu-dbg" > > ./configure --PETSC_ARCH=$PETSC_ARCH \ > > --with-cc=gcc \ > > --with-cxx=g++ \ > > --with-fc=gfortran \ > > --COPTFLAGS="-O0" \ > > --CXXOPTFLAGS="-O0" \ > > --FOPTFLAGS="-O0" \ > > --with-debugging=1 \ > > --download-fblaslapack \ > > --download-mpich \ > > --download-hypre \ > > --download-yaml \ > > --with-x=1 > > > > Our code was built using the following commands:? > > mkdir petibm-build > > cd petibm-build > > ?export PETSC_DIR= > > export PETSC_ARCH="linux-gnu-dbg" > > export PETIBM_DIR= > > $PETIBM_DIR/configure --prefix=$PWD \ > > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ > > CXXFLAGS="-g -O0 -std=c++11"? > > make all > > make install > > > > ?Then > > cd examples > > make examples? > > > > ?The example of the lid-driven cavity I was talking about can be found > in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? > > > > To run it: > > mpiexec -n N /bin/petibm2d -directory > > > > > Let me know if you need more info. Thank you. > > > > Barry > > > > > > > > > > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < > olivier.mesnard8 at gmail.com> wrote: > > > > > > Hi all, > > > > > > We develop a CFD code using the PETSc library that solves the > Navier-Stokes equations using the fractional-step method from Perot (1993). > > > At each time-step, we solve two systems: one for the velocity field, > the other, a Poisson system, for the pressure field. 
> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a > 20x20 grid using 1 or 2 procs. > > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > > > So far, we have been using PETSc-3.5.4, and we would like to update > the code with the latest release: 3.7.4. > > > > > > As suggested in the changelog of 3.6, we replaced the routine > `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > > > Here is the list of options we use to configure the two solvers: > > > * Velocity solver: prefix `-velocity_` > > > -velocity_ksp_type bcgs > > > -velocity_ksp_rtol 1.0E-08 > > > -velocity_ksp_atol 0.0 > > > -velocity_ksp_max_it 10000 > > > -velocity_pc_type jacobi > > > -velocity_ksp_view > > > -velocity_ksp_monitor_true_residual > > > -velocity_ksp_converged_reason > > > * Poisson solver: prefix `-poisson_` > > > -poisson_ksp_type cg > > > -poisson_ksp_rtol 1.0E-08 > > > -poisson_ksp_atol 0.0 > > > -poisson_ksp_max_it 20000 > > > -poisson_pc_type gamg > > > -poisson_pc_gamg_type agg > > > -poisson_pc_gamg_agg_nsmooths 1 > > > -poissonksp_view > > > -poisson_ksp_monitor_true_residual > > > -poisson_ksp_converged_reason > > > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > > Why? The Poisson solver diverges because of an indefinite > preconditioner (only with 2 procs). > > > > > > We also saw that the routine `MatSetNullSpace()` was already available > in 3.5.4. > > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led > to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 > procs). > > > > > > Thus, we were wondering if we needed to update something else for the > KSP, and not just modifying the name of the routine? > > > > > > I have attached the output files from the different cases: > > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-n2.log` > > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-nsp-n2.log` > > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.7.4-n2.log` > > > > > > Thank you for your help, > > > Olivier > > > 3.5.4-nsp-n1.log> 4-n1.log> > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.mesnard8 at gmail.com Tue Oct 25 19:57:38 2016 From: olivier.mesnard8 at gmail.com (Olivier Mesnard) Date: Tue, 25 Oct 2016 20:57:38 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> Message-ID: On 25 October 2016 at 19:59, Matthew Knepley wrote: > On Tue, Oct 25, 2016 at 6:22 PM, Barry Smith wrote: > >> >> > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard < >> olivier.mesnard8 at gmail.com> wrote: >> > >> > On 25 October 2016 at 17:51, Barry Smith wrote: >> > >> > Olivier, >> > >> > In theory you do not need to change anything else. Are you using a >> different matrix object for the velocity_ksp object than the poisson_ksp >> object? >> > >> > ?The matrix is different for the velocity_ksp and the poisson_ksp?. 
>> > >> > The code change in PETSc is very little but we have a report from >> another CFD user who also had problems with the change so there may be some >> subtle bug that we can't figure out causing things to not behave properly. >> > >> > First run the 3.7.4 code with -poisson_ksp_view and verify that when >> it prints the matrix information it prints something like has attached null >> space if it does not print that it means that somehow the matrix is not >> properly getting the matrix attached. >> > >> > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that >> the nullspace is not attached to the KSP (as it was with 3.5.4)?; however >> the print statement is now under the Mat info (which is expected when >> moving from KSPSetNullSpace to MatSetNullSpace?). >> >> Good, this is how it should be. >> > >> > Though older versions had MatSetNullSpace() they didn't necessarily >> associate it with the KSP so it was not expected to work as a replacement >> for KSPSetNullSpace() with older versions. >> > >> > Because our other user had great difficulty trying to debug the >> issue feel free to send us at petsc-maint at mcs.anl.gov your code with >> instructions on building and running and we can try to track down the >> problem. Better than hours and hours spent with fruitless email. We will, >> of course, not distribute the code and will delete in when we are finished >> with it. >> > >> > ?The code is open-source and hosted on GitHub ( >> https://github.com/barbagroup/PetIBM)?. >> > I just pushed the branches `feature-compatible-petsc-3.7` and >> `revert-compatible-petsc-3.5` that I used to observe this problem. >> > >> Thanks, I'll get back to you if I discover anything > > > Obviously GAMG is behaving quite differently (1 vs 2 levels and a much > sparser coarse problem in 3.7). > > Could you try one thing for me before we start running it? Run with > > -poisson_mg_coarse_sub_pc_type svd > > and see what happens on 2 procs for 3.7? > > ?Hi Matt, With -poisson_mg_coarse_sub_pc_type svd ?,? it ran normally on 1 proc but not on 2 procs the end of the output says: "** On entry to DGESVD parameter number 6 had an illegal value " I attached the log file. ? > Thanks, > > Matt > > >> >> > PETSc (both 3.5.4 and 3.7.4) was configured as follow: >> > export PETSC_ARCH="linux-gnu-dbg" >> > ./configure --PETSC_ARCH=$PETSC_ARCH \ >> > --with-cc=gcc \ >> > --with-cxx=g++ \ >> > --with-fc=gfortran \ >> > --COPTFLAGS="-O0" \ >> > --CXXOPTFLAGS="-O0" \ >> > --FOPTFLAGS="-O0" \ >> > --with-debugging=1 \ >> > --download-fblaslapack \ >> > --download-mpich \ >> > --download-hypre \ >> > --download-yaml \ >> > --with-x=1 >> > >> > Our code was built using the following commands:? >> > mkdir petibm-build >> > cd petibm-build >> > ?export PETSC_DIR= >> > export PETSC_ARCH="linux-gnu-dbg" >> > export PETIBM_DIR= >> > $PETIBM_DIR/configure --prefix=$PWD \ >> > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ >> > CXXFLAGS="-g -O0 -std=c++11"? >> > make all >> > make install >> > >> > ?Then >> > cd examples >> > make examples? >> > >> > ?The example of the lid-driven cavity I was talking about can be found >> in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? >> > >> > To run it: >> > mpiexec -n N /bin/petibm2d -directory >> >> > >> > Let me know if you need more info. Thank you. 
>> > >> > Barry >> > >> > >> > >> > >> > >> > >> > >> > >> > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < >> olivier.mesnard8 at gmail.com> wrote: >> > > >> > > Hi all, >> > > >> > > We develop a CFD code using the PETSc library that solves the >> Navier-Stokes equations using the fractional-step method from Perot (1993). >> > > At each time-step, we solve two systems: one for the velocity field, >> the other, a Poisson system, for the pressure field. >> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a >> 20x20 grid using 1 or 2 procs. >> > > For the Poisson system, we usually use CG preconditioned with GAMG. >> > > >> > > So far, we have been using PETSc-3.5.4, and we would like to update >> the code with the latest release: 3.7.4. >> > > >> > > As suggested in the changelog of 3.6, we replaced the routine >> `KSPSetNullSpace()` with `MatSetNullSpace()`. >> > > >> > > Here is the list of options we use to configure the two solvers: >> > > * Velocity solver: prefix `-velocity_` >> > > -velocity_ksp_type bcgs >> > > -velocity_ksp_rtol 1.0E-08 >> > > -velocity_ksp_atol 0.0 >> > > -velocity_ksp_max_it 10000 >> > > -velocity_pc_type jacobi >> > > -velocity_ksp_view >> > > -velocity_ksp_monitor_true_residual >> > > -velocity_ksp_converged_reason >> > > * Poisson solver: prefix `-poisson_` >> > > -poisson_ksp_type cg >> > > -poisson_ksp_rtol 1.0E-08 >> > > -poisson_ksp_atol 0.0 >> > > -poisson_ksp_max_it 20000 >> > > -poisson_pc_type gamg >> > > -poisson_pc_gamg_type agg >> > > -poisson_pc_gamg_agg_nsmooths 1 >> > > -poissonksp_view >> > > -poisson_ksp_monitor_true_residual >> > > -poisson_ksp_converged_reason >> > > >> > > With 3.5.4, the case runs normally on 1 or 2 procs. >> > > With 3.7.4, the case runs normally on 1 proc but not on 2. >> > > Why? The Poisson solver diverges because of an indefinite >> preconditioner (only with 2 procs). >> > > >> > > We also saw that the routine `MatSetNullSpace()` was already >> available in 3.5.4. >> > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` >> led to the Poisson solver diverging because of an indefinite matrix (on 1 >> and 2 procs). >> > > >> > > Thus, we were wondering if we needed to update something else for the >> KSP, and not just modifying the name of the routine? >> > > >> > > I have attached the output files from the different cases: >> > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) >> > > * `run-petsc-3.5.4-n2.log` >> > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) >> > > * `run-petsc-3.5.4-nsp-n2.log` >> > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) >> > > * `run-petsc-3.7.4-n2.log` >> > > >> > > Thank you for your help, >> > > Olivier >> > > > .5.4-nsp-n1.log>> -n1.log> >> > >> > >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: run-petsc-3.7.4-n2.log Type: application/octet-stream Size: 4114 bytes Desc: not available URL: From ztdepyahoo at 163.com Tue Oct 25 20:12:30 2016 From: ztdepyahoo at 163.com (ztdepyahoo at 163.com) Date: Wed, 26 Oct 2016 09:12:30 +0800 Subject: [petsc-users] How to scatter values In-Reply-To: <7a54264f.c2ae.157e7792950.Coremail.ztdepyahoo@163.com> References: <7a54264f.c2ae.157e7792950.Coremail.ztdepyahoo@163.com> Message-ID: <581002FE.1070803@163.com> Dear professor: I?????????????????????????????? ???????????????????????????? ??????????????????????????????? ??? ??????????????????????????? ??????????????????????????????? ????????????????????????????????? ?? ???????????????????????????????????? ???????????????????????????????????? ??????????????????????????????? ?????????????????????? Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 25 20:28:50 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 20:28:50 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> Message-ID: On Tue, Oct 25, 2016 at 7:57 PM, Olivier Mesnard wrote: > On 25 October 2016 at 19:59, Matthew Knepley wrote: > >> On Tue, Oct 25, 2016 at 6:22 PM, Barry Smith wrote: >> >>> >>> > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard < >>> olivier.mesnard8 at gmail.com> wrote: >>> > >>> > On 25 October 2016 at 17:51, Barry Smith wrote: >>> > >>> > Olivier, >>> > >>> > In theory you do not need to change anything else. Are you using a >>> different matrix object for the velocity_ksp object than the poisson_ksp >>> object? >>> > >>> > ?The matrix is different for the velocity_ksp and the poisson_ksp?. >>> > >>> > The code change in PETSc is very little but we have a report from >>> another CFD user who also had problems with the change so there may be some >>> subtle bug that we can't figure out causing things to not behave properly. >>> > >>> > First run the 3.7.4 code with -poisson_ksp_view and verify that >>> when it prints the matrix information it prints something like has attached >>> null space if it does not print that it means that somehow the matrix is >>> not properly getting the matrix attached. >>> > >>> > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that >>> the nullspace is not attached to the KSP (as it was with 3.5.4)?; however >>> the print statement is now under the Mat info (which is expected when >>> moving from KSPSetNullSpace to MatSetNullSpace?). >>> >>> Good, this is how it should be. >>> > >>> > Though older versions had MatSetNullSpace() they didn't >>> necessarily associate it with the KSP so it was not expected to work as a >>> replacement for KSPSetNullSpace() with older versions. >>> > >>> > Because our other user had great difficulty trying to debug the >>> issue feel free to send us at petsc-maint at mcs.anl.gov your code with >>> instructions on building and running and we can try to track down the >>> problem. Better than hours and hours spent with fruitless email. We will, >>> of course, not distribute the code and will delete in when we are finished >>> with it. >>> > >>> > ?The code is open-source and hosted on GitHub ( >>> https://github.com/barbagroup/PetIBM)?. >>> > I just pushed the branches `feature-compatible-petsc-3.7` and >>> `revert-compatible-petsc-3.5` that I used to observe this problem. 
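[Editor's note: for anyone who wants to reproduce the report, a sketch of fetching those branches. The repository URL and branch names are the ones given in this thread; everything else about the build follows the commands quoted below.]

git clone https://github.com/barbagroup/PetIBM
cd PetIBM
git checkout feature-compatible-petsc-3.7    # branch built against PETSc 3.7.x
# or:
git checkout revert-compatible-petsc-3.5     # branch reverting to the PETSc 3.5.x interface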
>>> > >>> Thanks, I'll get back to you if I discover anything >> >> >> Obviously GAMG is behaving quite differently (1 vs 2 levels and a much >> sparser coarse problem in 3.7). >> >> Could you try one thing for me before we start running it? Run with >> >> -poisson_mg_coarse_sub_pc_type svd >> >> and see what happens on 2 procs for 3.7? >> >> ?Hi Matt, > > With > -poisson_mg_coarse_sub_pc_type svd > ?,? > it ran normally on 1 proc but not on 2 procs the end of the output says: > > "** On entry to DGESVD parameter number 6 had an illegal value > " > Something is wrong with your 3.7 installation. That parameter is a simple size, so I suspect memory corruption. Run with valgrind. Thanks, Matt > I attached the log file. > ? > > >> Thanks, >> >> Matt >> >> >>> >>> > PETSc (both 3.5.4 and 3.7.4) was configured as follow: >>> > export PETSC_ARCH="linux-gnu-dbg" >>> > ./configure --PETSC_ARCH=$PETSC_ARCH \ >>> > --with-cc=gcc \ >>> > --with-cxx=g++ \ >>> > --with-fc=gfortran \ >>> > --COPTFLAGS="-O0" \ >>> > --CXXOPTFLAGS="-O0" \ >>> > --FOPTFLAGS="-O0" \ >>> > --with-debugging=1 \ >>> > --download-fblaslapack \ >>> > --download-mpich \ >>> > --download-hypre \ >>> > --download-yaml \ >>> > --with-x=1 >>> > >>> > Our code was built using the following commands:? >>> > mkdir petibm-build >>> > cd petibm-build >>> > ?export PETSC_DIR= >>> > export PETSC_ARCH="linux-gnu-dbg" >>> > export PETIBM_DIR= >>> > $PETIBM_DIR/configure --prefix=$PWD \ >>> > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ >>> > CXXFLAGS="-g -O0 -std=c++11"? >>> > make all >>> > make install >>> > >>> > ?Then >>> > cd examples >>> > make examples? >>> > >>> > ?The example of the lid-driven cavity I was talking about can be found >>> in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? >>> > >>> > To run it: >>> > mpiexec -n N /bin/petibm2d -directory >>> >>> > >>> > Let me know if you need more info. Thank you. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < >>> olivier.mesnard8 at gmail.com> wrote: >>> > > >>> > > Hi all, >>> > > >>> > > We develop a CFD code using the PETSc library that solves the >>> Navier-Stokes equations using the fractional-step method from Perot (1993). >>> > > At each time-step, we solve two systems: one for the velocity field, >>> the other, a Poisson system, for the pressure field. >>> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a >>> 20x20 grid using 1 or 2 procs. >>> > > For the Poisson system, we usually use CG preconditioned with GAMG. >>> > > >>> > > So far, we have been using PETSc-3.5.4, and we would like to update >>> the code with the latest release: 3.7.4. >>> > > >>> > > As suggested in the changelog of 3.6, we replaced the routine >>> `KSPSetNullSpace()` with `MatSetNullSpace()`. 
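[Editor's note, a minimal sketch of what that one-line replacement can look like for a constant (pressure) null space. The helper name AttachConstantNullSpace, the matrix name A, and the reduced error checking are illustrative assumptions, not code from PetIBM; the PETSc calls themselves (MatNullSpaceCreate, MatSetNullSpace, MatNullSpaceTest) are the standard ones.]

#include <petscksp.h>

/* Sketch: attach a constant null space to the assembled Poisson matrix A.
   Pre-3.6 code would instead have called KSPSetNullSpace(poisson_ksp, nullsp). */
PetscErrorCode AttachConstantNullSpace(Mat A)
{
  MatNullSpace   nullsp;
  PetscBool      isNull;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);
  /* optional sanity check: is the constant vector really a null space of A? */
  ierr = MatNullSpaceTest(nullsp, A, &isNull);CHKERRQ(ierr);
  if (!isNull) {
    ierr = PetscPrintf(PetscObjectComm((PetscObject)A), "Warning: attached null space is not a null space of A\n");CHKERRQ(ierr);
  }
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the null space now lives on the Mat, a KSP whose operators are set to A (for example the one with the -poisson_ prefix) picks it up automatically, which is why -poisson_ksp_view reports it under the Mat information rather than under the KSP.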
>>> > > >>> > > Here is the list of options we use to configure the two solvers: >>> > > * Velocity solver: prefix `-velocity_` >>> > > -velocity_ksp_type bcgs >>> > > -velocity_ksp_rtol 1.0E-08 >>> > > -velocity_ksp_atol 0.0 >>> > > -velocity_ksp_max_it 10000 >>> > > -velocity_pc_type jacobi >>> > > -velocity_ksp_view >>> > > -velocity_ksp_monitor_true_residual >>> > > -velocity_ksp_converged_reason >>> > > * Poisson solver: prefix `-poisson_` >>> > > -poisson_ksp_type cg >>> > > -poisson_ksp_rtol 1.0E-08 >>> > > -poisson_ksp_atol 0.0 >>> > > -poisson_ksp_max_it 20000 >>> > > -poisson_pc_type gamg >>> > > -poisson_pc_gamg_type agg >>> > > -poisson_pc_gamg_agg_nsmooths 1 >>> > > -poissonksp_view >>> > > -poisson_ksp_monitor_true_residual >>> > > -poisson_ksp_converged_reason >>> > > >>> > > With 3.5.4, the case runs normally on 1 or 2 procs. >>> > > With 3.7.4, the case runs normally on 1 proc but not on 2. >>> > > Why? The Poisson solver diverges because of an indefinite >>> preconditioner (only with 2 procs). >>> > > >>> > > We also saw that the routine `MatSetNullSpace()` was already >>> available in 3.5.4. >>> > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` >>> led to the Poisson solver diverging because of an indefinite matrix (on 1 >>> and 2 procs). >>> > > >>> > > Thus, we were wondering if we needed to update something else for >>> the KSP, and not just modifying the name of the routine? >>> > > >>> > > I have attached the output files from the different cases: >>> > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) >>> > > * `run-petsc-3.5.4-n2.log` >>> > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) >>> > > * `run-petsc-3.5.4-nsp-n2.log` >>> > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) >>> > > * `run-petsc-3.7.4-n2.log` >>> > > >>> > > Thank you for your help, >>> > > Olivier >>> > > >> .5.4-nsp-n1.log>>> -n1.log> >>> > >>> > >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Oct 25 21:11:19 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Oct 2016 21:11:19 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <49994548-2BDA-4996-A4A9-6BB29D862CD5@mcs.anl.gov> Message-ID: > On Oct 25, 2016, at 8:28 PM, Matthew Knepley wrote: > > On Tue, Oct 25, 2016 at 7:57 PM, Olivier Mesnard wrote: > On 25 October 2016 at 19:59, Matthew Knepley wrote: > On Tue, Oct 25, 2016 at 6:22 PM, Barry Smith wrote: > > > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard wrote: > > > > On 25 October 2016 at 17:51, Barry Smith wrote: > > > > Olivier, > > > > In theory you do not need to change anything else. Are you using a different matrix object for the velocity_ksp object than the poisson_ksp object? > > > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. 
> > > > The code change in PETSc is very little but we have a report from another CFD user who also had problems with the change so there may be some subtle bug that we can't figure out causing things to not behave properly. > > > > First run the 3.7.4 code with -poisson_ksp_view and verify that when it prints the matrix information it prints something like has attached null space if it does not print that it means that somehow the matrix is not properly getting the matrix attached. > > > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that the nullspace is not attached to the KSP (as it was with 3.5.4)?; however the print statement is now under the Mat info (which is expected when moving from KSPSetNullSpace to MatSetNullSpace?). > > Good, this is how it should be. > > > > Though older versions had MatSetNullSpace() they didn't necessarily associate it with the KSP so it was not expected to work as a replacement for KSPSetNullSpace() with older versions. > > > > Because our other user had great difficulty trying to debug the issue feel free to send us at petsc-maint at mcs.anl.gov your code with instructions on building and running and we can try to track down the problem. Better than hours and hours spent with fruitless email. We will, of course, not distribute the code and will delete in when we are finished with it. > > > > ?The code is open-source and hosted on GitHub (https://github.com/barbagroup/PetIBM)?. > > I just pushed the branches `feature-compatible-petsc-3.7` and `revert-compatible-petsc-3.5` that I used to observe this problem. > > > Thanks, I'll get back to you if I discover anything > > Obviously GAMG is behaving quite differently (1 vs 2 levels and a much sparser coarse problem in 3.7). > > Could you try one thing for me before we start running it? Run with > > -poisson_mg_coarse_sub_pc_type svd > > and see what happens on 2 procs for 3.7? > > ?Hi Matt, > > With -poisson_mg_coarse_sub_pc_type svd?,? > it ran normally on 1 proc but not on 2 procs the end of the output says: > > "** On entry to DGESVD parameter number 6 had an illegal value" > > Something is wrong with your 3.7 installation. That parameter is a simple size, so I suspect > memory corruption. Run with valgrind. Matt, This is our bug, I am fixing it now. Turns out DGESVD will error out if you give it a vector size of zero (and since GAMG squeezes the coarse matrix to one process leaving the others empty) so PETSc needs to just return when the matrix is of zero size instead of calling DGESVD. Barry > > Thanks, > > Matt > > I attached the log file. > ? > Thanks, > > Matt > > > > PETSc (both 3.5.4 and 3.7.4) was configured as follow: > > export PETSC_ARCH="linux-gnu-dbg" > > ./configure --PETSC_ARCH=$PETSC_ARCH \ > > --with-cc=gcc \ > > --with-cxx=g++ \ > > --with-fc=gfortran \ > > --COPTFLAGS="-O0" \ > > --CXXOPTFLAGS="-O0" \ > > --FOPTFLAGS="-O0" \ > > --with-debugging=1 \ > > --download-fblaslapack \ > > --download-mpich \ > > --download-hypre \ > > --download-yaml \ > > --with-x=1 > > > > Our code was built using the following commands:? > > mkdir petibm-build > > cd petibm-build > > ?export PETSC_DIR= > > export PETSC_ARCH="linux-gnu-dbg" > > export PETIBM_DIR= > > $PETIBM_DIR/configure --prefix=$PWD \ > > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ > > CXXFLAGS="-g -O0 -std=c++11"? > > make all > > make install > > > > ?Then > > cd examples > > make examples? 
> > > > ?The example of the lid-driven cavity I was talking about can be found in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? > > > > To run it: > > mpiexec -n N /bin/petibm2d -directory > > > > Let me know if you need more info. Thank you. > > > > Barry > > > > > > > > > > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard wrote: > > > > > > Hi all, > > > > > > We develop a CFD code using the PETSc library that solves the Navier-Stokes equations using the fractional-step method from Perot (1993). > > > At each time-step, we solve two systems: one for the velocity field, the other, a Poisson system, for the pressure field. > > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 grid using 1 or 2 procs. > > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > > > So far, we have been using PETSc-3.5.4, and we would like to update the code with the latest release: 3.7.4. > > > > > > As suggested in the changelog of 3.6, we replaced the routine `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > > > Here is the list of options we use to configure the two solvers: > > > * Velocity solver: prefix `-velocity_` > > > -velocity_ksp_type bcgs > > > -velocity_ksp_rtol 1.0E-08 > > > -velocity_ksp_atol 0.0 > > > -velocity_ksp_max_it 10000 > > > -velocity_pc_type jacobi > > > -velocity_ksp_view > > > -velocity_ksp_monitor_true_residual > > > -velocity_ksp_converged_reason > > > * Poisson solver: prefix `-poisson_` > > > -poisson_ksp_type cg > > > -poisson_ksp_rtol 1.0E-08 > > > -poisson_ksp_atol 0.0 > > > -poisson_ksp_max_it 20000 > > > -poisson_pc_type gamg > > > -poisson_pc_gamg_type agg > > > -poisson_pc_gamg_agg_nsmooths 1 > > > -poissonksp_view > > > -poisson_ksp_monitor_true_residual > > > -poisson_ksp_converged_reason > > > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > > Why? The Poisson solver diverges because of an indefinite preconditioner (only with 2 procs). > > > > > > We also saw that the routine `MatSetNullSpace()` was already available in 3.5.4. > > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 procs). > > > > > > Thus, we were wondering if we needed to update something else for the KSP, and not just modifying the name of the routine? > > > > > > I have attached the output files from the different cases: > > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-n2.log` > > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-nsp-n2.log` > > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.7.4-n2.log` > > > > > > Thank you for your help, > > > Olivier > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener From bsmith at mcs.anl.gov Tue Oct 25 21:20:10 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Oct 2016 21:20:10 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> Message-ID: <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Olivier, Ok, so I've run the code in the debugger, but I don't not think the problem is with the null space. The code is correctly removing the null space on all the levels of multigrid. I think the error comes from changes in the behavior of GAMG. GAMG is relatively rapidly moving with different defaults and even different code with each release. To check this I added the option -poisson_mg_levels_pc_sor_lits 2 and it stopped complaining about KSP_DIVERGED_INDEFINITE_PC. I've seen this before where the smoother is "too weak" and so the net result is that action of the preconditioner is indefinite. Mark Adams probably has better suggestions on how to make the preconditioner behave. Note you could also use a KSP of richardson or gmres instead of cg since they don't care about this indefinite business. Barry > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard wrote: > > On 25 October 2016 at 17:51, Barry Smith wrote: > > Olivier, > > In theory you do not need to change anything else. Are you using a different matrix object for the velocity_ksp object than the poisson_ksp object? > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. > > The code change in PETSc is very little but we have a report from another CFD user who also had problems with the change so there may be some subtle bug that we can't figure out causing things to not behave properly. > > First run the 3.7.4 code with -poisson_ksp_view and verify that when it prints the matrix information it prints something like has attached null space if it does not print that it means that somehow the matrix is not properly getting the matrix attached. > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that the nullspace is not attached to the KSP (as it was with 3.5.4)?; however the print statement is now under the Mat info (which is expected when moving from KSPSetNullSpace to MatSetNullSpace?). > > Though older versions had MatSetNullSpace() they didn't necessarily associate it with the KSP so it was not expected to work as a replacement for KSPSetNullSpace() with older versions. > > Because our other user had great difficulty trying to debug the issue feel free to send us at petsc-maint at mcs.anl.gov your code with instructions on building and running and we can try to track down the problem. Better than hours and hours spent with fruitless email. We will, of course, not distribute the code and will delete in when we are finished with it. > > ?The code is open-source and hosted on GitHub (https://github.com/barbagroup/PetIBM)?. > I just pushed the branches `feature-compatible-petsc-3.7` and `revert-compatible-petsc-3.5` that I used to observe this problem. > > PETSc (both 3.5.4 and 3.7.4) was configured as follow: > export PETSC_ARCH="linux-gnu-dbg" > ./configure --PETSC_ARCH=$PETSC_ARCH \ > --with-cc=gcc \ > --with-cxx=g++ \ > --with-fc=gfortran \ > --COPTFLAGS="-O0" \ > --CXXOPTFLAGS="-O0" \ > --FOPTFLAGS="-O0" \ > --with-debugging=1 \ > --download-fblaslapack \ > --download-mpich \ > --download-hypre \ > --download-yaml \ > --with-x=1 > > Our code was built using the following commands:? 
> mkdir petibm-build > cd petibm-build > ?export PETSC_DIR= > export PETSC_ARCH="linux-gnu-dbg" > export PETIBM_DIR= > $PETIBM_DIR/configure --prefix=$PWD \ > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ > CXXFLAGS="-g -O0 -std=c++11"? > make all > make install > > ?Then > cd examples > make examples? > > ?The example of the lid-driven cavity I was talking about can be found in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? > > To run it: > mpiexec -n N /bin/petibm2d -directory > > Let me know if you need more info. Thank you. > > Barry > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard wrote: > > > > Hi all, > > > > We develop a CFD code using the PETSc library that solves the Navier-Stokes equations using the fractional-step method from Perot (1993). > > At each time-step, we solve two systems: one for the velocity field, the other, a Poisson system, for the pressure field. > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a 20x20 grid using 1 or 2 procs. > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > So far, we have been using PETSc-3.5.4, and we would like to update the code with the latest release: 3.7.4. > > > > As suggested in the changelog of 3.6, we replaced the routine `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > Here is the list of options we use to configure the two solvers: > > * Velocity solver: prefix `-velocity_` > > -velocity_ksp_type bcgs > > -velocity_ksp_rtol 1.0E-08 > > -velocity_ksp_atol 0.0 > > -velocity_ksp_max_it 10000 > > -velocity_pc_type jacobi > > -velocity_ksp_view > > -velocity_ksp_monitor_true_residual > > -velocity_ksp_converged_reason > > * Poisson solver: prefix `-poisson_` > > -poisson_ksp_type cg > > -poisson_ksp_rtol 1.0E-08 > > -poisson_ksp_atol 0.0 > > -poisson_ksp_max_it 20000 > > -poisson_pc_type gamg > > -poisson_pc_gamg_type agg > > -poisson_pc_gamg_agg_nsmooths 1 > > -poissonksp_view > > -poisson_ksp_monitor_true_residual > > -poisson_ksp_converged_reason > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > Why? The Poisson solver diverges because of an indefinite preconditioner (only with 2 procs). > > > > We also saw that the routine `MatSetNullSpace()` was already available in 3.5.4. > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 procs). > > > > Thus, we were wondering if we needed to update something else for the KSP, and not just modifying the name of the routine? 
> > > > I have attached the output files from the different cases: > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-n2.log` > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.5.4-nsp-n2.log` > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > * `run-petsc-3.7.4-n2.log` > > > > Thank you for your help, > > Olivier > > > > From knepley at gmail.com Tue Oct 25 21:22:19 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 Oct 2016 21:22:19 -0500 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Message-ID: On Tue, Oct 25, 2016 at 9:20 PM, Barry Smith wrote: > > Olivier, > > Ok, so I've run the code in the debugger, but I don't not think the > problem is with the null space. The code is correctly removing the null > space on all the levels of multigrid. > > I think the error comes from changes in the behavior of GAMG. GAMG is > relatively rapidly moving with different defaults and even different code > with each release. > > To check this I added the option -poisson_mg_levels_pc_sor_lits 2 and > it stopped complaining about KSP_DIVERGED_INDEFINITE_PC. I've seen this > before where the smoother is "too weak" and so the net result is that > action of the preconditioner is indefinite. Mark Adams probably has better > suggestions on how to make the preconditioner behave. Note you could also > use a KSP of richardson or gmres instead of cg since they don't care about > this indefinite business. I think old GAMG squared the graph by default. You can see in the 3.7 output that it does not. Matt > > Barry > > > > > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard > wrote: > > > > On 25 October 2016 at 17:51, Barry Smith wrote: > > > > Olivier, > > > > In theory you do not need to change anything else. Are you using a > different matrix object for the velocity_ksp object than the poisson_ksp > object? > > > > ?The matrix is different for the velocity_ksp and the poisson_ksp?. > > > > The code change in PETSc is very little but we have a report from > another CFD user who also had problems with the change so there may be some > subtle bug that we can't figure out causing things to not behave properly. > > > > First run the 3.7.4 code with -poisson_ksp_view and verify that when > it prints the matrix information it prints something like has attached null > space if it does not print that it means that somehow the matrix is not > properly getting the matrix attached. > > > > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that > the nullspace is not attached to the KSP (as it was with 3.5.4)?; however > the print statement is now under the Mat info (which is expected when > moving from KSPSetNullSpace to MatSetNullSpace?). > > > > Though older versions had MatSetNullSpace() they didn't necessarily > associate it with the KSP so it was not expected to work as a replacement > for KSPSetNullSpace() with older versions. > > > > Because our other user had great difficulty trying to debug the > issue feel free to send us at petsc-maint at mcs.anl.gov your code with > instructions on building and running and we can try to track down the > problem. Better than hours and hours spent with fruitless email. We will, > of course, not distribute the code and will delete in when we are finished > with it. 
> > > > ?The code is open-source and hosted on GitHub (https://github.com/ > barbagroup/PetIBM)?. > > I just pushed the branches `feature-compatible-petsc-3.7` and > `revert-compatible-petsc-3.5` that I used to observe this problem. > > > > PETSc (both 3.5.4 and 3.7.4) was configured as follow: > > export PETSC_ARCH="linux-gnu-dbg" > > ./configure --PETSC_ARCH=$PETSC_ARCH \ > > --with-cc=gcc \ > > --with-cxx=g++ \ > > --with-fc=gfortran \ > > --COPTFLAGS="-O0" \ > > --CXXOPTFLAGS="-O0" \ > > --FOPTFLAGS="-O0" \ > > --with-debugging=1 \ > > --download-fblaslapack \ > > --download-mpich \ > > --download-hypre \ > > --download-yaml \ > > --with-x=1 > > > > Our code was built using the following commands:? > > mkdir petibm-build > > cd petibm-build > > ?export PETSC_DIR= > > export PETSC_ARCH="linux-gnu-dbg" > > export PETIBM_DIR= > > $PETIBM_DIR/configure --prefix=$PWD \ > > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ > > CXXFLAGS="-g -O0 -std=c++11"? > > make all > > make install > > > > ?Then > > cd examples > > make examples? > > > > ?The example of the lid-driven cavity I was talking about can be found > in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? > > > > To run it: > > mpiexec -n N /bin/petibm2d -directory > > > > > Let me know if you need more info. Thank you. > > > > Barry > > > > > > > > > > > > > > > > > > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < > olivier.mesnard8 at gmail.com> wrote: > > > > > > Hi all, > > > > > > We develop a CFD code using the PETSc library that solves the > Navier-Stokes equations using the fractional-step method from Perot (1993). > > > At each time-step, we solve two systems: one for the velocity field, > the other, a Poisson system, for the pressure field. > > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a > 20x20 grid using 1 or 2 procs. > > > For the Poisson system, we usually use CG preconditioned with GAMG. > > > > > > So far, we have been using PETSc-3.5.4, and we would like to update > the code with the latest release: 3.7.4. > > > > > > As suggested in the changelog of 3.6, we replaced the routine > `KSPSetNullSpace()` with `MatSetNullSpace()`. > > > > > > Here is the list of options we use to configure the two solvers: > > > * Velocity solver: prefix `-velocity_` > > > -velocity_ksp_type bcgs > > > -velocity_ksp_rtol 1.0E-08 > > > -velocity_ksp_atol 0.0 > > > -velocity_ksp_max_it 10000 > > > -velocity_pc_type jacobi > > > -velocity_ksp_view > > > -velocity_ksp_monitor_true_residual > > > -velocity_ksp_converged_reason > > > * Poisson solver: prefix `-poisson_` > > > -poisson_ksp_type cg > > > -poisson_ksp_rtol 1.0E-08 > > > -poisson_ksp_atol 0.0 > > > -poisson_ksp_max_it 20000 > > > -poisson_pc_type gamg > > > -poisson_pc_gamg_type agg > > > -poisson_pc_gamg_agg_nsmooths 1 > > > -poissonksp_view > > > -poisson_ksp_monitor_true_residual > > > -poisson_ksp_converged_reason > > > > > > With 3.5.4, the case runs normally on 1 or 2 procs. > > > With 3.7.4, the case runs normally on 1 proc but not on 2. > > > Why? The Poisson solver diverges because of an indefinite > preconditioner (only with 2 procs). > > > > > > We also saw that the routine `MatSetNullSpace()` was already available > in 3.5.4. > > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` led > to the Poisson solver diverging because of an indefinite matrix (on 1 and 2 > procs). > > > > > > Thus, we were wondering if we needed to update something else for the > KSP, and not just modifying the name of the routine? 
> > > > > > I have attached the output files from the different cases: > > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-n2.log` > > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.5.4-nsp-n2.log` > > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) > > > * `run-petsc-3.7.4-n2.log` > > > > > > Thank you for your help, > > > Olivier > > > 3.5.4-nsp-n1.log> 4-n1.log> > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Wed Oct 26 03:43:21 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Wed, 26 Oct 2016 08:43:21 +0000 Subject: [petsc-users] petsc 3.7.4 with superlu_dist install problem Message-ID: <1477471401674.45879@marin.nl> Satish, I'm having a similar problem with SuperLU_DIST, attached is the configure.log. I've noticed that the extraction of the tarball works fine, yet it gives the "unable to download" message: Checking for a functional SuperLU_DIST Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist Could not locate an existing copy of SUPERLU_DIST: ['metis-5.1.0-p3', 'parmetis-4.0.3-p3'] Downloading SuperLU_DIST =============================================================================== Trying to download file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz for SUPERLU_DIST =============================================================================== Downloading file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz to /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz Extracting /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz Executing: cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages; chmod -R a+r SuperLU_DIST_5.1.2;find SuperLU_DIST_5.1.2 -type d -name "*" -exec chmod a+rx {} \; Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist Could not locate an existing copy of SUPERLU_DIST: ['metis-5.1.0-p3', 'parmetis-4.0.3-p3', 'SuperLU_DIST_5.1.2'] ERROR: Failed to download SUPERLU_DIST Chris dr. ir. Christiaan Klaij | CFD Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl MARIN news: http://www.marin.nl/web/News/News-items/SSSRIMARIN-seminar-November-2-Shanghai.htm -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 2594405 bytes Desc: configure.log URL: From mfadams at lbl.gov Wed Oct 26 08:38:00 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 Oct 2016 09:38:00 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Message-ID: Please run with -info and grep on GAMG and send that. (-info is very noisy). I'm not sure what is going on here. Divergence with parallelism. Here are some suggestions. 
Note, you do not need to set the null space for a scalar (Poisson) problem unless you have some special null space. Not setting it (with the 6 rigid body modes) for a velocity (elasticity) equation will only degrade convergence rates. There was a bug for a while (early 3.7 versions) where the coarse grid was not squeezed onto one processor, which could result in very bad convergence, but not divergence, on multiple processors (the -info output will report the number of 'active pes'). Perhaps this bug is causing divergence for you. We had another subtle bug where the eigen estimates used a bad seed vector, which gives a bad eigen estimate. This would cause divergence, but it should not be a parallelism issue (these two bugs were both regressions in around 3.7). Divergence usually comes from a bad eigen estimate in a Chebyshev smoother, but this is not highly correlated with parallelism. The -info data will report the eigen estimates; that is not terribly useful on its own, but you can see whether the estimates change (get larger) with better parameters. Add these parameters, with the correct prefix, and use -options_left to make sure that "there are no unused options":
>> > >> > ?The matrix is different for the velocity_ksp and the poisson_ksp?. >> > >> > The code change in PETSc is very little but we have a report from >> another CFD user who also had problems with the change so there may be some >> subtle bug that we can't figure out causing things to not behave properly. >> > >> > First run the 3.7.4 code with -poisson_ksp_view and verify that when >> it prints the matrix information it prints something like has attached null >> space if it does not print that it means that somehow the matrix is not >> properly getting the matrix attached. >> > >> > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that >> the nullspace is not attached to the KSP (as it was with 3.5.4)?; however >> the print statement is now under the Mat info (which is expected when >> moving from KSPSetNullSpace to MatSetNullSpace?). >> > >> > Though older versions had MatSetNullSpace() they didn't necessarily >> associate it with the KSP so it was not expected to work as a replacement >> for KSPSetNullSpace() with older versions. >> > >> > Because our other user had great difficulty trying to debug the >> issue feel free to send us at petsc-maint at mcs.anl.gov your code with >> instructions on building and running and we can try to track down the >> problem. Better than hours and hours spent with fruitless email. We will, >> of course, not distribute the code and will delete in when we are finished >> with it. >> > >> > ?The code is open-source and hosted on GitHub ( >> https://github.com/barbagroup/PetIBM)?. >> > I just pushed the branches `feature-compatible-petsc-3.7` and >> `revert-compatible-petsc-3.5` that I used to observe this problem. >> > >> > PETSc (both 3.5.4 and 3.7.4) was configured as follow: >> > export PETSC_ARCH="linux-gnu-dbg" >> > ./configure --PETSC_ARCH=$PETSC_ARCH \ >> > --with-cc=gcc \ >> > --with-cxx=g++ \ >> > --with-fc=gfortran \ >> > --COPTFLAGS="-O0" \ >> > --CXXOPTFLAGS="-O0" \ >> > --FOPTFLAGS="-O0" \ >> > --with-debugging=1 \ >> > --download-fblaslapack \ >> > --download-mpich \ >> > --download-hypre \ >> > --download-yaml \ >> > --with-x=1 >> > >> > Our code was built using the following commands:? >> > mkdir petibm-build >> > cd petibm-build >> > ?export PETSC_DIR= >> > export PETSC_ARCH="linux-gnu-dbg" >> > export PETIBM_DIR= >> > $PETIBM_DIR/configure --prefix=$PWD \ >> > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ >> > CXXFLAGS="-g -O0 -std=c++11"? >> > make all >> > make install >> > >> > ?Then >> > cd examples >> > make examples? >> > >> > ?The example of the lid-driven cavity I was talking about can be found >> in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? >> > >> > To run it: >> > mpiexec -n N /bin/petibm2d -directory >> >> > >> > Let me know if you need more info. Thank you. >> > >> > Barry >> > >> > >> > >> > >> > >> > >> > >> > >> > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < >> olivier.mesnard8 at gmail.com> wrote: >> > > >> > > Hi all, >> > > >> > > We develop a CFD code using the PETSc library that solves the >> Navier-Stokes equations using the fractional-step method from Perot (1993). >> > > At each time-step, we solve two systems: one for the velocity field, >> the other, a Poisson system, for the pressure field. >> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a >> 20x20 grid using 1 or 2 procs. >> > > For the Poisson system, we usually use CG preconditioned with GAMG. 
>> > > >> > > So far, we have been using PETSc-3.5.4, and we would like to update >> the code with the latest release: 3.7.4. >> > > >> > > As suggested in the changelog of 3.6, we replaced the routine >> `KSPSetNullSpace()` with `MatSetNullSpace()`. >> > > >> > > Here is the list of options we use to configure the two solvers: >> > > * Velocity solver: prefix `-velocity_` >> > > -velocity_ksp_type bcgs >> > > -velocity_ksp_rtol 1.0E-08 >> > > -velocity_ksp_atol 0.0 >> > > -velocity_ksp_max_it 10000 >> > > -velocity_pc_type jacobi >> > > -velocity_ksp_view >> > > -velocity_ksp_monitor_true_residual >> > > -velocity_ksp_converged_reason >> > > * Poisson solver: prefix `-poisson_` >> > > -poisson_ksp_type cg >> > > -poisson_ksp_rtol 1.0E-08 >> > > -poisson_ksp_atol 0.0 >> > > -poisson_ksp_max_it 20000 >> > > -poisson_pc_type gamg >> > > -poisson_pc_gamg_type agg >> > > -poisson_pc_gamg_agg_nsmooths 1 >> > > -poissonksp_view >> > > -poisson_ksp_monitor_true_residual >> > > -poisson_ksp_converged_reason >> > > >> > > With 3.5.4, the case runs normally on 1 or 2 procs. >> > > With 3.7.4, the case runs normally on 1 proc but not on 2. >> > > Why? The Poisson solver diverges because of an indefinite >> preconditioner (only with 2 procs). >> > > >> > > We also saw that the routine `MatSetNullSpace()` was already >> available in 3.5.4. >> > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` >> led to the Poisson solver diverging because of an indefinite matrix (on 1 >> and 2 procs). >> > > >> > > Thus, we were wondering if we needed to update something else for the >> KSP, and not just modifying the name of the routine? >> > > >> > > I have attached the output files from the different cases: >> > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) >> > > * `run-petsc-3.5.4-n2.log` >> > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) >> > > * `run-petsc-3.5.4-nsp-n2.log` >> > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) >> > > * `run-petsc-3.7.4-n2.log` >> > > >> > > Thank you for your help, >> > > Olivier >> > > > .5.4-nsp-n1.log>> -n1.log> >> > >> > >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Oct 26 10:09:55 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Oct 2016 10:09:55 -0500 Subject: [petsc-users] petsc 3.7.4 with superlu_dist install problem In-Reply-To: <1477471401674.45879@marin.nl> References: <1477471401674.45879@marin.nl> Message-ID: As you can see - the dir names don't match. petsc-3.7 uses: https://github.com/xiaoyeli/superlu_dist/archive/0b5369f.tar.gz If you wish to try version 5.1.2 [which is not the default for this version of PETSc] - you can try: https://github.com/xiaoyeli/superlu_dist/archive/v5.1.2.tar.gz Alternatively: cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages mv SuperLU_DIST_5.1.2 superlu_dist_5.1.2 rerun configure [with the same options as before] Satish On Wed, 26 Oct 2016, Klaij, Christiaan wrote: > Satish, > > I'm having a similar problem with SuperLU_DIST, attached is the > configure.log. 
I've noticed that the extraction of the tarball > works fine, yet it gives the "unable to download" message: > > Checking for a functional SuperLU_DIST > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > Could not locate an existing copy of SUPERLU_DIST: > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3'] > Downloading SuperLU_DIST > =============================================================================== > Trying to download file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz for SUPERLU_DIST > =============================================================================== > > Downloading file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz to /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > Extracting /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > Executing: cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages; chmod -R a+r SuperLU_DIST_5.1.2;find SuperLU_DIST_5.1.2 -type d -name "*" -exec chmod a+rx {} \; > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > Could not locate an existing copy of SUPERLU_DIST: > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3', 'SuperLU_DIST_5.1.2'] > ERROR: Failed to download SUPERLU_DIST > > Chris > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > MARIN news: http://www.marin.nl/web/News/News-items/SSSRIMARIN-seminar-November-2-Shanghai.htm > > From olivier.mesnard8 at gmail.com Wed Oct 26 10:15:32 2016 From: olivier.mesnard8 at gmail.com (Olivier Mesnard) Date: Wed, 26 Oct 2016 11:15:32 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Message-ID: On 26 October 2016 at 09:38, Mark Adams wrote: > Please run with -info and grep on GAMG and send that. (-info is very > noisy). > > ?I cat the grep at the end of the log file (see attachment petsc-3.7.4-n2.log). Also, increasing the local number of iterations in SOR, as suggested by Barry, removed the indefinite preconditioner (file petsc-3.7.4-n2-lits2.log). > I'm not sure what is going on here. Divergence with parallelism. Here are > some suggestions. > > Note, you do not need to set the null space for a scalar (Poisson) problem > unless you have some special null space. And not getting it set (with the 6 > rigid body modes) for the velocity (elasticity) equation will only degrade > convergence rates. > > There was a bug for a while (early 3.7 versions) where the coarse grid was > not squeezed onto one processor, which could result in very bad > convergence, but not divergence, on multiple processors (the -info output > will report the number of 'active pes'). Perhaps this bug is causing > divergence for you. We had another subtle bug where the eigen estimates > used a bad seed vector, which gives a bad eigen estimate. 
This would cause > divergence but it should not be a parallelism issue (these two bugs were > both regressions in around 3.7) > > Divergence usually comes from a bad eigen estimate in a Chebyshev > smoother, but this is not highly correlated with parallelism. The -info > data will report the eigen estimates but that is not terribly useful but > you can see if it changes (gets larger) with better parameters. Add these > parameters, with the correct prefix, and use -options_left to make sure > that "there are no unused options": > > -mg_levels_ksp_type chebyshev > -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > ?? > -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 > > ?petsc-3.7.4-n2-chebyshev.log contains the output when using the default KSP Chebyshev. When estimating the eigenvalues using cg with the translations [0, 0.1; 0, 1.05] (previously using default gmres with translations [0, 0.1; 0, 1.1]), the max eigenvalue decreases from 1.0931 to 1.04366 and the indefinite preconditioner appears ealier after 2 iterations (3 previously). I attached the log (see petsc-3.7.4-chebyshev.log). > chebyshev is the default, as Barry suggested, replace this with gmres or > richardson (see below) and verify that this fixed the divergence problem. > > ? Using gmres (-poisson_mg_levels_ksp_type gmres) fixes the divergence problem ? (file petsc-3.7.4-n2-gmres.log)? . ? Same observation with richardson (file petsc-3.7.4-n2-richardson.log). > If your matrix is symmetric positive definite then use > '-mg_levels_esteig_ksp_type cg', if not then use the default gmres. > I checked and I still get an indefinite preconditioner when using gmres to estimate the eigenvalues.? > > Increase/decrease '-mg_levels_esteig_ksp_max_it 10', you should see the > estimates increase and converge with higher max_it. Setting this to a huge > number, like 100, should fix the bad seed vector problem mentioned above. > > ?I played with the maximum number of iterations. Here are the min/max eigenvalue estimates for the two levels: - max_it 5: (min?=0.0975079, max=1.02383) on level 1, (0.0975647, 1.02443) on level 2 - max_it 10: (0.0991546, 1.04112), (0.0993962, 1.04366) - max_it 20: (0.0995918, 1.04571), (0.115723, 1.21509) - max_it 50: (0.0995651, 1.04543), (0.133744, 1.40431) - max_it 100: (0.0995651, 1.04543), (0.133744, 1.40431) Note that all those runs ended up with an indefinite preconditioner, except when increasing the maximum number of iterations to 50 (and 100, which did not improve the eigenvalue estimates). > If eigen estimates are a pain, like with non SPD systems, then > richardson is an option (instead of chebyshev): > > -mg_levels_ksp_type richardson > -mg_levels_ksp_richardson_scale 0.6 > > You then need to play with the scaling (that is what chebyshev does for > you essentially). > > > On Tue, Oct 25, 2016 at 10:22 PM, Matthew Knepley > wrote: > >> On Tue, Oct 25, 2016 at 9:20 PM, Barry Smith wrote: >> >>> >>> Olivier, >>> >>> Ok, so I've run the code in the debugger, but I don't not think the >>> problem is with the null space. The code is correctly removing the null >>> space on all the levels of multigrid. >>> >>> I think the error comes from changes in the behavior of GAMG. GAMG >>> is relatively rapidly moving with different defaults and even different >>> code with each release. >>> >>> To check this I added the option -poisson_mg_levels_pc_sor_lits 2 >>> and it stopped complaining about KSP_DIVERGED_INDEFINITE_PC. 
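[Editor's note: for concreteness, with the -poisson_ prefix used throughout this thread, the smoother parameters discussed above would be passed in a form like the following. This is a sketch: the executable path and case directory are placeholders, the max_it value of 50 reflects the run reported above, and the prefixed option spellings assume the usual PETSc prefix rules (the esteig sub-solver inherits the smoother prefix).]

mpiexec -n 2 <petibm-install>/bin/petibm2d -directory <case-directory> \
    -poisson_mg_levels_ksp_type chebyshev \
    -poisson_mg_levels_esteig_ksp_type cg \
    -poisson_mg_levels_esteig_ksp_max_it 50 \
    -poisson_mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 \
    -options_left

Running with -options_left then confirms that none of these options were left unused, i.e. that the prefix was spelled correctly.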
I've seen this >>> before where the smoother is "too weak" and so the net result is that >>> action of the preconditioner is indefinite. Mark Adams probably has better >>> suggestions on how to make the preconditioner behave. Note you could also >>> use a KSP of richardson or gmres instead of cg since they don't care about >>> this indefinite business. >> >> >> I think old GAMG squared the graph by default. You can see in the 3.7 >> output that it does not. >> >> Matt >> >> >>> >>> Barry >>> >>> >>> >>> > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard < >>> olivier.mesnard8 at gmail.com> wrote: >>> > >>> > On 25 October 2016 at 17:51, Barry Smith wrote: >>> > >>> > Olivier, >>> > >>> > In theory you do not need to change anything else. Are you using a >>> different matrix object for the velocity_ksp object than the poisson_ksp >>> object? >>> > >>> > ?The matrix is different for the velocity_ksp and the poisson_ksp?. >>> > >>> > The code change in PETSc is very little but we have a report from >>> another CFD user who also had problems with the change so there may be some >>> subtle bug that we can't figure out causing things to not behave properly. >>> > >>> > First run the 3.7.4 code with -poisson_ksp_view and verify that >>> when it prints the matrix information it prints something like has attached >>> null space if it does not print that it means that somehow the matrix is >>> not properly getting the matrix attached. >>> > >>> > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that >>> the nullspace is not attached to the KSP (as it was with 3.5.4)?; however >>> the print statement is now under the Mat info (which is expected when >>> moving from KSPSetNullSpace to MatSetNullSpace?). >>> > >>> > Though older versions had MatSetNullSpace() they didn't >>> necessarily associate it with the KSP so it was not expected to work as a >>> replacement for KSPSetNullSpace() with older versions. >>> > >>> > Because our other user had great difficulty trying to debug the >>> issue feel free to send us at petsc-maint at mcs.anl.gov your code with >>> instructions on building and running and we can try to track down the >>> problem. Better than hours and hours spent with fruitless email. We will, >>> of course, not distribute the code and will delete in when we are finished >>> with it. >>> > >>> > ?The code is open-source and hosted on GitHub ( >>> https://github.com/barbagroup/PetIBM)?. >>> > I just pushed the branches `feature-compatible-petsc-3.7` and >>> `revert-compatible-petsc-3.5` that I used to observe this problem. >>> > >>> > PETSc (both 3.5.4 and 3.7.4) was configured as follow: >>> > export PETSC_ARCH="linux-gnu-dbg" >>> > ./configure --PETSC_ARCH=$PETSC_ARCH \ >>> > --with-cc=gcc \ >>> > --with-cxx=g++ \ >>> > --with-fc=gfortran \ >>> > --COPTFLAGS="-O0" \ >>> > --CXXOPTFLAGS="-O0" \ >>> > --FOPTFLAGS="-O0" \ >>> > --with-debugging=1 \ >>> > --download-fblaslapack \ >>> > --download-mpich \ >>> > --download-hypre \ >>> > --download-yaml \ >>> > --with-x=1 >>> > >>> > Our code was built using the following commands:? >>> > mkdir petibm-build >>> > cd petibm-build >>> > ?export PETSC_DIR= >>> > export PETSC_ARCH="linux-gnu-dbg" >>> > export PETIBM_DIR= >>> > $PETIBM_DIR/configure --prefix=$PWD \ >>> > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ >>> > CXXFLAGS="-g -O0 -std=c++11"? >>> > make all >>> > make install >>> > >>> > ?Then >>> > cd examples >>> > make examples? 
>>> > >>> > ?The example of the lid-driven cavity I was talking about can be found >>> in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? >>> > >>> > To run it: >>> > mpiexec -n N /bin/petibm2d -directory >>> >>> > >>> > Let me know if you need more info. Thank you. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < >>> olivier.mesnard8 at gmail.com> wrote: >>> > > >>> > > Hi all, >>> > > >>> > > We develop a CFD code using the PETSc library that solves the >>> Navier-Stokes equations using the fractional-step method from Perot (1993). >>> > > At each time-step, we solve two systems: one for the velocity field, >>> the other, a Poisson system, for the pressure field. >>> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a >>> 20x20 grid using 1 or 2 procs. >>> > > For the Poisson system, we usually use CG preconditioned with GAMG. >>> > > >>> > > So far, we have been using PETSc-3.5.4, and we would like to update >>> the code with the latest release: 3.7.4. >>> > > >>> > > As suggested in the changelog of 3.6, we replaced the routine >>> `KSPSetNullSpace()` with `MatSetNullSpace()`. >>> > > >>> > > Here is the list of options we use to configure the two solvers: >>> > > * Velocity solver: prefix `-velocity_` >>> > > -velocity_ksp_type bcgs >>> > > -velocity_ksp_rtol 1.0E-08 >>> > > -velocity_ksp_atol 0.0 >>> > > -velocity_ksp_max_it 10000 >>> > > -velocity_pc_type jacobi >>> > > -velocity_ksp_view >>> > > -velocity_ksp_monitor_true_residual >>> > > -velocity_ksp_converged_reason >>> > > * Poisson solver: prefix `-poisson_` >>> > > -poisson_ksp_type cg >>> > > -poisson_ksp_rtol 1.0E-08 >>> > > -poisson_ksp_atol 0.0 >>> > > -poisson_ksp_max_it 20000 >>> > > -poisson_pc_type gamg >>> > > -poisson_pc_gamg_type agg >>> > > -poisson_pc_gamg_agg_nsmooths 1 >>> > > -poissonksp_view >>> > > -poisson_ksp_monitor_true_residual >>> > > -poisson_ksp_converged_reason >>> > > >>> > > With 3.5.4, the case runs normally on 1 or 2 procs. >>> > > With 3.7.4, the case runs normally on 1 proc but not on 2. >>> > > Why? The Poisson solver diverges because of an indefinite >>> preconditioner (only with 2 procs). >>> > > >>> > > We also saw that the routine `MatSetNullSpace()` was already >>> available in 3.5.4. >>> > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` >>> led to the Poisson solver diverging because of an indefinite matrix (on 1 >>> and 2 procs). >>> > > >>> > > Thus, we were wondering if we needed to update something else for >>> the KSP, and not just modifying the name of the routine? >>> > > >>> > > I have attached the output files from the different cases: >>> > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) >>> > > * `run-petsc-3.5.4-n2.log` >>> > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) >>> > > * `run-petsc-3.5.4-nsp-n2.log` >>> > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) >>> > > * `run-petsc-3.7.4-n2.log` >>> > > >>> > > Thank you for your help, >>> > > Olivier >>> > > >> .5.4-nsp-n1.log>>> -n1.log> >>> > >>> > >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: petsc-3.5.4-n2.log Type: text/x-log Size: 9535 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc-3.7.4-n2-lits2.log Type: text/x-log Size: 12996 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc-3.7.4-n2-chebyshev.log Type: text/x-log Size: 12084 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc-3.7.4-n2-gmres.log Type: text/x-log Size: 12480 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc-3.7.4-n2-richardson.log Type: text/x-log Size: 12245 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Oct 26 10:32:29 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Oct 2016 10:32:29 -0500 Subject: [petsc-users] petsc 3.7.4 with superlu_dist install problem In-Reply-To: References: <1477471401674.45879@marin.nl> Message-ID: One additional note - there is this option thats thats useful for your use case of downloading tarballs separately.. >>> $ ./configure --with-packages-dir=$HOME/tmp --download-superlu =============================================================================== Configuring PETSc to compile on your system =============================================================================== Download the following packages to /home/balay/tmp superlu ['git://https://github.com/xiaoyeli/superlu', 'https://github.com/xiaoyeli/superlu/archive/7e10c8a.tar.gz'] Then run the script again <<< It tells you exactly the URLs that you should download - for the packages that you are installing.. Satish On Wed, 26 Oct 2016, Satish Balay wrote: > As you can see - the dir names don't match. > > petsc-3.7 uses: https://github.com/xiaoyeli/superlu_dist/archive/0b5369f.tar.gz > > If you wish to try version 5.1.2 [which is not the default for this version of PETSc] - you can try: > > https://github.com/xiaoyeli/superlu_dist/archive/v5.1.2.tar.gz > > Alternatively: > > cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages > mv SuperLU_DIST_5.1.2 superlu_dist_5.1.2 > > rerun configure [with the same options as before] > > Satish > > On Wed, 26 Oct 2016, Klaij, Christiaan wrote: > > > Satish, > > > > I'm having a similar problem with SuperLU_DIST, attached is the > > configure.log. 
I've noticed that the extraction of the tarball > > works fine, yet it gives the "unable to download" message: > > > > Checking for a functional SuperLU_DIST > > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > > Could not locate an existing copy of SUPERLU_DIST: > > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3'] > > Downloading SuperLU_DIST > > =============================================================================== > > Trying to download file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz for SUPERLU_DIST > > =============================================================================== > > > > Downloading file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz to /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > > Extracting /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > > Executing: cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages; chmod -R a+r SuperLU_DIST_5.1.2;find SuperLU_DIST_5.1.2 -type d -name "*" -exec chmod a+rx {} \; > > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > > Could not locate an existing copy of SUPERLU_DIST: > > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3', 'SuperLU_DIST_5.1.2'] > > ERROR: Failed to download SUPERLU_DIST > > > > Chris > > > > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > > > MARIN news: http://www.marin.nl/web/News/News-items/SSSRIMARIN-seminar-November-2-Shanghai.htm > > > > > > From xsli at lbl.gov Wed Oct 26 15:23:50 2016 From: xsli at lbl.gov (Xiaoye S. Li) Date: Wed, 26 Oct 2016 13:23:50 -0700 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> Message-ID: Some graph preprocessing steps can be skipped ONLY IF a previous factorization was done, and the information can be reused (AS INPUT) to the new factorization. In general, the driver routine SRC/pdgssvx.c() performs the LU factorization of the following (preprocessed) matrix: Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U The default is to do LU from scratch, including all the steps to compute equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc). -- The default should be set as options.Fact = DOFACT. -- When you set options.Fact = SamePattern, the sparsity ordering step is skipped, but you need to input Pc which was obtained from a previous factorization. -- When you set options.Fact = SamePattern_SameRowPerm, both sparsity reordering and pivoting ordering steps are skipped, but you need to input both Pr and Pc. Please see Lines 258 - 307 comments in SRC/pdgssvx.c for details, regarding which data structures should be inputs and which are outputs. The Users Guide also explains this. In EXAMPLE/ directory, I have various examples of these usage situations, see EXAMPLE/README. 
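In PETSc terms, options.Fact is what the runtime flag -mat_superlu_dist_fact discussed earlier in this thread controls. As a rough sketch only (the option name and value come from this thread; the helper function and where it is called are assumptions, not part of any code posted here), the same choice could be made programmatically by seeding the options database before the preconditioner is set up:

    #include <petscsys.h>

    /* Hypothetical sketch: select SuperLU_DIST's factorization reuse mode from
     * code. Call this before the KSP/PC is configured from options so the value
     * is picked up when the factored matrix is created. */
    static PetscErrorCode SelectSuperLUDistFact(void)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscOptionsSetValue(NULL, "-mat_superlu_dist_fact",
                                  "SamePattern_SameRowPerm");CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

This is equivalent to passing the flag on the command line; which value is actually safe to reuse still depends on whether the row and column permutations from the previous factorization remain valid, as described above.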
I am a little puzzled why in PETSc, the default is set to SamePattern ?? Sherry On Tue, Oct 25, 2016 at 9:18 AM, Hong wrote: > Sherry, > > We set '-mat_superlu_dist_fact SamePattern' as default in > petsc/superlu_dist on 12/6/15 (see attached email below). > > However, Anton must set 'SamePattern_SameRowPerm' to avoid crash in his > code. Checking > http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_ > code_html/pzgssvx___a_bglobal_8c.html > I see detailed description on using SamePattern_SameRowPerm, which > requires more from user than SamePattern. I guess these flags are used > for efficiency. The library sets a default, then have users to switch for > their own applications. The default setting should not cause crash. If > crash occurs, give a meaningful error message would be help. > > Do you have suggestion how should we set default in petsc for this flag? > > Hong > > ------------------- > Hong > 12/7/15 > to Danyang, petsc-maint, PETSc, Xiaoye > Danyang : > > Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is > how I figured it out. > > 1. Reading ex52f.F, I see '-superlu_default' = > '-pc_factor_mat_solver_package superlu_dist', the later enables runtime > options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the > tests below. > ... > 5. > Using a_flow_check_1.bin, I am able to reproduce the error you reported: > all packages give correct results except superlu_dist: > ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs > matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check > -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package > superlu_dist > Norm of error 2.5970E-12 iterations 1 > -->Test for matrix 168 > Norm of error 1.3936E-01 iterations 34 > -->Test for matrix 169 > > I guess the error might come from reuse of matrix factor. Replacing default > -mat_superlu_dist_fact with > -mat_superlu_dist_fact SamePattern, I get > > ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs > matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check > -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package > superlu_dist -mat_superlu_dist_fact SamePattern > > Norm of error 2.5970E-12 iterations 1 > -->Test for matrix 168 > ... > Sherry may tell you why SamePattern_SameRowPerm cause the difference here. > Best on the above experiments, I would set following as default > '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface. > '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface. > > Hong > > On Tue, Oct 25, 2016 at 10:38 AM, Hong wrote: > >> Anton, >> I guess, when you reuse matrix and its symbolic factor with updated >> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to >> confirm it. >> >> I'll check petsc/superlu-dist interface to set this flag for this case. >> >> Hong >> >> >> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov wrote: >> >>> Hong, >>> >>> I get all the problems gone and valgrind-clean output if I specify this: >>> >>> -mat_superlu_dist_fact SamePattern_SameRowPerm >>> What does SamePattern_SameRowPerm actually mean? >>> Row permutations are for large diagonal, column permutations are for >>> sparsity, right? >>> Will it skip subsequent matrix permutations for large diagonal even if >>> matrix values change significantly? 
>>> >>> Surprisingly everything works even with: >>> >>> -mat_superlu_dist_colperm PARMETIS >>> -mat_superlu_dist_parsymbfact TRUE >>> >>> Thanks, >>> Anton >>> >>> On 10/24/2016 09:06 PM, Hong wrote: >>> >>> Anton: >>>> >>>> If replacing superlu_dist with mumps, does your code work? >>>> >>>> yes >>>> >>> >>> You may use mumps in your code, or tests different options for >>> superlu_dist: >>> >>> -mat_superlu_dist_equil: Equilibrate matrix (None) >>> -mat_superlu_dist_rowperm Row permutation (choose one of) >>> LargeDiag NATURAL (None) >>> -mat_superlu_dist_colperm Column permutation (choose >>> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None) >>> -mat_superlu_dist_replacetinypivot: Replace tiny pivots (None) >>> -mat_superlu_dist_parsymbfact: Parallel symbolic factorization >>> (None) >>> -mat_superlu_dist_fact Sparsity pattern for repeated >>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm >>> (None) >>> >>> The options inside <> are defaults. You may try others. This might help >>> narrow down the bug. >>> >>> Hong >>> >>>> >>>> Hong >>>>> >>>>> On 10/24/2016 05:47 PM, Hong wrote: >>>>> >>>>> Barry, >>>>> Your change indeed fixed the error of his testing code. >>>>> As Satish tested, on your branch, ex16 runs smooth. >>>>> >>>>> I do not understand why on maint or master branch, ex16 creases inside >>>>> superlu_dist, but not with mumps. >>>>> >>>>> >>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately >>>>> not my code. >>>>> >>>>> This is something to be expected, since my code preallocates once in >>>>> the beginning. So there is no way it can be affected by multiple >>>>> preallocations. Subsequently I only do matrix assembly, that makes sure >>>>> structure doesn't change (set to get error otherwise). >>>>> >>>>> Summary: we don't have a simple test code to debug superlu issue >>>>> anymore. >>>>> >>>>> Anton >>>>> >>>>> Hong >>>>> >>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay >>>>> wrote: >>>>> >>>>>> On Mon, 24 Oct 2016, Barry Smith wrote: >>>>>> >>>>>> > >>>>>> > > [Or perhaps Hong is using a different test code and is observing >>>>>> bugs >>>>>> > > with superlu_dist interface..] >>>>>> > >>>>>> > She states that her test does a NEW MatCreate() for each matrix >>>>>> load (I cut and pasted it in the email I just sent). The bug I fixed was >>>>>> only related to using the SAME matrix from one MatLoad() in another >>>>>> MatLoad(). >>>>>> >>>>>> Ah - ok.. Sorry - wasn't thinking clearly :( >>>>>> >>>>>> Satish >>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Oct 26 21:29:46 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 Oct 2016 22:29:46 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Message-ID: On Wed, Oct 26, 2016 at 11:15 AM, Olivier Mesnard < olivier.mesnard8 at gmail.com> wrote: > On 26 October 2016 at 09:38, Mark Adams wrote: > >> Please run with -info and grep on GAMG and send that. (-info is very >> noisy). >> >> ?I cat the grep at the end of the log file (see attachment > petsc-3.7.4-n2.log). > Also, increasing the local number of iterations in SOR, as suggested by > Barry, removed the indefinite preconditioner (file > petsc-3.7.4-n2-lits2.log). > The eigen estimate on the second level looks like it could be low. 
You have a small stencil. Is this a 2D problem? You might try adding '-pc_gamg_square_graph 0', just to see what happens. (AMG can be a real pain sometimes.) > > >> I'm not sure what is going on here. Divergence with parallelism. Here >> are some suggestions. >> >> Note, you do not need to set the null space for a scalar (Poisson) >> problem unless you have some special null space. And not getting it set >> (with the 6 rigid body modes) for the velocity (elasticity) equation will >> only degrade convergence rates. >> >> There was a bug for a while (early 3.7 versions) where the coarse grid >> was not squeezed onto one processor, which could result in very bad >> convergence, but not divergence, on multiple processors (the -info output >> will report the number of 'active pes'). Perhaps this bug is causing >> divergence for you. We had another subtle bug where the eigen estimates >> used a bad seed vector, which gives a bad eigen estimate. This would cause >> divergence but it should not be a parallelism issue (these two bugs were >> both regressions in around 3.7) >> >> Divergence usually comes from a bad eigen estimate in a Chebyshev >> smoother, but this is not highly correlated with parallelism. The -info >> data will report the eigen estimates but that is not terribly useful but >> you can see if it changes (gets larger) with better parameters. Add these >> parameters, with the correct prefix, and use -options_left to make sure >> that "there are no unused options": >> >> -mg_levels_ksp_type chebyshev >> -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> ?? >> -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 >> >> ?petsc-3.7.4-n2-chebyshev.log contains the output when using the default > KSP Chebyshev. > When estimating the eigenvalues using cg with the translations [0, 0.1; 0, > 1.05] (previously using default gmres with translations [0, 0.1; 0, 1.1]), > the max eigenvalue decreases from 1.0931 to 1.04366 and the indefinite > preconditioner appears ealier after 2 iterations (3 previously). > I attached the log (see petsc-3.7.4-chebyshev.log). > > >> chebyshev is the default, as Barry suggested, replace this with gmres or >> richardson (see below) and verify that this fixed the divergence problem. >> >> ? > Using gmres (-poisson_mg_levels_ksp_type gmres) fixes the divergence > problem > ? (file petsc-3.7.4-n2-gmres.log)? > . > ? > Same observation with richardson (file petsc-3.7.4-n2-richardson.log). > > >> If your matrix is symmetric positive definite then use >> '-mg_levels_esteig_ksp_type cg', if not then use the default gmres. >> > > I checked and I still get an indefinite preconditioner when using gmres to > estimate the eigenvalues.? > > If you matrix is symmetric (and positive) then CG will give you a better estimate (provable?) > >> Increase/decrease '-mg_levels_esteig_ksp_max_it 10', you should see the >> estimates increase and converge with higher max_it. Setting this to a huge >> number, like 100, should fix the bad seed vector problem mentioned above. >> >> ?I played with the maximum number of iterations. 
Here are the min/max > eigenvalue estimates for the two levels: > - max_it 5: (min?=0.0975079, max=1.02383) on level 1, (0.0975647, 1.02443) > on level 2 > - max_it 10: (0.0991546, 1.04112), (0.0993962, 1.04366) > - max_it 20: (0.0995918, 1.04571), (0.115723, 1.21509) > - max_it 50: (0.0995651, 1.04543), (0.133744, 1.40431) > - max_it 100: (0.0995651, 1.04543), (0.133744, 1.40431) > > Note that all those runs ended up with an indefinite preconditioner, > except when increasing the maximum number of iterations to 50 (and 100, > which did not improve the eigenvalue estimates). > Are you saying that the indefinite PC error goes away when '-mg_levels_esteig_ksp_max_it 50' ? or is this MG smoother iterations? > > >> If eigen estimates are a pain, like with non SPD systems, then >> richardson is an option (instead of chebyshev): >> >> -mg_levels_ksp_type richardson >> -mg_levels_ksp_richardson_scale 0.6 >> >> You then need to play with the scaling (that is what chebyshev does for >> you essentially). >> >> >> On Tue, Oct 25, 2016 at 10:22 PM, Matthew Knepley >> wrote: >> >>> On Tue, Oct 25, 2016 at 9:20 PM, Barry Smith wrote: >>> >>>> >>>> Olivier, >>>> >>>> Ok, so I've run the code in the debugger, but I don't not think the >>>> problem is with the null space. The code is correctly removing the null >>>> space on all the levels of multigrid. >>>> >>>> I think the error comes from changes in the behavior of GAMG. GAMG >>>> is relatively rapidly moving with different defaults and even different >>>> code with each release. >>>> >>>> To check this I added the option -poisson_mg_levels_pc_sor_lits 2 >>>> and it stopped complaining about KSP_DIVERGED_INDEFINITE_PC. I've seen this >>>> before where the smoother is "too weak" and so the net result is that >>>> action of the preconditioner is indefinite. Mark Adams probably has better >>>> suggestions on how to make the preconditioner behave. Note you could also >>>> use a KSP of richardson or gmres instead of cg since they don't care about >>>> this indefinite business. >>> >>> >>> I think old GAMG squared the graph by default. You can see in the 3.7 >>> output that it does not. >>> >>> Matt >>> >>> >>>> >>>> Barry >>>> >>>> >>>> >>>> > On Oct 25, 2016, at 5:39 PM, Olivier Mesnard < >>>> olivier.mesnard8 at gmail.com> wrote: >>>> > >>>> > On 25 October 2016 at 17:51, Barry Smith wrote: >>>> > >>>> > Olivier, >>>> > >>>> > In theory you do not need to change anything else. Are you using >>>> a different matrix object for the velocity_ksp object than the poisson_ksp >>>> object? >>>> > >>>> > ?The matrix is different for the velocity_ksp and the poisson_ksp?. >>>> > >>>> > The code change in PETSc is very little but we have a report from >>>> another CFD user who also had problems with the change so there may be some >>>> subtle bug that we can't figure out causing things to not behave properly. >>>> > >>>> > First run the 3.7.4 code with -poisson_ksp_view and verify that >>>> when it prints the matrix information it prints something like has attached >>>> null space if it does not print that it means that somehow the matrix is >>>> not properly getting the matrix attached. >>>> > >>>> > ?When running with 3.7.4 and -poisson_ksp_view, the output shows that >>>> the nullspace is not attached to the KSP (as it was with 3.5.4)?; however >>>> the print statement is now under the Mat info (which is expected when >>>> moving from KSPSetNullSpace to MatSetNullSpace?). 
>>>> > >>>> > Though older versions had MatSetNullSpace() they didn't >>>> necessarily associate it with the KSP so it was not expected to work as a >>>> replacement for KSPSetNullSpace() with older versions. >>>> > >>>> > Because our other user had great difficulty trying to debug the >>>> issue feel free to send us at petsc-maint at mcs.anl.gov your code with >>>> instructions on building and running and we can try to track down the >>>> problem. Better than hours and hours spent with fruitless email. We will, >>>> of course, not distribute the code and will delete in when we are finished >>>> with it. >>>> > >>>> > ?The code is open-source and hosted on GitHub ( >>>> https://github.com/barbagroup/PetIBM)?. >>>> > I just pushed the branches `feature-compatible-petsc-3.7` and >>>> `revert-compatible-petsc-3.5` that I used to observe this problem. >>>> > >>>> > PETSc (both 3.5.4 and 3.7.4) was configured as follow: >>>> > export PETSC_ARCH="linux-gnu-dbg" >>>> > ./configure --PETSC_ARCH=$PETSC_ARCH \ >>>> > --with-cc=gcc \ >>>> > --with-cxx=g++ \ >>>> > --with-fc=gfortran \ >>>> > --COPTFLAGS="-O0" \ >>>> > --CXXOPTFLAGS="-O0" \ >>>> > --FOPTFLAGS="-O0" \ >>>> > --with-debugging=1 \ >>>> > --download-fblaslapack \ >>>> > --download-mpich \ >>>> > --download-hypre \ >>>> > --download-yaml \ >>>> > --with-x=1 >>>> > >>>> > Our code was built using the following commands:? >>>> > mkdir petibm-build >>>> > cd petibm-build >>>> > ?export PETSC_DIR= >>>> > export PETSC_ARCH="linux-gnu-dbg" >>>> > export PETIBM_DIR= >>>> > $PETIBM_DIR/configure --prefix=$PWD \ >>>> > CXX=$PETSC_DIR/$PETSC_ARCH/bin/mpicxx \ >>>> > CXXFLAGS="-g -O0 -std=c++11"? >>>> > make all >>>> > make install >>>> > >>>> > ?Then >>>> > cd examples >>>> > make examples? >>>> > >>>> > ?The example of the lid-driven cavity I was talking about can be >>>> found in the folder `examples/2d/convergence/lidDrivenCavity20/20/`? >>>> > >>>> > To run it: >>>> > mpiexec -n N /bin/petibm2d -directory >>>> >>>> > >>>> > Let me know if you need more info. Thank you. >>>> > >>>> > Barry >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > > On Oct 25, 2016, at 4:38 PM, Olivier Mesnard < >>>> olivier.mesnard8 at gmail.com> wrote: >>>> > > >>>> > > Hi all, >>>> > > >>>> > > We develop a CFD code using the PETSc library that solves the >>>> Navier-Stokes equations using the fractional-step method from Perot (1993). >>>> > > At each time-step, we solve two systems: one for the velocity >>>> field, the other, a Poisson system, for the pressure field. >>>> > > One of our test-cases is a 2D lid-driven cavity flow (Re=100) on a >>>> 20x20 grid using 1 or 2 procs. >>>> > > For the Poisson system, we usually use CG preconditioned with GAMG. >>>> > > >>>> > > So far, we have been using PETSc-3.5.4, and we would like to update >>>> the code with the latest release: 3.7.4. >>>> > > >>>> > > As suggested in the changelog of 3.6, we replaced the routine >>>> `KSPSetNullSpace()` with `MatSetNullSpace()`. 
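For concreteness, the replacement being discussed amounts to attaching the null space to the operator instead of to the Krylov solver. The snippet below is a minimal sketch, not code from PetIBM: it assumes the Poisson matrix is called A and that its null space is the constant vector, as is typical for an all-Neumann pressure Poisson operator.

    #include <petscmat.h>

    /* Minimal sketch (assumed names): attach a constant null space to the
     * Poisson matrix A, the 3.6+ replacement for KSPSetNullSpace() on the KSP. */
    static PetscErrorCode AttachConstantNullSpace(Mat A)
    {
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A),
                                PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr); /* A keeps its own reference */
      PetscFunctionReturn(0);
    }

Destroying the local reference right away is safe because MatSetNullSpace() takes its own reference on the null space object.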
>>>> > > >>>> > > Here is the list of options we use to configure the two solvers: >>>> > > * Velocity solver: prefix `-velocity_` >>>> > > -velocity_ksp_type bcgs >>>> > > -velocity_ksp_rtol 1.0E-08 >>>> > > -velocity_ksp_atol 0.0 >>>> > > -velocity_ksp_max_it 10000 >>>> > > -velocity_pc_type jacobi >>>> > > -velocity_ksp_view >>>> > > -velocity_ksp_monitor_true_residual >>>> > > -velocity_ksp_converged_reason >>>> > > * Poisson solver: prefix `-poisson_` >>>> > > -poisson_ksp_type cg >>>> > > -poisson_ksp_rtol 1.0E-08 >>>> > > -poisson_ksp_atol 0.0 >>>> > > -poisson_ksp_max_it 20000 >>>> > > -poisson_pc_type gamg >>>> > > -poisson_pc_gamg_type agg >>>> > > -poisson_pc_gamg_agg_nsmooths 1 >>>> > > -poissonksp_view >>>> > > -poisson_ksp_monitor_true_residual >>>> > > -poisson_ksp_converged_reason >>>> > > >>>> > > With 3.5.4, the case runs normally on 1 or 2 procs. >>>> > > With 3.7.4, the case runs normally on 1 proc but not on 2. >>>> > > Why? The Poisson solver diverges because of an indefinite >>>> preconditioner (only with 2 procs). >>>> > > >>>> > > We also saw that the routine `MatSetNullSpace()` was already >>>> available in 3.5.4. >>>> > > With 3.5.4, replacing `KSPSetNullSpace()` with `MatSetNullSpace()` >>>> led to the Poisson solver diverging because of an indefinite matrix (on 1 >>>> and 2 procs). >>>> > > >>>> > > Thus, we were wondering if we needed to update something else for >>>> the KSP, and not just modifying the name of the routine? >>>> > > >>>> > > I have attached the output files from the different cases: >>>> > > * `run-petsc-3.5.4-n1.log` (3.5.4, `KSPSetNullSpace()`, n=1) >>>> > > * `run-petsc-3.5.4-n2.log` >>>> > > * `run-petsc-3.5.4-nsp-n1.log` (3.5.4, `MatSetNullSpace()`, n=1) >>>> > > * `run-petsc-3.5.4-nsp-n2.log` >>>> > > * `run-petsc-3.7.4-n1.log` (3.7.4, `MatSetNullSpace()`, n=1) >>>> > > * `run-petsc-3.7.4-n2.log` >>>> > > >>>> > > Thank you for your help, >>>> > > Olivier >>>> > > >>> .5.4-nsp-n1.log>>>> -n1.log> >>>> > >>>> > >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Oct 27 01:43:33 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 27 Oct 2016 06:43:33 +0000 Subject: [petsc-users] petsc 3.7.4 with superlu_dist install problem In-Reply-To: References: <1477471401674.45879@marin.nl> , Message-ID: <1477550613080.76481@marin.nl> Satish, Thanks for the tip, that's very convenient! Chris dr. ir. Christiaan Klaij | CFD Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl MARIN news: http://www.marin.nl/web/News/News-items/Workshop-Optimaliseren-is-ook-innoveren-15-november.htm ________________________________________ From: Satish Balay Sent: Wednesday, October 26, 2016 5:32 PM To: petsc-users Cc: Klaij, Christiaan Subject: Re: [petsc-users] petsc 3.7.4 with superlu_dist install problem One additional note - there is this option thats thats useful for your use case of downloading tarballs separately.. 
>>> $ ./configure --with-packages-dir=$HOME/tmp --download-superlu =============================================================================== Configuring PETSc to compile on your system =============================================================================== Download the following packages to /home/balay/tmp superlu ['git://https://github.com/xiaoyeli/superlu', 'https://github.com/xiaoyeli/superlu/archive/7e10c8a.tar.gz'] Then run the script again <<< It tells you exactly the URLs that you should download - for the packages that you are installing.. Satish On Wed, 26 Oct 2016, Satish Balay wrote: > As you can see - the dir names don't match. > > petsc-3.7 uses: https://github.com/xiaoyeli/superlu_dist/archive/0b5369f.tar.gz > > If you wish to try version 5.1.2 [which is not the default for this version of PETSc] - you can try: > > https://github.com/xiaoyeli/superlu_dist/archive/v5.1.2.tar.gz > > Alternatively: > > cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages > mv SuperLU_DIST_5.1.2 superlu_dist_5.1.2 > > rerun configure [with the same options as before] > > Satish > > On Wed, 26 Oct 2016, Klaij, Christiaan wrote: > > > Satish, > > > > I'm having a similar problem with SuperLU_DIST, attached is the > > configure.log. I've noticed that the extraction of the tarball > > works fine, yet it gives the "unable to download" message: > > > > Checking for a functional SuperLU_DIST > > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > > Could not locate an existing copy of SUPERLU_DIST: > > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3'] > > Downloading SuperLU_DIST > > =============================================================================== > > Trying to download file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz for SUPERLU_DIST > > =============================================================================== > > > > Downloading file:///projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/src/superlu_dist_5.1.2.tar.gz to /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > > Extracting /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages/_d_superlu_dist_5.1.2.tar.gz > > Executing: cd /projects/developers/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc/3.7.4-dbg/linux_64bit/externalpackages; chmod -R a+r SuperLU_DIST_5.1.2;find SuperLU_DIST_5.1.2 -type d -name "*" -exec chmod a+rx {} \; > > Looking for SUPERLU_DIST at git.superlu_dist, hg.superlu_dist or a directory starting with superlu_dist > > Could not locate an existing copy of SUPERLU_DIST: > > ['metis-5.1.0-p3', 'parmetis-4.0.3-p3', 'SuperLU_DIST_5.1.2'] > > ERROR: Failed to download SUPERLU_DIST > > > > Chris > > > > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > > > MARIN news: http://www.marin.nl/web/News/News-items/SSSRIMARIN-seminar-November-2-Shanghai.htm > > > > > > From gotofd at gmail.com Thu Oct 27 03:11:11 2016 From: gotofd at gmail.com (Ji Zhang) Date: Thu, 27 Oct 2016 16:11:11 +0800 Subject: [petsc-users] how to run petsc in MPI mode correct? Message-ID: Dear all, I'm using petsc as a solver for my project. However, the solver in parallel mode creates much more process then my expectation. The code using python and petsc4py. 
The machine have 4 cores. (a). If I run it directly, petsc uses only 1 process to assemble the matrix, and creates 4 process to solve the equations, (b). If I use comment 'mpirun -n 4', petsc uses 4 process to assemble the matrix, but creates 16 process to solve the equations, I have checked my own python code,, the main component associates with matrix create is as follow: m = PETSc.Mat().create(comm=PETSc.COMM_WORLD) m.setSizes(((None, n_vnode[0]*3), (None, n_fnode[0]*3))) m.setType('dense') m.setFromOptions() m.setUp() m_start, m_end = m.getOwnershipRange() for i0 in range(m_start, m_end): delta_xi = fnodes - vnodes[i0//3] temp1 = delta_xi ** 2 delta_2 = np.square(delta) # delta_2 = e^2 delta_r2 = temp1.sum(axis=1) + delta_2 # delta_r2 = r^2+e^2 delta_r3 = delta_r2 * np.sqrt(delta_r2) # delta_r3 = (r^2+e^2)^1.5 temp2 = (delta_r2 + delta_2) / delta_r3 # temp2 = (r^2+2*e^2)/(r^2+e^2)^1.5 if i0 % 3 == 0: # x axis m[i0, 0::3] = ( temp2 + np.square(delta_xi[:, 0]) / delta_r3 ) / (8 * np.pi) # Mxx m[i0, 1::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy m[i0, 2::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz elif i0 % 3 == 1: # y axis m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy m[i0, 1::3] = ( temp2 + np.square(delta_xi[:, 1]) / delta_r3 ) / (8 * np.pi) # Myy m[i0, 2::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz else: # z axis m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz m[i0, 1::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz m[i0, 2::3] = ( temp2 + np.square(delta_xi[:, 2]) / delta_r3 ) / (8 * np.pi) # Mzz m.assemble() the main component associates to petsc solver is as follow: ksp = PETSc.KSP() ksp.create(comm=PETSc.COMM_WORLD) ksp.setType(solve_method) ksp.getPC().setType(precondition_method) ksp.setOperators(self._M_petsc) ksp.setFromOptions() ksp.solve(velocity_petsc, force_petsc) Is there any one could give me some suggestion? Thanks. ?? ?? ????????? ?????????? ???????????10????9?? ?100193? Best, Regards, Zhang Ji, PhD student Beijing Computational Science Research Center Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Oct 27 03:24:13 2016 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 27 Oct 2016 11:24:13 +0300 Subject: [petsc-users] how to run petsc in MPI mode correct? In-Reply-To: References: Message-ID: 2016-10-27 11:11 GMT+03:00 Ji Zhang : > Dear all, > > I'm using petsc as a solver for my project. However, the solver in > parallel mode creates much more process then my expectation. > > The code using python and petsc4py. The machine have 4 cores. > (a). If I run it directly, petsc uses only 1 process to assemble the > matrix, and creates 4 process to solve the equations, > (b). If I use comment 'mpirun -n 4', petsc uses 4 process to assemble the > matrix, but creates 16 process to solve the equations, > What do you mean by "PETSc creates 16 processes"? PETSc does not create processes. What's the output of PETSc.COMM_WORLD.getSize()? My feeling is that you have some python component (numpy?) or the BLAS/LAPACK library that is multithreaded. 
Rerun using OMP_NUM_THREADS=1 (or MKL_NUM_THREADS=1) If this does not fix your issues, try running under strace > I have checked my own python code,, the main component associates with > matrix create is as follow: > > m = PETSc.Mat().create(comm=PETSc.COMM_WORLD) > m.setSizes(((None, n_vnode[0]*3), (None, n_fnode[0]*3))) > m.setType('dense') > m.setFromOptions() > m.setUp() > m_start, m_end = m.getOwnershipRange() > for i0 in range(m_start, m_end): > delta_xi = fnodes - vnodes[i0//3] > temp1 = delta_xi ** 2 > delta_2 = np.square(delta) # delta_2 = e^2 > delta_r2 = temp1.sum(axis=1) + delta_2 # delta_r2 = r^2+e^2 > delta_r3 = delta_r2 * np.sqrt(delta_r2) # delta_r3 = (r^2+e^2)^1.5 > temp2 = (delta_r2 + delta_2) / delta_r3 # temp2 = (r^2+2*e^2)/(r^2+e^2)^1.5 > if i0 % 3 == 0: # x axis > m[i0, 0::3] = ( temp2 + np.square(delta_xi[:, 0]) / delta_r3 ) / (8 * np.pi) # Mxx > m[i0, 1::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy > m[i0, 2::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz > elif i0 % 3 == 1: # y axis > m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy > m[i0, 1::3] = ( temp2 + np.square(delta_xi[:, 1]) / delta_r3 ) / (8 * np.pi) # Myy > m[i0, 2::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz > else: # z axis > m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz > m[i0, 1::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz > m[i0, 2::3] = ( temp2 + np.square(delta_xi[:, 2]) / delta_r3 ) / (8 * np.pi) # Mzz > m.assemble() > > > > the main component associates to petsc solver is as follow: > > ksp = PETSc.KSP() > ksp.create(comm=PETSc.COMM_WORLD) > ksp.setType(solve_method) > ksp.getPC().setType(precondition_method) > ksp.setOperators(self._M_petsc) > ksp.setFromOptions() > ksp.solve(velocity_petsc, force_petsc) > > Is there any one could give me some suggestion? Thanks. > ?? > ?? > ????????? > ?????????? > ???????????10????9?? ?100193? > > Best, > Regards, > Zhang Ji, PhD student > Beijing Computational Science Research Center > Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian > District, Beijing 100193, China > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From gotofd at gmail.com Thu Oct 27 03:52:07 2016 From: gotofd at gmail.com (Ji Zhang) Date: Thu, 27 Oct 2016 16:52:07 +0800 Subject: [petsc-users] how to run petsc in MPI mode correct? In-Reply-To: References: Message-ID: Thank you very much, OMP_NUM_THREADS=1 works well! ?? ?? ????????? ?????????? ???????????10????9?? ?100193? Best, Regards, Zhang Ji, PhD student Beijing Computational Science Research Center Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian District, Beijing 100193, China On Thu, Oct 27, 2016 at 4:24 PM, Stefano Zampini wrote: > > > 2016-10-27 11:11 GMT+03:00 Ji Zhang : > >> Dear all, >> >> I'm using petsc as a solver for my project. However, the solver in >> parallel mode creates much more process then my expectation. >> >> The code using python and petsc4py. The machine have 4 cores. >> (a). If I run it directly, petsc uses only 1 process to assemble the >> matrix, and creates 4 process to solve the equations, >> (b). If I use comment 'mpirun -n 4', petsc uses 4 process to assemble >> the matrix, but creates 16 process to solve the equations, >> > > What do you mean by "PETSc creates 16 processes"? PETSc does not create > processes. > What's the output of PETSc.COMM_WORLD.getSize()? 
> > My feeling is that you have some python component (numpy?) or the > BLAS/LAPACK library that is multithreaded. Rerun using OMP_NUM_THREADS=1 > (or MKL_NUM_THREADS=1) > If this does not fix your issues, try running under strace > > >> I have checked my own python code,, the main component associates with >> matrix create is as follow: >> >> m = PETSc.Mat().create(comm=PETSc.COMM_WORLD) >> m.setSizes(((None, n_vnode[0]*3), (None, n_fnode[0]*3))) >> m.setType('dense') >> m.setFromOptions() >> m.setUp() >> m_start, m_end = m.getOwnershipRange() >> for i0 in range(m_start, m_end): >> delta_xi = fnodes - vnodes[i0//3] >> temp1 = delta_xi ** 2 >> delta_2 = np.square(delta) # delta_2 = e^2 >> delta_r2 = temp1.sum(axis=1) + delta_2 # delta_r2 = r^2+e^2 >> delta_r3 = delta_r2 * np.sqrt(delta_r2) # delta_r3 = (r^2+e^2)^1.5 >> temp2 = (delta_r2 + delta_2) / delta_r3 # temp2 = (r^2+2*e^2)/(r^2+e^2)^1.5 >> if i0 % 3 == 0: # x axis >> m[i0, 0::3] = ( temp2 + np.square(delta_xi[:, 0]) / delta_r3 ) / (8 * np.pi) # Mxx >> m[i0, 1::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy >> m[i0, 2::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz >> elif i0 % 3 == 1: # y axis >> m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy >> m[i0, 1::3] = ( temp2 + np.square(delta_xi[:, 1]) / delta_r3 ) / (8 * np.pi) # Myy >> m[i0, 2::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz >> else: # z axis >> m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz >> m[i0, 1::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz >> m[i0, 2::3] = ( temp2 + np.square(delta_xi[:, 2]) / delta_r3 ) / (8 * np.pi) # Mzz >> m.assemble() >> >> >> >> the main component associates to petsc solver is as follow: >> >> ksp = PETSc.KSP() >> ksp.create(comm=PETSc.COMM_WORLD) >> ksp.setType(solve_method) >> ksp.getPC().setType(precondition_method) >> ksp.setOperators(self._M_petsc) >> ksp.setFromOptions() >> ksp.solve(velocity_petsc, force_petsc) >> >> Is there any one could give me some suggestion? Thanks. >> ?? >> ?? >> ????????? >> ?????????? >> ???????????10????9?? ?100193? >> >> Best, >> Regards, >> Zhang Ji, PhD student >> Beijing Computational Science Research Center >> Zhongguancun Software Park II, No. 10 Dongbeiwang West Road, Haidian >> District, Beijing 100193, China >> > > > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Oct 27 09:51:21 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 27 Oct 2016 09:51:21 -0500 Subject: [petsc-users] SuperLU_dist issue in 3.7.4 In-Reply-To: References: <33127b2c-a0b4-bda0-bfa7-c7ebeed6b0ba@uni-mainz.de> <34EF03D3-66E5-41FD-A069-AFF06408FD9D@mcs.anl.gov> <0d72e78d-52af-1d9a-677f-167d2efbd8ab@uni-mainz.de> <6B14246C-CA55-4E5B-BF5C-F01C33DACCB8@mcs.anl.gov> <94801705-937C-4D8D-BDCE-3E1AEAA4BCB4@mcs.anl.gov> <953ef0f8-b2bf-c152-9a6a-9927f0f47a24@uni-mainz.de> <4292667a-fda6-dafd-c8f6-44e8130f042a@uni-mainz.de> Message-ID: Sherry, Thanks for detailed explanation. We use options.Fact = DOFACT as default for the first factorization. When user reuses matrix factor, then we must provide a default, either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'. We previously set 'SamePattern_SameRowPerm'. After a user reported error, we switched to 'SamePattern' which causes problem for 2nd user. I'll check our interface to see if we can add flag-checking for Pr and Pc, then set default accordingly. 
Hong On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li wrote: > Some graph preprocessing steps can be skipped ONLY IF a previous > factorization was done, and the information can be reused (AS INPUT) to the > new factorization. > > In general, the driver routine SRC/pdgssvx.c() performs the LU > factorization of the following (preprocessed) matrix: > Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U > > The default is to do LU from scratch, including all the steps to compute > equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc). > > -- The default should be set as options.Fact = DOFACT. > > -- When you set options.Fact = SamePattern, the sparsity ordering step is > skipped, but you need to input Pc which was obtained from a previous > factorization. > > -- When you set options.Fact = SamePattern_SameRowPerm, both sparsity > reordering and pivoting ordering steps are skipped, but you need to input > both Pr and Pc. > > Please see Lines 258 - 307 comments in SRC/pdgssvx.c for details, > regarding which data structures should be inputs and which are outputs. > The Users Guide also explains this. > > In EXAMPLE/ directory, I have various examples of these usage situations, > see EXAMPLE/README. > > I am a little puzzled why in PETSc, the default is set to SamePattern ?? > > Sherry > > > On Tue, Oct 25, 2016 at 9:18 AM, Hong wrote: > >> Sherry, >> >> We set '-mat_superlu_dist_fact SamePattern' as default in >> petsc/superlu_dist on 12/6/15 (see attached email below). >> >> However, Anton must set 'SamePattern_SameRowPerm' to avoid crash in his >> code. Checking >> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_ >> html/pzgssvx___a_bglobal_8c.html >> I see detailed description on using SamePattern_SameRowPerm, which >> requires more from user than SamePattern. I guess these flags are used >> for efficiency. The library sets a default, then have users to switch for >> their own applications. The default setting should not cause crash. If >> crash occurs, give a meaningful error message would be help. >> >> Do you have suggestion how should we set default in petsc for this flag? >> >> Hong >> >> ------------------- >> Hong >> 12/7/15 >> to Danyang, petsc-maint, PETSc, Xiaoye >> Danyang : >> >> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is >> how I figured it out. >> >> 1. Reading ex52f.F, I see '-superlu_default' = >> '-pc_factor_mat_solver_package superlu_dist', the later enables runtime >> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the >> tests below. >> ... >> 5. >> Using a_flow_check_1.bin, I am able to reproduce the error you reported: >> all packages give correct results except superlu_dist: >> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs >> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check >> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package >> superlu_dist >> Norm of error 2.5970E-12 iterations 1 >> -->Test for matrix 168 >> Norm of error 1.3936E-01 iterations 34 >> -->Test for matrix 169 >> >> I guess the error might come from reuse of matrix factor. Replacing >> default >> -mat_superlu_dist_fact with >> -mat_superlu_dist_fact SamePattern, I get >> >> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs >> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check >> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package >> superlu_dist -mat_superlu_dist_fact SamePattern >> >> Norm of error 2.5970E-12 iterations 1 >> -->Test for matrix 168 >> ... 
>> Sherry may tell you why SamePattern_SameRowPerm cause the difference >> here. >> Best on the above experiments, I would set following as default >> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface. >> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface. >> >> Hong >> >> On Tue, Oct 25, 2016 at 10:38 AM, Hong wrote: >> >>> Anton, >>> I guess, when you reuse matrix and its symbolic factor with updated >>> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to >>> confirm it. >>> >>> I'll check petsc/superlu-dist interface to set this flag for this case. >>> >>> Hong >>> >>> >>> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov wrote: >>> >>>> Hong, >>>> >>>> I get all the problems gone and valgrind-clean output if I specify this: >>>> >>>> -mat_superlu_dist_fact SamePattern_SameRowPerm >>>> What does SamePattern_SameRowPerm actually mean? >>>> Row permutations are for large diagonal, column permutations are for >>>> sparsity, right? >>>> Will it skip subsequent matrix permutations for large diagonal even if >>>> matrix values change significantly? >>>> >>>> Surprisingly everything works even with: >>>> >>>> -mat_superlu_dist_colperm PARMETIS >>>> -mat_superlu_dist_parsymbfact TRUE >>>> >>>> Thanks, >>>> Anton >>>> >>>> On 10/24/2016 09:06 PM, Hong wrote: >>>> >>>> Anton: >>>>> >>>>> If replacing superlu_dist with mumps, does your code work? >>>>> >>>>> yes >>>>> >>>> >>>> You may use mumps in your code, or tests different options for >>>> superlu_dist: >>>> >>>> -mat_superlu_dist_equil: Equilibrate matrix (None) >>>> -mat_superlu_dist_rowperm Row permutation (choose one of) >>>> LargeDiag NATURAL (None) >>>> -mat_superlu_dist_colperm Column permutation >>>> (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS >>>> (None) >>>> -mat_superlu_dist_replacetinypivot: Replace tiny pivots >>>> (None) >>>> -mat_superlu_dist_parsymbfact: Parallel symbolic >>>> factorization (None) >>>> -mat_superlu_dist_fact Sparsity pattern for repeated >>>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm >>>> (None) >>>> >>>> The options inside <> are defaults. You may try others. This might help >>>> narrow down the bug. >>>> >>>> Hong >>>> >>>>> >>>>> Hong >>>>>> >>>>>> On 10/24/2016 05:47 PM, Hong wrote: >>>>>> >>>>>> Barry, >>>>>> Your change indeed fixed the error of his testing code. >>>>>> As Satish tested, on your branch, ex16 runs smooth. >>>>>> >>>>>> I do not understand why on maint or master branch, ex16 creases >>>>>> inside superlu_dist, but not with mumps. >>>>>> >>>>>> >>>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately >>>>>> not my code. >>>>>> >>>>>> This is something to be expected, since my code preallocates once in >>>>>> the beginning. So there is no way it can be affected by multiple >>>>>> preallocations. Subsequently I only do matrix assembly, that makes sure >>>>>> structure doesn't change (set to get error otherwise). >>>>>> >>>>>> Summary: we don't have a simple test code to debug superlu issue >>>>>> anymore. >>>>>> >>>>>> Anton >>>>>> >>>>>> Hong >>>>>> >>>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay >>>>>> wrote: >>>>>> >>>>>>> On Mon, 24 Oct 2016, Barry Smith wrote: >>>>>>> >>>>>>> > >>>>>>> > > [Or perhaps Hong is using a different test code and is observing >>>>>>> bugs >>>>>>> > > with superlu_dist interface..] >>>>>>> > >>>>>>> > She states that her test does a NEW MatCreate() for each matrix >>>>>>> load (I cut and pasted it in the email I just sent). 
The bug I fixed was >>>>>>> only related to using the SAME matrix from one MatLoad() in another >>>>>>> MatLoad(). >>>>>>> >>>>>>> Ah - ok.. Sorry - wasn't thinking clearly :( >>>>>>> >>>>>>> Satish >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Oct 27 11:29:25 2016 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 Oct 2016 12:29:25 -0400 Subject: [petsc-users] Moving from KSPSetNullSpace to MatSetNullSpace In-Reply-To: References: <15FB9A49-3162-41B0-B221-9FA01C271714@mcs.anl.gov> <2ACB6A7B-58CD-40BD-8E0C-7375CFB488BA@mcs.anl.gov> Message-ID: It the ksp_view data I see on the fine grid: Chebyshev: eigenvalue estimates: min = 0.14153, max = 1.55683 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (poisson_mg_levels_2_esteig_) 2 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. and GAMG has: [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.977326e+00 min=3.920450e-02 PC=jacobi You have a 5-point stencil dof del^2, the correct max eig is 2, and 1.55683 is not close enough for the 1.1 safety factor to save it. The view should have this line from petsc/src/ksp/ksp/impls/cheby/cheby.c (it does not): if (cheb->usenoisy) { ierr = PetscViewerASCIIPrintf(viewer," Chebyshev: estimating eigenvalues using noisy right hand side\n");CHKERRQ(ierr); } This was fixed in late May and 3.7 came out in late April, so I would think 3.7.4 would have this fix, but it does not look like it. You can fix this with some parameters but you should check that your source does not have this line. On Thu, Oct 27, 2016 at 12:14 PM, Olivier Mesnard < olivier.mesnard8 at gmail.com> wrote: > Exact. > > On 27 October 2016 at 11:58, Mark Adams wrote: > >> >>> -poisson_mg_levels_esteig_ksp_max_it 50 >>> >> >> 50 works but say 10 fails? >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Fri Oct 28 06:32:31 2016 From: leejearl at 126.com (leejearl) Date: Fri, 28 Oct 2016 19:32:31 +0800 Subject: [petsc-users] A question about ghost cells Message-ID: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> Hi, everyone: I have a distributed DMPlex representing a 2D finite volume mesh. I distributed it using the function "DMPlexDistribute()". Then, I constructed the ghost cells using "DMPlexConstructGhostCells()". The question is that I want to determine which cell is ghosted. I obtained all the cells using "DMPlexGetHeightStratum()", and how can I kown which cell is ghosted? Thanks, all the helps are appreciated. leejearl From knepley at gmail.com Fri Oct 28 06:56:12 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Oct 2016 06:56:12 -0500 Subject: [petsc-users] A question about ghost cells In-Reply-To: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> References: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> Message-ID: On Fri, Oct 28, 2016 at 6:32 AM, leejearl wrote: > Hi, everyone: > > I have a distributed DMPlex representing a 2D finite volume mesh. I > distributed it using the function "DMPlexDistribute()". > > Then, I constructed the ghost cells using "DMPlexConstructGhostCells()". > > The question is that I want to determine which cell is ghosted. 
I > obtained all the cells using "DMPlexGetHeightStratum()", and > > how can I kown which cell is ghosted? > 1) The ghost cells are marked as "hybrid", so you can call ierr = DMPlexGetHybridBounds(dm, &cEndInterior, NULL, NULL, NULL);CHKERRQ(ierr); ierr = DMPlexGetHeightStratum(dm, 0, NULL, &cEnd);CHKERRQ(ierr); and then the ghost cells are [cEndInterior, cEnd). 2) There is also a label "ghost" which marks the ghost faces, and ghost cells which are non-local Thanks, Matt > Thanks, all the helps are appreciated. > > > leejearl > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Fri Oct 28 08:05:06 2016 From: leejearl at 126.com (leejearl) Date: Fri, 28 Oct 2016 21:05:06 +0800 Subject: [petsc-users] A question about ghost cells In-Reply-To: References: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> Message-ID: <032c9d0e-1c1d-98d9-891a-3e03e1a8e6e5@126.com> Hi, Matt: Thank you for your help. It is the answer which I expect. But? I have an another equestion. Can I obtain the coordinates of the ghost cells: Thanks, leejearl On 2016?10?28? 19:56, Matthew Knepley wrote: > On Fri, Oct 28, 2016 at 6:32 AM, leejearl > wrote: > > Hi, everyone: > > I have a distributed DMPlex representing a 2D finite volume > mesh. I distributed it using the function "DMPlexDistribute()". > > Then, I constructed the ghost cells using > "DMPlexConstructGhostCells()". > > The question is that I want to determine which cell is ghosted. > I obtained all the cells using "DMPlexGetHeightStratum()", and > > how can I kown which cell is ghosted? > > > 1) The ghost cells are marked as "hybrid", so you can call > > ierr = DMPlexGetHybridBounds(dm, &cEndInterior, NULL, NULL, > NULL);CHKERRQ(ierr); > ierr = DMPlexGetHeightStratum(dm, 0, NULL, &cEnd);CHKERRQ(ierr); > > and then the ghost cells are [cEndInterior, cEnd). > > 2) There is also a label "ghost" which marks the ghost faces, and > ghost cells which are non-local > > Thanks, > > Matt > > Thanks, all the helps are appreciated. > > > leejearl > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 28 08:08:25 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Oct 2016 08:08:25 -0500 Subject: [petsc-users] A question about ghost cells In-Reply-To: <032c9d0e-1c1d-98d9-891a-3e03e1a8e6e5@126.com> References: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> <032c9d0e-1c1d-98d9-891a-3e03e1a8e6e5@126.com> Message-ID: On Fri, Oct 28, 2016 at 8:05 AM, leejearl wrote: > Hi, Matt: > Thank you for your help. It is the answer which I expect. > But? I have an another equestion. Can I obtain the coordinates of the > ghost cells: > I am not sure I understand. By default, Plex stores coordinates only for vertices. Also, ghost cells do not have a full complement of vertices because they are only used to define a state on the other side of a face. Does that make sense? Matt > Thanks, > > leejearl > > On 2016?10?28? 19:56, Matthew Knepley wrote: > > On Fri, Oct 28, 2016 at 6:32 AM, leejearl wrote: > >> Hi, everyone: >> >> I have a distributed DMPlex representing a 2D finite volume mesh. 
I >> distributed it using the function "DMPlexDistribute()". >> >> Then, I constructed the ghost cells using "DMPlexConstructGhostCells()". >> >> The question is that I want to determine which cell is ghosted. I >> obtained all the cells using "DMPlexGetHeightStratum()", and >> >> how can I kown which cell is ghosted? >> > > 1) The ghost cells are marked as "hybrid", so you can call > > ierr = DMPlexGetHybridBounds(dm, &cEndInterior, NULL, NULL, > NULL);CHKERRQ(ierr); > ierr = DMPlexGetHeightStratum(dm, 0, NULL, &cEnd);CHKERRQ(ierr); > > and then the ghost cells are [cEndInterior, cEnd). > > 2) There is also a label "ghost" which marks the ghost faces, and ghost > cells which are non-local > > Thanks, > > Matt > > >> Thanks, all the helps are appreciated. >> >> >> leejearl >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Oct 28 08:13:40 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Fri, 28 Oct 2016 10:13:40 -0300 Subject: [petsc-users] GAMG Message-ID: <1477660420.2766.15.camel@seamplex.com> Hi! I want to use PCGAMG as a preconditioner for a 3D linear elasticity problem (displacement-based FEM formulation) over an unstructured grid. I am not using DMPlex, I just build the stiffness matrix myself and pass it to PETSc. I set MatSetBlockSize() to 3 and pass the node coordinates through PCSetCoordinates(). But using gamg and gmres I get: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at iteration 0' in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c KSPSolve_Chebyshev:440 Any suggestion? Another PC/KSP combination to try? Thanks -- jeremy From mfadams at lbl.gov Fri Oct 28 08:16:16 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 09:16:16 -0400 Subject: [petsc-users] GAMG In-Reply-To: <1477660420.2766.15.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> Message-ID: I think there is something wrong with your matrix. Use any solver and verify that you like the solution first. On Fri, Oct 28, 2016 at 9:13 AM, Jeremy Theler wrote: > Hi! I want to use PCGAMG as a preconditioner for a 3D linear elasticity > problem (displacement-based FEM formulation) over an unstructured grid. > I am not using DMPlex, I just build the stiffness matrix myself and pass > it to PETSc. > > I set MatSetBlockSize() to 3 and pass the node coordinates through > PCSetCoordinates(). But using gamg and gmres I get: > > PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at iteration > 0' in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c > KSPSolve_Chebyshev:440 > > Any suggestion? Another PC/KSP combination to try? > > Thanks > -- > jeremy > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From leejearl at 126.com Fri Oct 28 08:22:43 2016 From: leejearl at 126.com (leejearl) Date: Fri, 28 Oct 2016 21:22:43 +0800 Subject: [petsc-users] A question about ghost cells In-Reply-To: References: <0b74c49e-6044-fd64-8f8f-c881f1745a38@126.com> <032c9d0e-1c1d-98d9-891a-3e03e1a8e6e5@126.com> Message-ID: <20525b40-527a-0996-3d06-713217256c4a@126.com> Hi, Matt: Thank your for your reply, and it is what I needed. leejearl On 2016?10?28? 21:08, Matthew Knepley wrote: > On Fri, Oct 28, 2016 at 8:05 AM, leejearl > wrote: > > Hi, Matt: > Thank you for your help. It is the answer which I expect. > But? I have an another equestion. Can I obtain the coordinates > of the ghost cells: > > > I am not sure I understand. By default, Plex stores coordinates only > for vertices. Also, ghost cells > do not have a full complement of vertices because they are only used > to define a state on the other > side of a face. Does that make sense? > > Matt > > Thanks, > > leejearl > > On 2016?10?28? 19:56, Matthew Knepley wrote: >> On Fri, Oct 28, 2016 at 6:32 AM, leejearl > > wrote: >> >> Hi, everyone: >> >> I have a distributed DMPlex representing a 2D finite >> volume mesh. I distributed it using the function >> "DMPlexDistribute()". >> >> Then, I constructed the ghost cells using >> "DMPlexConstructGhostCells()". >> >> The question is that I want to determine which cell is >> ghosted. I obtained all the cells using >> "DMPlexGetHeightStratum()", and >> >> how can I kown which cell is ghosted? >> >> >> 1) The ghost cells are marked as "hybrid", so you can call >> >> ierr = DMPlexGetHybridBounds(dm, &cEndInterior, NULL, NULL, >> NULL);CHKERRQ(ierr); >> ierr = DMPlexGetHeightStratum(dm, 0, NULL, &cEnd);CHKERRQ(ierr); >> >> and then the ghost cells are [cEndInterior, cEnd). >> >> 2) There is also a label "ghost" which marks the ghost faces, and >> ghost cells which are non-local >> >> Thanks, >> >> Matt >> >> Thanks, all the helps are appreciated. >> >> >> leejearl >> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -- ?? ??????????????? Phone: 17792092487 QQ: 188524324 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Oct 28 08:24:32 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Fri, 28 Oct 2016 10:24:32 -0300 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> Message-ID: <1477661072.2766.22.camel@seamplex.com> Hi Mark. The matrix is solved well with lu/preonly. If I do not call PCSetCoordinates() the error goes away but convergence is slow. I call PCSetCoordinates() this way (1 processor): PetscMalloc1(dimensions * mesh->n_nodes, &coords); for (j = 0; j < mesh->n_nodes; j++) { for (d = 0; d < dimensions; d++) { coords[j*dimensions + d] = mesh->node[j].x[d]; } } PCSetCoordinates(pc, dimensions, dimensions * mesh->n_nodes, coords); PetscFree(coords); Thanks -- jeremy On Fri, 2016-10-28 at 09:16 -0400, Mark Adams wrote: > I think there is something wrong with your matrix. Use any solver and > verify that you like the solution first. > > On Fri, Oct 28, 2016 at 9:13 AM, Jeremy Theler > wrote: > Hi! 
I want to use PCGAMG as a preconditioner for a 3D linear > elasticity > problem (displacement-based FEM formulation) over an > unstructured grid. > I am not using DMPlex, I just build the stiffness matrix > myself and pass > it to PETSc. > > I set MatSetBlockSize() to 3 and pass the node coordinates > through > PCSetCoordinates(). But using gamg and gmres I get: > > PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at > iteration > 0' > in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c > KSPSolve_Chebyshev:440 > > Any suggestion? Another PC/KSP combination to try? > > Thanks > -- > jeremy > > > > From knepley at gmail.com Fri Oct 28 08:30:45 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Oct 2016 08:30:45 -0500 Subject: [petsc-users] GAMG In-Reply-To: <1477661072.2766.22.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> Message-ID: On Fri, Oct 28, 2016 at 8:24 AM, Jeremy Theler wrote: > Hi Mark. > > The matrix is solved well with lu/preonly. > > If I do not call PCSetCoordinates() the error goes away but convergence > is slow. > Is it possible that your coordinates lie on a 2D surface? All this does is make the 6 basis vectors for translations and rotations. You can just make these yourself and call MatSetNearNullSpace() and see what you get. Thanks, Matt > I call PCSetCoordinates() this way (1 processor): > > PetscMalloc1(dimensions * mesh->n_nodes, &coords); > for (j = 0; j < mesh->n_nodes; j++) { > for (d = 0; d < dimensions; d++) { > coords[j*dimensions + d] = mesh->node[j].x[d]; > } > } > PCSetCoordinates(pc, dimensions, dimensions * mesh->n_nodes, > coords); > PetscFree(coords); > > > Thanks > -- > jeremy > > > > On Fri, 2016-10-28 at 09:16 -0400, Mark Adams wrote: > > I think there is something wrong with your matrix. Use any solver and > > verify that you like the solution first. > > > > On Fri, Oct 28, 2016 at 9:13 AM, Jeremy Theler > > wrote: > > Hi! I want to use PCGAMG as a preconditioner for a 3D linear > > elasticity > > problem (displacement-based FEM formulation) over an > > unstructured grid. > > I am not using DMPlex, I just build the stiffness matrix > > myself and pass > > it to PETSc. > > > > I set MatSetBlockSize() to 3 and pass the node coordinates > > through > > PCSetCoordinates(). But using gamg and gmres I get: > > > > PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at > > iteration > > 0' > > in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/ > cheby.c > > KSPSolve_Chebyshev:440 > > > > Any suggestion? Another PC/KSP combination to try? > > > > Thanks > > -- > > jeremy > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Oct 28 08:38:43 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Fri, 28 Oct 2016 10:38:43 -0300 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> Message-ID: <1477661923.2766.24.camel@seamplex.com> > > If I do not call PCSetCoordinates() the error goes away but > convergence > is slow. > Is it possible that your coordinates lie on a 2D surface? All this > does is make the 6 basis vectors > for translations and rotations. 
You can just make these yourself and > call MatSetNearNullSpace() > and see what you get. > No, they do not lie on a 2D surface :-/ Sorry but I did not get the point about the 6 basis vectors and MatSetNearNullSpace(). -- jeremy From mfadams at lbl.gov Fri Oct 28 08:46:18 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 09:46:18 -0400 Subject: [petsc-users] GAMG In-Reply-To: <1477661923.2766.24.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> Message-ID: Please run with -info and grep on GAMG. On Fri, Oct 28, 2016 at 9:38 AM, Jeremy Theler wrote: > > > > > If I do not call PCSetCoordinates() the error goes away but > > convergence > > is slow. > > Is it possible that your coordinates lie on a 2D surface? All this > > does is make the 6 basis vectors > > for translations and rotations. You can just make these yourself and > > call MatSetNearNullSpace() > > and see what you get. > > > No, they do not lie on a 2D surface :-/ > > Sorry but I did not get the point about the 6 basis vectors and > MatSetNearNullSpace(). > > -- > jeremy > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Oct 28 08:48:56 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Fri, 28 Oct 2016 10:48:56 -0300 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> Message-ID: <1477662536.2766.25.camel@seamplex.com> On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote: > Please run with -info and grep on GAMG. > [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6, nnz/row (ave)=41, np=1 [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with threshold 0., 13.7468 nnz ave. (N=40242) [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square [0] PCGAMGProlongator_AGG(): New grid 1894 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00 min=1.330683e-01 PC=jacobi [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1 active pes [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with threshold 0., 32.7656 nnz ave. (N=1894) [0] PCGAMGProlongator_AGG(): New grid 155 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.011119e+01 min=1.832878e-04 PC=jacobi [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active pes [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 32.7806 nnz ave. (N=155) [0] PCGAMGProlongator_AGG(): New grid 9 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00 min=6.337173e-03 PC=jacobi [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active pes [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 5.66667 nnz ave. 
(N=9) [0] PCGAMGProlongator_AGG(): New grid 2 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00 min=8.582767e-03 PC=jacobi [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active pes [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586 error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at iteration 0' in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c KSPSolve_Chebyshev:440 > > > > From mfadams at lbl.gov Fri Oct 28 09:04:11 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 10:04:11 -0400 Subject: [petsc-users] GAMG In-Reply-To: <1477662536.2766.25.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477662536.2766.25.camel@seamplex.com> Message-ID: GAMG's eigen estimator worked but the values are very high. You have very low number of equations per processor, is this a thin body? Are the elements badly stretched? Do this again with these parameters: -mg_levels_ksp_type chebyshev -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 ?? -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 -gamg_est_ksp_type cg On Fri, Oct 28, 2016 at 9:48 AM, Jeremy Theler wrote: > On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote: > > Please run with -info and grep on GAMG. > > > [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6, > nnz/row (ave)=41, np=1 > [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with > threshold 0., 13.7468 nnz ave. (N=40242) > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square > [0] PCGAMGProlongator_AGG(): New grid 1894 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00 > min=1.330683e-01 PC=jacobi > [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1 > active pes > [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with > threshold 0., 32.7656 nnz ave. (N=1894) > [0] PCGAMGProlongator_AGG(): New grid 155 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.011119e+01 > min=1.832878e-04 PC=jacobi > [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active > pes > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with > threshold 0., 32.7806 nnz ave. (N=155) > [0] PCGAMGProlongator_AGG(): New grid 9 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00 > min=6.337173e-03 PC=jacobi > [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active > pes > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with > threshold 0., 5.66667 nnz ave. (N=9) > [0] PCGAMGProlongator_AGG(): New grid 2 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00 > min=8.582767e-03 PC=jacobi > [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active > pes > [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586 > error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at > iteration 0' > in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c > KSPSolve_Chebyshev:440 > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Fri Oct 28 09:07:39 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 10:07:39 -0400 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477662536.2766.25.camel@seamplex.com> Message-ID: Also, try solving the problem with a one level iterative method and Chebyshev, like: -ksp_type chebyshev -pc_type jacobi It will take a long time to solve but I just want to see if it has the same error. On Fri, Oct 28, 2016 at 10:04 AM, Mark Adams wrote: > GAMG's eigen estimator worked but the values are very high. You have very > low number of equations per processor, is this a thin body? Are the > elements badly stretched? > > Do this again with these parameters: > > -mg_levels_ksp_type chebyshev > -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > ?? > -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 > -gamg_est_ksp_type cg > > > On Fri, Oct 28, 2016 at 9:48 AM, Jeremy Theler > wrote: > >> On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote: >> > Please run with -info and grep on GAMG. >> > >> [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6, >> nnz/row (ave)=41, np=1 >> [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with >> threshold 0., 13.7468 nnz ave. (N=40242) >> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >> [0] PCGAMGProlongator_AGG(): New grid 1894 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00 >> min=1.330683e-01 PC=jacobi >> [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1 >> active pes >> [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with >> threshold 0., 32.7656 nnz ave. (N=1894) >> [0] PCGAMGProlongator_AGG(): New grid 155 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.011119e+01 >> min=1.832878e-04 PC=jacobi >> [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active >> pes >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with >> threshold 0., 32.7806 nnz ave. (N=155) >> [0] PCGAMGProlongator_AGG(): New grid 9 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00 >> min=6.337173e-03 PC=jacobi >> [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active >> pes >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with >> threshold 0., 5.66667 nnz ave. (N=9) >> [0] PCGAMGProlongator_AGG(): New grid 2 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00 >> min=8.582767e-03 PC=jacobi >> [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active >> pes >> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586 >> error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at >> iteration 0' >> in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c >> KSPSolve_Chebyshev:440 >> >> >> >> > >> > >> > >> > >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy at seamplex.com Fri Oct 28 09:12:18 2016 From: jeremy at seamplex.com (jeremy theler) Date: Fri, 28 Oct 2016 14:12:18 +0000 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477662536.2766.25.camel@seamplex.com> Message-ID: I will try these options in a couple of hours (I have to go out now). 
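Collected on one command line, the options suggested so far would look something like this (myapp only stands in for the actual executable):

  ./myapp -ksp_type gmres -pc_type gamg \
          -mg_levels_ksp_type chebyshev \
          -mg_levels_esteig_ksp_type cg \
          -mg_levels_esteig_ksp_max_it 10 \
          -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 \
          -gamg_est_ksp_type cg \
          -info | grep GAMG

The one-level sanity check is just -ksp_type chebyshev -pc_type jacobi instead of the gamg options, and if the KSP has an options prefix (like the poisson_ prefix earlier in this digest) it goes in front of each of these names.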
I forgot to mention that the geometry has revolution symmetry around the z axis (just the geometry, not the problem because it has a non-symmetric temperature distribution). I am solving with only one proc, there are approx 50k nodes so 150k dofs. Thanks again. On Fri, Oct 28, 2016, 11:07 Mark Adams wrote: > Also, try solving the problem with a one level iterative method and > Chebyshev, like: > > -ksp_type chebyshev > -pc_type jacobi > > It will take a long time to solve but I just want to see if it has the > same error. > > > On Fri, Oct 28, 2016 at 10:04 AM, Mark Adams wrote: > > GAMG's eigen estimator worked but the values are very high. You have very > low number of equations per processor, is this a thin body? Are the > elements badly stretched? > > Do this again with these parameters: > > -mg_levels_ksp_type chebyshev > -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 > ?? > -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 > -gamg_est_ksp_type cg > > > On Fri, Oct 28, 2016 at 9:48 AM, Jeremy Theler > wrote: > > On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote: > > Please run with -info and grep on GAMG. > > > [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6, > nnz/row (ave)=41, np=1 > [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with > threshold 0., 13.7468 nnz ave. (N=40242) > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square > [0] PCGAMGProlongator_AGG(): New grid 1894 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00 > min=1.330683e-01 PC=jacobi > [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1 > active pes > [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with > threshold 0., 32.7656 nnz ave. (N=1894) > [0] PCGAMGProlongator_AGG(): New grid 155 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.011119e+01 > min=1.832878e-04 PC=jacobi > [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active > pes > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with > threshold 0., 32.7806 nnz ave. (N=155) > [0] PCGAMGProlongator_AGG(): New grid 9 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00 > min=6.337173e-03 PC=jacobi > [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active > pes > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with > threshold 0., 5.66667 nnz ave. (N=9) > [0] PCGAMGProlongator_AGG(): New grid 2 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00 > min=8.582767e-03 PC=jacobi > [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active > pes > [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586 > error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at > iteration 0' > in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c > KSPSolve_Chebyshev:440 > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Oct 28 09:22:08 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 10:22:08 -0400 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477662536.2766.25.camel@seamplex.com> Message-ID: So this is a fully 3D problem, or is it a very flat disc? What is the worst aspect ratio (or whatever) of an element, approximately. That is, is this a bad mesh? You might want to start with a simple problem like a cube. 
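One crude way to answer the aspect-ratio question is to loop over the tetrahedra and report the worst ratio of longest to shortest edge. A rough helper (the 4x3 per-element coordinate array is only an assumption about how the mesh is stored):

  #include <math.h>

  /* ratio of longest to shortest edge of one tetrahedron: ~1 is good, >>1 is stretched */
  static double tet_edge_ratio(const double x[4][3])
  {
    static const int e[6][2] = {{0,1},{0,2},{0,3},{1,2},{1,3},{2,3}};
    double lmin = 1e300, lmax = 0.0;
    int k, d;
    for (k = 0; k < 6; k++) {
      double l2 = 0.0;
      for (d = 0; d < 3; d++) {
        double dx = x[e[k][0]][d] - x[e[k][1]][d];
        l2 += dx*dx;
      }
      if (l2 < lmin) lmin = l2;
      if (l2 > lmax) lmax = l2;
    }
    return sqrt(lmax/lmin);
  }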
The eigen estimates (Smooth P0: max eigen=1.011119e+01) are huge and they are a lower bound. You might also try -gamg_est_ksp_max_it 50 and see if these eigen estimates go up much. (and use -gamg_est_ksp_type cg). GAMG's eigen estimates are working but I use manufacture a seed vector, which is slightly different than what Cheby does. Also, what version of PETSc are you using? It would be best to use git to clone the repository. This would give you maint or master branch which have a fix for the cheby eigen estimator that your version might not have (use -ksp_view and grep for "noisy" to see if you have an up to date version). On Fri, Oct 28, 2016 at 10:12 AM, jeremy theler wrote: > I will try these options in a couple of hours (I have to go out now). I > forgot to mention that the geometry has revolution symmetry around the z > axis (just the geometry, not the problem because it has a non-symmetric > temperature distribution). > I am solving with only one proc, there are approx 50k nodes so 150k dofs. > Thanks again. > > On Fri, Oct 28, 2016, 11:07 Mark Adams wrote: > >> Also, try solving the problem with a one level iterative method and >> Chebyshev, like: >> >> -ksp_type chebyshev >> -pc_type jacobi >> >> It will take a long time to solve but I just want to see if it has the >> same error. >> >> >> On Fri, Oct 28, 2016 at 10:04 AM, Mark Adams wrote: >> >> GAMG's eigen estimator worked but the values are very high. You have >> very low number of equations per processor, is this a thin body? Are the >> elements badly stretched? >> >> Do this again with these parameters: >> >> -mg_levels_ksp_type chebyshev >> -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 >> ?? >> -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05 >> -gamg_est_ksp_type cg >> >> >> On Fri, Oct 28, 2016 at 9:48 AM, Jeremy Theler >> wrote: >> >> On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote: >> > Please run with -info and grep on GAMG. >> > >> [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6, >> nnz/row (ave)=41, np=1 >> [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with >> threshold 0., 13.7468 nnz ave. (N=40242) >> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >> [0] PCGAMGProlongator_AGG(): New grid 1894 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00 >> min=1.330683e-01 PC=jacobi >> [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1 >> active pes >> [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with >> threshold 0., 32.7656 nnz ave. (N=1894) >> [0] PCGAMGProlongator_AGG(): New grid 155 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.011119e+01 >> min=1.832878e-04 PC=jacobi >> [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active >> pes >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with >> threshold 0., 32.7806 nnz ave. (N=155) >> [0] PCGAMGProlongator_AGG(): New grid 9 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00 >> min=6.337173e-03 PC=jacobi >> [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active >> pes >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with >> threshold 0., 5.66667 nnz ave. 
(N=9) >> [0] PCGAMGProlongator_AGG(): New grid 2 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00 >> min=8.582767e-03 PC=jacobi >> [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active >> pes >> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586 >> error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at >> iteration 0' >> in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c >> KSPSolve_Chebyshev:440 >> >> >> >> > >> > >> > >> > >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 28 09:35:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Oct 2016 09:35:07 -0500 Subject: [petsc-users] GAMG In-Reply-To: <1477661923.2766.24.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> Message-ID: On Fri, Oct 28, 2016 at 8:38 AM, Jeremy Theler wrote: > > > > > If I do not call PCSetCoordinates() the error goes away but > > convergence > > is slow. > > Is it possible that your coordinates lie on a 2D surface? All this > > does is make the 6 basis vectors > > for translations and rotations. You can just make these yourself and > > call MatSetNearNullSpace() > > and see what you get. > > > No, they do not lie on a 2D surface :-/ > > Sorry but I did not get the point about the 6 basis vectors and > MatSetNearNullSpace(). > AMG (the agglomeration kind) needs to know the near null space of your operator in order to work. You have an elasticity problem (I think), and if you take that operator without boundary conditions, the energy is invariant to translations and rotations. The space of translations and rotations is a 6D space (3 translations, 3 rotations). You need to express these in the basis for your problem (I assume linear elements, P1). This is what PCSetCoordinates() tries to do. Something is going wrong, but its hard for us to say what since I have no idea what your problem looks like. So you can make these vectors yourself and provide them to GAMG using MatSetNearNullSpace(). Matt > -- > jeremy > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Oct 28 09:46:37 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 Oct 2016 10:46:37 -0400 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> Message-ID: > > >> > AMG (the agglomeration kind) needs to know the near null space of your > operator in order > to work. You have an elasticity problem (I think), and if you take that > operator without boundary > conditions, the energy is invariant to translations and rotations. The > space of translations and > rotations is a 6D space (3 translations, 3 rotations). You need to express > these in the basis for > your problem (I assume linear elements, P1). > Actually, these vectors are purely geometric. If these rigid body modes are not your kernel then you have a bad discretization or you are not doing 3D elasticity. Anyway, this reminds me that the problem goes away w/o the RBMs. The fine grid eigen estimate was large and will not be affected by the null space business. 
The second grid had a huge eigenvalue and that could be affected by the null space. What is your Poisson ratio? > This is what PCSetCoordinates() tries to do. Something > is going wrong, but its hard for us to say what since I have no idea what > your problem looks like. > So you can make these vectors yourself and provide them to GAMG using > MatSetNearNullSpace(). > > Matt > > >> -- >> jeremy >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bikash at umich.edu Sun Oct 30 19:57:20 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Sun, 30 Oct 2016 20:57:20 -0400 Subject: [petsc-users] BVNormColumn In-Reply-To: <647546F4-0365-4E11-89E2-2241C6F32939@dsic.upv.es> References: <4807F42C-75A7-4DE3-A605-A4BDE9CDF868@dsic.upv.es> <647546F4-0365-4E11-89E2-2241C6F32939@dsic.upv.es> Message-ID: Thank you so much, Jose. Loosening the tolerance did the trick! Thanks again, Bikash On Tue, Oct 25, 2016 at 5:25 AM, Jose E. Roman wrote: > > > El 19 oct 2016, a las 9:54, Jose E. Roman escribi?: > > > >> > >> El 19 oct 2016, a las 0:26, Bikash Kanungo escribi?: > >> > >> Hi Jose, > >> > >> Thanks for the pointers. Here's what I observed on probing it further: > >> > >> ? The ||B - B^H|| norm was 1e-18. So I explicitly made it > Hermitian by setting B = 0.5(B+B^H). However, this didn't help. > >> ? Next, I checked for the conditioning of B by computing the ratio > of the highest and lowest eigenvalues. The conditioning of the order 1e-9. > >> ? I monitored the imaginary the imaginary part of VecDot(y,x, > dotXY) where y = B*x and noted that only when the imaginary part is more > than 1e-16 in magnitude, the error of "The inner product is not well > defined" is flagged. For the first few iterations of orhtogonalization > (i.e., the one where orthogonization is successful), the values of > VecDot(y,x, dotXY) are all found to be lower than 1e-16. I guess this small > imaginary part might be the cause of the error. > >> Let me know if there is a way to bypass the abort by changing the > tolerance for imaginary part. > >> > >> > >> > >> Regards, > >> Bikash > >> > > > > There is something wrong: the condition number is greater than 1 by > definition, so it cannot be 1e-9. Anyway, maybe what happens is that your > matrix has a very small norm. The SLEPc code needs a fix for the case when > the norm of B or the norm of the vector x is very small. Please send the > matrix to my personal email and I will make some tests. > > > > Jose > > I tested with your matrix and vector with two different machines, with > different compilers, and in both cases the computation did not fail. The > imaginary part is below the machine precision, as expected. I don't know > why you are getting larger roundoff error. Anyway, the check that we > currently have in SLEPc is too strict. You can try relaxing it, by editing > function BV_SafeSqrt (in $SLEPC_DIR/include/slepc/private/bvimpl.h), for > instance with this: > > if (PetscAbsReal(PetscImaginaryPart(alpha))>PETSC_MACHINE_EPSILON && > PetscAbsReal(PetscImaginaryPart(alpha))/absal>100*PETSC_MACHINE_EPSILON) > SETERRQ1(PetscObjectComm((PetscObject)bv),1,"The inner product is not > well defined: nonzero imaginary part %g",PetscImaginaryPart(alpha)); > > Let us know if this works for you. > Thanks. > Jose > > -- Bikash S. 
Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Mon Oct 31 05:14:40 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Mon, 31 Oct 2016 10:14:40 +0000 Subject: [petsc-users] --download-metis and build of stand-alone tools Message-ID: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local> Satish, I've noticed that SuperLU depends on metis and parmetis and that PETSc downloads the versions 5.1.0-p3 and 4.0.3-p3. These are different from the Karypis latest stable versions (without the -p3). Do I really need these -p3 versions? If so, after configure, compilation and installation by petsc, it seems that the stand-alone programs such as gpmetis are not being build and installed. That's a problem for me. I don't mind switching to the versions and config that petsc needs, but I do need the complete thing. Can I somehow tell petsc to also build the standalone tools? Chris dr. ir. Christiaan Klaij | CFD Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl MARIN news: http://www.marin.nl/web/News/News-items/New-SCREENIN-JIP-open-for-participation.htm From jeremy at seamplex.com Mon Oct 31 05:54:11 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Mon, 31 Oct 2016 07:54:11 -0300 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> Message-ID: <1477911251.7699.19.camel@seamplex.com> Hi again I have been wokring on these issues. Long story short: it is about the ordering of the unknown fields in the vector. Long story: The physics is linear elastic problem, you can see it does work with LU over a simple cube (warp the displacements to see it does represent an elastic problem, E=200e3, nu=0.3): https://caeplex.com/demo/results.php?id=5817146bdb561 Say my three displacements (unknowns) are u,v,w. I can define the unknown vector as (is this called node-based ordering?) [u1 v1 w1 u2 v2 w2 ... un vn wn]^T Another option is (is this called unknown-based ordering?) [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T With lu/preonly the results are the same, although the stiffnes matrixes for each case are attached as PNGs. And of course, the near-nullspace vectors are different. So PCSetCoordinates() should work with one ordering and not with another one, an issue I did not take into consideration. After understanding Matt's point about the near nullspace (and reading some interesting comments from Jed on scicomp stackexchange) I did built my own vectors (I had to take a look at MatNullSpaceCreateRigidBody() because I found out by running the code the nullspace should be an orthonormal basis, it should say so in the docs). Now, there are some results I do not understand. I tried these six combinations: order near-nullspace iterations norm ----- -------------- ---------- ---- unknown explicit 10 1.6e-6 unknown PCSetCoordinates 15 1.7e-7 unknown none 15 2.4e-7 node explicit fails with error -11 node PCSetCoordinates fails with error -11 node none 13 3.8e-7 Error -11 is PETSc's linear solver did not converge with reason 'DIVERGED_PCSETUP_FAILED' (-11) Any explanation (for dumbs)? Another thing to take into account: I am setting the dirichlet BCs with MatZeroRows(), but I am not updating the columns to keep symmetry. Can this pose a problem for GAMG? 
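For reference, a sketch of the symmetric variant (rows, nbc and u_bc stand for the global indices, the count and the prescribed values of the constrained dofs, K for the assembled stiffness matrix, x and b for the solution and right-hand-side vectors; all of these names are placeholders):

  ierr = VecSetValues(x, nbc, rows, u_bc, INSERT_VALUES);CHKERRQ(ierr);
  ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
  /* zero rows and columns, put 1.0 on the diagonal, and move the eliminated
     column entries times the prescribed values into b                       */
  ierr = MatZeroRowsColumns(K, nbc, rows, 1.0, x, b);CHKERRQ(ierr);

This keeps the operator symmetric even when the prescribed values are nonzero, because the right hand side is corrected with the eliminated columns.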
I can post the two stiffnes matrices and the RHS vectors as binary files. Thank you! -- jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: node_based.png Type: image/png Size: 33919 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: unknown_based.png Type: image/png Size: 76356 bytes Desc: not available URL: From balay at mcs.anl.gov Mon Oct 31 09:25:18 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 31 Oct 2016 09:25:18 -0500 Subject: [petsc-users] --download-metis and build of stand-alone tools In-Reply-To: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local> References: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local> Message-ID: On Mon, 31 Oct 2016, Klaij, Christiaan wrote: > Satish, > > I've noticed that SuperLU depends on metis and parmetis and that > PETSc downloads the versions 5.1.0-p3 and 4.0.3-p3. These are > different from the Karypis latest stable versions (without the > -p3). Do I really need these -p3 versions? The tarballs we distribute have a bunch of patches [mostly for protable build - but some bugfixes aswell] https://bitbucket.org/petsc/pkg-parmetis/commits/all https://bitbucket.org/petsc/pkg-metis/commits/all > > If so, after configure, compilation and installation by petsc, it > seems that the stand-alone programs such as gpmetis are not being > build and installed. That's a problem for me. I don't mind > switching to the versions and config that petsc needs, but I do > need the complete thing. Can I somehow tell petsc to also build > the standalone tools? I haven't built this stuff. But presumably the process with our tarball is similar to the one you would use with tarballs from Karypis. So you should be able to "cd PETSC_ARCH/externalapcakages/*metis*" and do the build of this extra stuff as you require? Satish > > Chris > > > dr. ir. Christiaan Klaij | CFD Researcher | Research & Development > MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl > > MARIN news: http://www.marin.nl/web/News/News-items/New-SCREENIN-JIP-open-for-participation.htm > > From jed at jedbrown.org Mon Oct 31 09:26:39 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Oct 2016 08:26:39 -0600 Subject: [petsc-users] --download-metis and build of stand-alone tools In-Reply-To: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local> References: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local> Message-ID: <87k2covbkw.fsf@jedbrown.org> "Klaij, Christiaan" writes: > Satish, > > I've noticed that SuperLU depends on metis and parmetis and that > PETSc downloads the versions 5.1.0-p3 and 4.0.3-p3. These are > different from the Karypis latest stable versions (without the > -p3). Do I really need these -p3 versions? They fix some portability and correctness bugs. Those packages are mostly unmaintained by upstream and new releases often don't fix bugs that have reproducible test cases and patches. So you can use the upstream version, but it might crash due to known bugs and good luck getting support. > If so, after configure, compilation and installation by petsc, it > seems that the stand-alone programs such as gpmetis are not being > build and installed. That's a problem for me. I don't mind > switching to the versions and config that petsc needs, but I do > need the complete thing. Can I somehow tell petsc to also build > the standalone tools? PETSc only needs or wants the library. 
In the pkg-metis CMakeLists.txt file, there is a line #add_subdirectory("programs") which needs to be uncommented to get the programs. Someone could add a conditional and plumb it into metis.py to make a PETSc configure option. That would be a welcome contribution. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From jed at jedbrown.org Mon Oct 31 09:44:42 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Oct 2016 08:44:42 -0600 Subject: [petsc-users] GAMG In-Reply-To: <1477911251.7699.19.camel@seamplex.com> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477911251.7699.19.camel@seamplex.com> Message-ID: <87h97svaqt.fsf@jedbrown.org> Jeremy Theler writes: > Hi again > > I have been wokring on these issues. Long story short: it is about the > ordering of the unknown fields in the vector. > > Long story: > The physics is linear elastic problem, you can see it does work with LU > over a simple cube (warp the displacements to see it does represent an > elastic problem, E=200e3, nu=0.3): > > https://caeplex.com/demo/results.php?id=5817146bdb561 > > > Say my three displacements (unknowns) are u,v,w. I can define the > unknown vector as (is this called node-based ordering?) > > [u1 v1 w1 u2 v2 w2 ... un vn wn]^T > > Another option is (is this called unknown-based ordering?) > > [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T > > > With lu/preonly the results are the same, although the stiffnes matrixes > for each case are attached as PNGs. And of course, the near-nullspace > vectors are different. So PCSetCoordinates() should work with one > ordering and not with another one, an issue I did not take into > consideration. > > After understanding Matt's point about the near nullspace (and reading > some interesting comments from Jed on scicomp stackexchange) I did built > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody() > because I found out by running the code the nullspace should be an > orthonormal basis, it should say so in the docs). Where? "vecs - the vectors that span the null space (excluding the constant vector); these vectors must be orthonormal." https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceCreate.html And if you run in debug mode (default), as you always should until you are confident that your code is correct, MatNullSpaceCreate tests that your vectors are orthonormal. > Now, there are some results I do not understand. I tried these six > combinations: > > order near-nullspace iterations norm > ----- -------------- ---------- ---- > unknown explicit 10 1.6e-6 > unknown PCSetCoordinates 15 1.7e-7 > unknown none 15 2.4e-7 > node explicit fails with error -11 > node PCSetCoordinates fails with error -11 > node none 13 3.8e-7 Did you set a block size for the "node-based" orderings? Are you sure the above is labeled correctly? Anyway, PCSetCoordinates uses "node-based" ordering. Implementation performance will generally be better with node-based ordering -- it has better memory streaming and cache behavior. The AIJ matrix format will also automatically do an "inode" optimization to reduce memory bandwidth and enable block smoothing (default configuration uses SOR smoothing). You can use -mat_no_inode to try turning that off. 
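For the node-based (interlaced) layout that boils down to something like this (K is the stiffness matrix, j the node index, d = 0,1,2 the u,v,w component; a sketch, not the actual assembly code from the thread):

  ierr = MatSetBlockSize(K, 3);CHKERRQ(ierr);   /* declare 3x3 blocks before preallocation/assembly */
  /* ... */
  row = 3*j + d;                                /* dof of component d at node j */

and the coordinate array handed to PCSetCoordinates() is laid out the same way, coords[3*j + d].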
> Error -11 is > PETSc's linear solver did not converge with reason > 'DIVERGED_PCSETUP_FAILED' (-11) Isn't there an actual error message? > Any explanation (for dumbs)? > Another thing to take into account: I am setting the dirichlet BCs with > MatZeroRows(), but I am not updating the columns to keep symmetry. Can > this pose a problem for GAMG? Usually minor, but it is better to maintain symmetry. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From C.Klaij at marin.nl Mon Oct 31 10:18:30 2016 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Mon, 31 Oct 2016 15:18:30 +0000 Subject: [petsc-users] --download-metis and build of stand-alone tools In-Reply-To: <87k2covbkw.fsf@jedbrown.org> References: <82ee49ca0e1e404887275a198b1b81ee@MAR190n2.marin.local>, <87k2covbkw.fsf@jedbrown.org> Message-ID: <1477927110109.14733@marin.nl> Jed, Thanks, that line in the cmake file is exactly what I needed to know. A petsc configure option would be nice to have, but it's too difficult for me to do right now, I'll just hack the file instead. Chris dr. ir. Christiaan Klaij | CFD Researcher | Research & Development MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl MARIN news: http://www.marin.nl/web/News/News-items/SSSRIMARIN-seminar-November-2-Shanghai.htm ________________________________________ From: Jed Brown Sent: Monday, October 31, 2016 3:26 PM To: Klaij, Christiaan; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] --download-metis and build of stand-alone tools "Klaij, Christiaan" writes: > Satish, > > I've noticed that SuperLU depends on metis and parmetis and that > PETSc downloads the versions 5.1.0-p3 and 4.0.3-p3. These are > different from the Karypis latest stable versions (without the > -p3). Do I really need these -p3 versions? They fix some portability and correctness bugs. Those packages are mostly unmaintained by upstream and new releases often don't fix bugs that have reproducible test cases and patches. So you can use the upstream version, but it might crash due to known bugs and good luck getting support. > If so, after configure, compilation and installation by petsc, it > seems that the stand-alone programs such as gpmetis are not being > build and installed. That's a problem for me. I don't mind > switching to the versions and config that petsc needs, but I do > need the complete thing. Can I somehow tell petsc to also build > the standalone tools? PETSc only needs or wants the library. In the pkg-metis CMakeLists.txt file, there is a line #add_subdirectory("programs") which needs to be uncommented to get the programs. Someone could add a conditional and plumb it into metis.py to make a PETSc configure option. That would be a welcome contribution. From fande.kong at inl.gov Mon Oct 31 10:29:38 2016 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 31 Oct 2016 09:29:38 -0600 Subject: [petsc-users] GAMG In-Reply-To: <87h97svaqt.fsf@jedbrown.org> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477911251.7699.19.camel@seamplex.com> <87h97svaqt.fsf@jedbrown.org> Message-ID: On Mon, Oct 31, 2016 at 8:44 AM, Jed Brown wrote: > Jeremy Theler writes: > > > Hi again > > > > I have been wokring on these issues. Long story short: it is about the > > ordering of the unknown fields in the vector. 
> > > > Long story: > > The physics is linear elastic problem, you can see it does work with LU > > over a simple cube (warp the displacements to see it does represent an > > elastic problem, E=200e3, nu=0.3): > > > > https://caeplex.com/demo/results.php?id=5817146bdb561 > > > > > > Say my three displacements (unknowns) are u,v,w. I can define the > > unknown vector as (is this called node-based ordering?) > > > > [u1 v1 w1 u2 v2 w2 ... un vn wn]^T > > > > Another option is (is this called unknown-based ordering?) > > > > [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T > > > > > > With lu/preonly the results are the same, although the stiffnes matrixes > > for each case are attached as PNGs. And of course, the near-nullspace > > vectors are different. So PCSetCoordinates() should work with one > > ordering and not with another one, an issue I did not take into > > consideration. > > > > After understanding Matt's point about the near nullspace (and reading > > some interesting comments from Jed on scicomp stackexchange) I did built > > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody() > > because I found out by running the code the nullspace should be an > > orthonormal basis, it should say so in the docs). > > Where? > > "vecs - the vectors that span the null space (excluding the constant > vector); these vectors must be orthonormal." > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatNullSpaceCreate.html > > And if you run in debug mode (default), as you always should until you > are confident that your code is correct, MatNullSpaceCreate tests that > your vectors are orthonormal. > > > Now, there are some results I do not understand. I tried these six > > combinations: > > > > order near-nullspace iterations norm > > ----- -------------- ---------- ---- > > unknown explicit 10 1.6e-6 > > unknown PCSetCoordinates 15 1.7e-7 > > unknown none 15 2.4e-7 > > node explicit fails with error -11 > > node PCSetCoordinates fails with error -11 > > node none 13 3.8e-7 > > Did you set a block size for the "node-based" orderings? Are you sure > the above is labeled correctly? Anyway, PCSetCoordinates uses > "node-based" ordering. Implementation performance will generally be > better with node-based ordering -- it has better memory streaming and > cache behavior. > > The AIJ matrix format will also automatically do an "inode" optimization > to reduce memory bandwidth and enable block smoothing (default > configuration uses SOR smoothing). You can use -mat_no_inode to try > turning that off. > > > Error -11 is > > PETSc's linear solver did not converge with reason > > 'DIVERGED_PCSETUP_FAILED' (-11) > > Isn't there an actual error message? > > > Any explanation (for dumbs)? > > Another thing to take into account: I am setting the dirichlet BCs with > > MatZeroRows(), but I am not updating the columns to keep symmetry. Can > > this pose a problem for GAMG? > > Usually minor, but it is better to maintain symmetry. > If the boundary values are not zero, no way to maintain symmetry unless we reduce the extra part of the matrix. Not updating the columns is better in this situation. Fande, -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Oct 31 10:31:34 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Oct 2016 10:31:34 -0500 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477911251.7699.19.camel@seamplex.com> <87h97svaqt.fsf@jedbrown.org> Message-ID: On Mon, Oct 31, 2016 at 10:29 AM, Kong, Fande wrote: > On Mon, Oct 31, 2016 at 8:44 AM, Jed Brown wrote: > >> Jeremy Theler writes: >> >> > Hi again >> > >> > I have been wokring on these issues. Long story short: it is about the >> > ordering of the unknown fields in the vector. >> > >> > Long story: >> > The physics is linear elastic problem, you can see it does work with LU >> > over a simple cube (warp the displacements to see it does represent an >> > elastic problem, E=200e3, nu=0.3): >> > >> > https://caeplex.com/demo/results.php?id=5817146bdb561 >> > >> > >> > Say my three displacements (unknowns) are u,v,w. I can define the >> > unknown vector as (is this called node-based ordering?) >> > >> > [u1 v1 w1 u2 v2 w2 ... un vn wn]^T >> > >> > Another option is (is this called unknown-based ordering?) >> > >> > [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T >> > >> > >> > With lu/preonly the results are the same, although the stiffnes matrixes >> > for each case are attached as PNGs. And of course, the near-nullspace >> > vectors are different. So PCSetCoordinates() should work with one >> > ordering and not with another one, an issue I did not take into >> > consideration. >> > >> > After understanding Matt's point about the near nullspace (and reading >> > some interesting comments from Jed on scicomp stackexchange) I did built >> > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody() >> > because I found out by running the code the nullspace should be an >> > orthonormal basis, it should say so in the docs). >> >> Where? >> >> "vecs - the vectors that span the null space (excluding the constant >> vector); these vectors must be orthonormal." >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages >> /Mat/MatNullSpaceCreate.html >> >> And if you run in debug mode (default), as you always should until you >> are confident that your code is correct, MatNullSpaceCreate tests that >> your vectors are orthonormal. >> >> > Now, there are some results I do not understand. I tried these six >> > combinations: >> > >> > order near-nullspace iterations norm >> > ----- -------------- ---------- ---- >> > unknown explicit 10 1.6e-6 >> > unknown PCSetCoordinates 15 1.7e-7 >> > unknown none 15 2.4e-7 >> > node explicit fails with error -11 >> > node PCSetCoordinates fails with error -11 >> > node none 13 3.8e-7 >> >> Did you set a block size for the "node-based" orderings? Are you sure >> the above is labeled correctly? Anyway, PCSetCoordinates uses >> "node-based" ordering. Implementation performance will generally be >> better with node-based ordering -- it has better memory streaming and >> cache behavior. >> >> The AIJ matrix format will also automatically do an "inode" optimization >> to reduce memory bandwidth and enable block smoothing (default >> configuration uses SOR smoothing). You can use -mat_no_inode to try >> turning that off. >> >> > Error -11 is >> > PETSc's linear solver did not converge with reason >> > 'DIVERGED_PCSETUP_FAILED' (-11) >> >> Isn't there an actual error message? >> >> > Any explanation (for dumbs)? 
>> > Another thing to take into account: I am setting the dirichlet BCs with >> > MatZeroRows(), but I am not updating the columns to keep symmetry. Can >> > this pose a problem for GAMG? >> >> Usually minor, but it is better to maintain symmetry. >> > > If the boundary values are not zero, no way to maintain symmetry unless we > reduce the extra part of the matrix. Not updating the columns is better > in this situation. > ? You just eliminate the unknowns. Matt > > Fande, > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 31 11:01:43 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Oct 2016 10:01:43 -0600 Subject: [petsc-users] GAMG In-Reply-To: References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477911251.7699.19.camel@seamplex.com> <87h97svaqt.fsf@jedbrown.org> Message-ID: <87wpgotsm0.fsf@jedbrown.org> "Kong, Fande" writes: > If the boundary values are not zero, no way to maintain symmetry unless we > reduce the extra part of the matrix. Not updating the columns is better in > this situation. The inhomogeneity of the boundary condition has nothing to do with operator symmetry. I like this formulation for Dirichlet conditions. https://scicomp.stackexchange.com/questions/3298/appropriate-space-for-weak-solutions-to-an-elliptical-pde-with-mixed-inhomogeneo/3300#3300 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 800 bytes Desc: not available URL: From jeremy at seamplex.com Mon Oct 31 12:03:04 2016 From: jeremy at seamplex.com (Jeremy Theler) Date: Mon, 31 Oct 2016 14:03:04 -0300 Subject: [petsc-users] GAMG In-Reply-To: <87h97svaqt.fsf@jedbrown.org> References: <1477660420.2766.15.camel@seamplex.com> <1477661072.2766.22.camel@seamplex.com> <1477661923.2766.24.camel@seamplex.com> <1477911251.7699.19.camel@seamplex.com> <87h97svaqt.fsf@jedbrown.org> Message-ID: <1477933384.21553.15.camel@seamplex.com> On Mon, 2016-10-31 at 08:44 -0600, Jed Brown wrote: > > After understanding Matt's point about the near nullspace (and reading > > some interesting comments from Jed on scicomp stackexchange) I did built > > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody() > > because I found out by running the code the nullspace should be an > > orthonormal basis, it should say so in the docs). > > Where? > "vecs - the vectors that span the null space (excluding the constant vector); these vectors must be orthonormal." > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceCreate.html ok, I might have passed on that but I started with http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetNearNullSpace.html that says ?attaches a null space to a matrix, which is often the null space (rigid body modes) of the operator without boundary conditions This null space will be used to provide near null space vectors to a multigrid preconditioner built from this matrix.? It wouldn't hurt to remind dumb users like me that ?...it is often the set of _orthonormalized_ rigid body modes...? 
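That is essentially what MatNullSpaceCreateRigidBody() is there for: it takes a coordinate vector (interlaced x,y,z per node, block size 3) and returns the six orthonormalized rigid-body modes, so no user-side orthogonalization is needed. A sketch, where coord_array and n_nodes stand for the same node-ordered coordinate data that would be passed to PCSetCoordinates():

  Vec          coords;
  MatNullSpace nearnull;

  ierr = VecCreateSeqWithArray(PETSC_COMM_SELF, 3, 3*n_nodes, coord_array, &coords);CHKERRQ(ierr);
  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(K, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
  ierr = VecDestroy(&coords);CHKERRQ(ierr);

With the near null space attached to the matrix, GAMG picks it up directly and PCSetCoordinates() is not needed.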
> And if you run in debug mode (default), as you always should until you > are confident that your code is correct, MatNullSpaceCreate tests that > your vectors are orthonormal. That's how I realized I needed to normalize. Then I found MatNullSpaceCreateRigidBody() and copied the code to orthogonalize. Wouldn't it be better to orthonormalize inside MatSetNullSpace()? I bet an orthogonalization from PETSc's code would beat any user-side code. > > Now, there are some results I do not understand. I tried these six > > combinations: > > > > order near-nullspace iterations norm > > ----- -------------- ---------- ---- > > unknown explicit 10 1.6e-6 > > unknown PCSetCoordinates 15 1.7e-7 > > unknown none 15 2.4e-7 > > node explicit fails with error -11 > > node PCSetCoordinates fails with error -11 > > node none 13 3.8e-7 > > Did you set a block size for the "node-based" orderings? Are you sure > the above is labeled correctly? Anyway, PCSetCoordinates uses > "node-based" ordering. Implementation performance will generally be > better with node-based ordering -- it has better memory streaming and > cache behavior. Yes. Indeed, when I save the stiffnes matrix as a binary file I get a .info file that contains -matload_block_size 3 The labeling is right, I re-checked. That's the funny part, I can't get GAMG to work with PCSetCoordinates (which BTW, I think its documentation does not address the issue of DOF ordering). Any idea of what can be happening to me? > The AIJ matrix format will also automatically do an "inode" optimization > to reduce memory bandwidth and enable block smoothing (default > configuration uses SOR smoothing). You can use -mat_no_inode to try > turning that off. That option does not make any difference. > > > Error -11 is > > PETSc's linear solver did not converge with reason > > 'DIVERGED_PCSETUP_FAILED' (-11) > Isn't there an actual error message? Sorry, KSPGetConvergedReason() returns -11 and then my code prints that error string. Find attached the output with -info. Thanks -- jeremy -------------- next part -------------- [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscGetHostName(): Rejecting domainname, likely is NIS tom.(none) [0] PetscInitialize(): Running on machine: tom [0] SlepcInitialize(): SLEPc successfully started 697 3611 [0] PetscCommDuplicate(): Duplicating a communicator 2 2 max tags = 100000000 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 835506 unneeded,74079 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes of 2091. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 0 unneeded,74079 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 0 unneeded,74079 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. 
Do not use CompressedRow routines. [0] PCSetUp(): Setting up PC for first time [0] PCSetUp_GAMG(): level 0) N=2091, n data rows=3, n data cols=6, nnz/row (ave)=35, np=1 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 697 X 697; storage space: 0 unneeded,8231 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 697) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes out of 697 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 697 X 697; storage space: 915 unneeded,7316 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 697) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes out of 697 rows. Not using Inode routines [0] PCGAMGFilterGraph(): 88.8835% nnz after filtering, with threshold 0., 11.8092 nnz ave. (N=697) [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square [0] MatGetSymbolicTranspose_SeqAIJ(): Getting Symbolic Transpose. [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = 100000000 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 697 X 697; storage space: 0 unneeded,7316 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 697) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes out of 697 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 697 X 697; storage space: 0 unneeded,33067 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 116 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 697) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes out of 697 rows. Not using Inode routines [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Reallocs 0; Fill ratio: given 2. needed 2.25992. [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Use MatMatMult(A,B,MatReuse,2.25992,&C) for best performance.; [0] MatRestoreSymbolicTranspose_SeqAIJ(): Restoring Symbolic Transpose. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 697 X 697; storage space: 0 unneeded,33067 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 116 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 697) < 0.6. Do not use CompressedRow routines. [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] maxIndSetAgg(): removed 0 of 697 vertices. 47 selected. [0] PCGAMGProlongator_AGG(): New grid 47 nodes [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 282; storage space: 0 unneeded,12546 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 678 nodes of 2091. Limit used: 5. 
Using Inode routines [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.239704e+00 min=1.138183e-01 PC=jacobi [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 282; storage space: 0 unneeded,39168 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 697 nodes of 2091. Limit used: 5. Using Inode routines [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Use MatMatMult(A,B,MatReuse,1.,&C) for best performance.; [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 282; storage space: 0 unneeded,39168 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 282; storage space: 0 unneeded,39168 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 2091) < 0.6. Do not use CompressedRow routines. 
[0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 3 [0] Petsc_DelComm_Outer(): User MPI_Comm 1 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 3 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 3 [0] MatGetSymbolicTranspose_SeqAIJ(): Getting Symbolic Transpose. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 282; storage space: 0 unneeded,33516 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 234 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 94 nodes of 282. Limit used: 5. Using Inode routines [0] MatRestoreSymbolicTranspose_SeqAIJ(): Restoring Symbolic Transpose. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Use MatPtAP(A,P,MatReuse,1.,&C) for best performance. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 282; storage space: 0 unneeded,33516 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 234 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] PCSetUp_GAMG(): 1) N=282, n data cols=6, nnz/row (ave)=118, 1 active pes [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 47 X 47; storage space: 0 unneeded,931 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 39 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 47) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 47 nodes out of 47 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 47 X 47; storage space: 14 unneeded,917 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 39 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 47) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 47 nodes out of 47 rows. Not using Inode routines [0] PCGAMGFilterGraph(): 98.4962% nnz after filtering, with threshold 0., 19.8085 nnz ave. (N=47) [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = 100000000 [0] maxIndSetAgg(): removed 0 of 47 vertices. 10 selected. [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 3 [0] Petsc_DelComm_Outer(): User MPI_Comm 1 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 3 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 3 [0] PCGAMGProlongator_AGG(): New grid 10 nodes [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 60; storage space: 0 unneeded,1692 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 84 nodes of 282. Limit used: 5. 
Using Inode routines [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp_Jacobi(): Zero detected in diagonal of matrix, using 1 at those locations [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.217159e+00 min=1.912394e-05 PC=jacobi [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 60; storage space: 0 unneeded,8892 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 91 nodes of 282. Limit used: 5. Using Inode routines [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Use MatMatMult(A,B,MatReuse,1.,&C) for best performance.; [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 60; storage space: 0 unneeded,8892 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 282 X 60; storage space: 0 unneeded,8892 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 54 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 282) < 0.6. Do not use CompressedRow routines. [0] MatGetSymbolicTranspose_SeqAIJ(): Getting Symbolic Transpose. 
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 60; storage space: 0 unneeded,3600 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 60 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 12 nodes of 60. Limit used: 5. Using Inode routines [0] MatRestoreSymbolicTranspose_SeqAIJ(): Restoring Symbolic Transpose. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Use MatPtAP(A,P,MatReuse,1.,&C) for best performance. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 60; storage space: 0 unneeded,3600 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 60 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] PCSetUp_GAMG(): 2) N=60, n data cols=6, nnz/row (ave)=60, 1 active pes [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 10 X 10; storage space: 0 unneeded,100 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 10 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 10) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 2 nodes of 10. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 10 X 10; storage space: 0 unneeded,100 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 10 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 10) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 2 nodes of 10. Limit used: 5. Using Inode routines [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 10. nnz ave. (N=10) [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = 100000000 [0] maxIndSetAgg(): removed 0 of 10 vertices. 1 selected. [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 3 [0] Petsc_DelComm_Outer(): User MPI_Comm 1 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 3 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 3 [0] PCGAMGProlongator_AGG(): New grid 1 nodes [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 6; storage space: 0 unneeded,360 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 12 nodes of 60. Limit used: 5. 
Using Inode routines [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp_Jacobi(): Zero detected in diagonal of matrix, using 1 at those locations [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=6.483096e+01 min=2.507448e-05 PC=jacobi [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 6; storage space: 0 unneeded,360 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 12 nodes of 60. Limit used: 5. Using Inode routines [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatMatMultSymbolic_SeqAIJ_SeqAIJ(): Use MatMatMult(A,B,MatReuse,1.,&C) for best performance.; [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 6; storage space: 0 unneeded,360 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 60 X 6; storage space: 0 unneeded,360 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 60) < 0.6. Do not use CompressedRow routines. [0] MatGetSymbolicTranspose_SeqAIJ(): Getting Symbolic Transpose. 
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 6 X 6; storage space: 0 unneeded,36 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 6) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 2 nodes of 6. Limit used: 5. Using Inode routines [0] MatRestoreSymbolicTranspose_SeqAIJ(): Restoring Symbolic Transpose. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Reallocs 0; Fill ratio: given 2. needed 1.. [0] MatPtAPSymbolic_SeqAIJ_SeqAIJ_SparseAxpy(): Use MatPtAP(A,P,MatReuse,1.,&C) for best performance. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 6 X 6; storage space: 0 unneeded,36 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 6) < 0.6. Do not use CompressedRow routines. [0] PCSetUp_GAMG(): 3) N=6, n data cols=6, nnz/row (ave)=6, 1 active pes [0] PCSetUp_GAMG(): HARD stop of coarsening on level 2. Grid too small: 1 block nodes [0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.50152 [0] PCSetUp(): Setting up PC for first time [0] PetscCommDuplicate(): Duplicating a communicator 1 3 max tags = 100000000 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PCSetUp_MG(): Using outer operators to define finest grid operator because PCMGGetSmoother(pc,nlevels-1,&ksp);KSPSetOperators(ksp,...); was not called. [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Setting up PC for first time [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] 
PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] KSPSolve_Chebyshev(): Eigen estimator ran for prescribed number of iterations [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 3 [0] MatSOR_SeqAIJ_Inode(): Zero pivot, row 17 [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 3 [0] MatSOR_SeqAIJ_Inode(): Zero pivot, row 23 [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 3 [0] MatSOR_SeqAIJ_Inode(): Zero pivot, row 29 [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 3 [0] MatSOR_SeqAIJ_Inode(): Zero pivot, row 41 [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 3 [0] MatSOR_SeqAIJ_Inode(): Zero pivot, row 101 [0] KSPSolve_Chebyshev(): Eigen estimator KSP_DIVERGED_PCSETUP_FAILED [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PetscKernel_A_gets_inverse_A_5(): Zero pivot, row 0 [0] KSPSolve_Chebyshev(): Eigen estimator KSP_DIVERGED_PCSETUP_FAILED [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Setting up PC for first time [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] PetscCommDuplicate(): Using internal PETSc communicator 1 3 [0] MatLUFactorSymbolic_SeqAIJ(): Reallocs 0 Fill ratio:given 5. needed 1. [0] MatLUFactorSymbolic_SeqAIJ(): Run with -pc_factor_fill 1. or use [0] MatLUFactorSymbolic_SeqAIJ(): PCFactorSetFill(pc,1.); [0] MatLUFactorSymbolic_SeqAIJ(): for best performance. [0] MatSeqAIJCheckInode_FactorLU(): Found 2 nodes of 6. Limit used: 5. 
Using Inode routines [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged error: PETSc's linear solver did not converge with reason 'DIVERGED_PCSETUP_FAILED' (-11) [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 3 [0] Petsc_DelComm_Outer(): User MPI_Comm 1 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 3 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 3 [0] PetscFinalize(): PetscFinalize() called From peetz2 at illinois.edu Mon Oct 31 16:40:46 2016 From: peetz2 at illinois.edu (Peetz, Darin T) Date: Mon, 31 Oct 2016 21:40:46 +0000 Subject: [petsc-users] Provide Matrix Factorization to EPS for Generalized Eigenvalue Problem Message-ID: Hello, I'm wondering how I could go about providing a matrix factorization calculated in Petsc to the eigenvalue routines in Slepc. I'm trying to solve the eigenvalue problem for stability, where the solution to KU=F is needed to construct the A-matrix (K_sigma) in the eigenvalue problem. Since the eigenvalue problem is generalized, it seems like the best way to solve it is to factorize the B-Matrix (K, same as in KU=F) with a package like MUMPS and use a method in Slepc such as Krylov-Schur. Since I need to solve both KU=F and the eigenvalue problem, I'd like to compute the factorization of K first to solve KU=F, and then reuse it in the EPS routines. I've tried using EPSGetST() and STSetKSP() to provide the KSP object that I used to solve KU=F, but then for some reason when I change nonzero values in K (but not nonzero locations) Petsc redoes the symbolic factorization when I go to solve KU=F again (it's part of an optimization routine, so I'm solving both problems, updating parts of K, and repeating). This does provide the correct solution, and allows me to use the same factorization for KU=F and the eigenvalue problem, but the extra symbolic factorizations, while comparatively cheap, are unnecessary and ideally should be eliminated. If I skip the calls to EPSGetST() and STSetKSP(), the symbolic factorization for the KSP object associated with KU=F is only performed once, as it should be. Is there some option I'm overlooking, or maybe a better way to go about this? 
Thanks, Darin -------------- next part -------------- An HTML attachment was scrubbed... URL:
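For reference, a rough sketch of the wiring described in the message above (not code from the thread): solve KU=F with an LU factorization and hand the same KSP to the EPS via EPSGetST()/STSetKSP() so the generalized problem can reuse it. The function and variable names (SolveAndCheckStability, K, Ksigma, f, u) are placeholders, PCFactorSetMatSolverPackage() is the PETSc 3.7-era call for selecting MUMPS (renamed PCFactorSetMatSolverType in later releases), and EPS_GHEP assumes both matrices are symmetric. It illustrates the setup being discussed, not a fix for the repeated symbolic factorization.

#include <slepceps.h>

static PetscErrorCode SolveAndCheckStability(Mat K, Mat Ksigma, Vec f, Vec u)
{
  KSP            ksp;
  PC             pc;
  EPS            eps;
  ST             st;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* direct solve of K u = f with MUMPS; this triggers the factorization of K */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, K, K);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, f, u);CHKERRQ(ierr);

  /* generalized eigenproblem K_sigma x = lambda K x, handing the ST the same KSP */
  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, Ksigma, K);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_GHEP);CHKERRQ(ierr);
  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetKSP(st, ksp);CHKERRQ(ierr);      /* same KSP, hence same factorization of K */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);

  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}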