From patrick.sanan at gmail.com Fri Nov 1 05:41:09 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 1 Nov 2019 11:41:09 +0100 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? Message-ID: *Context:* I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup [lo-a2-058:21425] *** reported by process [4222287873,2] [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, [lo-a2-058:21425] *** and potentially your MPI job) *Question: *I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. 
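(For reference before the output below: the outer loop added to those examples has roughly the shape sketched here in C. This is a simplified sketch, not the actual attached ex223.c/ex221f.F90; the loop count, the 1-D matrix setup, and all names are illustrative.)

/* Sketch only: a build-solve-destroy loop in the style of the modified example.
 * Every object is created on PETSC_COMM_WORLD and destroyed before the next
 * iteration, so nothing holds a reference to the duplicated inner communicator
 * between iterations. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscInt       it, i, n = 10, col[3];
  PetscScalar    value[3];

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  for (it = 0; it < 5; ++it) {                      /* the added outer loop */
    Mat A; Vec x, b; KSP ksp;

    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &b);CHKERRQ(ierr);
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);
    value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
    for (i = 1; i < n - 1; i++) {                   /* interior rows of a 1-D Laplacian */
      col[0] = i - 1; col[1] = i; col[2] = i + 1;
      ierr = MatSetValues(A, 1, &i, 3, col, value, INSERT_VALUES);CHKERRQ(ierr);
    }
    i = 0;     ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr); /* boundary rows, simplified */
    i = n - 1; ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

    /* Destroying everything lets the reference count of the duplicated
     * communicator drop to zero, so the next iteration may duplicate again. */
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
  }
  ierr = PetscFinalize();
  return ierr;
}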
Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 
-2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex221f.F90
Type: application/octet-stream
Size: 10705 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex223.c
Type: application/octet-stream
Size: 7641 bytes

From stefano.zampini at gmail.com Fri Nov 1 06:16:59 2019
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 1 Nov 2019 14:16:59 +0300
Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran?
In-Reply-To: 
References: 
Message-ID: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com>

From src/sys/objects/ftn-custom/zstart.c, petscinitialize_internal does

    PETSC_COMM_WORLD = MPI_COMM_WORLD

which means that PETSC_COMM_WORLD is not a PETSc communicator.

The first matrix creation duplicates PETSC_COMM_WORLD, and the duplicated communicator can then be reused for the other objects. When you finally destroy the matrix inside the loop, the reference count of this duplicated comm goes to zero and it is freed. This is why you duplicate at each step.

However, the C version of PetscInitialize does the same, so I'm not sure why this happens with Fortran and not with C. (Do you leak objects in the C code?)

> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users wrote:
> 
> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat, Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0):
> 
> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup
> [lo-a2-058:21425] *** reported by process [4222287873,2]
> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533
> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error
> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [lo-a2-058:21425] *** and potentially your MPI job)
> 
> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding, and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here.
> 
> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples.
With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] 
PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri Nov 1 06:36:03 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 1 Nov 2019 14:36:03 +0300 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com> References: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com> Message-ID: <4D16D3C4-F23A-425D-B776-C21BE1B4742A@gmail.com> I know why your C code does not duplicate the comm at each step. This is because it uses PETSC_VIEWER_STDOUT_WORLD, which basically inserts the duplicated comm into PETSC_COMM_WORLD as attribute. Try removing the KSPView call and you will see the C code behaves as the Fortran one. > On Nov 1, 2019, at 2:16 PM, Stefano Zampini wrote: > > From src/sys/objects/ftn-custom/zstart.c petscinitialize_internal > > PETSC_COMM_WORLD = MPI_COMM_WORLD > > Which means that PETSC_COMM_WORLD is not a PETSc communicator. > > The first matrix creation duplicates the PETSC_COMM_WORLD and thus can be reused for the other objects > When you finally destroy the matrix inside the loop, the ref count of this duplicated comm goes to zero and it is free > This is why you duplicate at each step > > However, the C version of PetscInitialize does the same, so I?m not sure why this happens with Fortran and not with C. (Do you leak objects in the C code?) > > >> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users > wrote: >> >> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): >> >> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup >> [lo-a2-058:21425] *** reported by process [4222287873,2] >> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 >> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error >> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, >> [lo-a2-058:21425] *** and potentially your MPI job) >> >> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. 
Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. >> >> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: >> >> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> >> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 
>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Nov 1 06:45:04 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 1 Nov 2019 12:45:04 +0100 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: <4D16D3C4-F23A-425D-B776-C21BE1B4742A@gmail.com> References: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com> <4D16D3C4-F23A-425D-B776-C21BE1B4742A@gmail.com> Message-ID: Ah, really interesting! 
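(Context for the test Patrick describes next: if any PETSc object created on PETSC_COMM_WORLD stays alive across the loop, the duplicated inner communicator's reference count never drops to zero, so later creations reuse it instead of calling MPI_Comm_dup() again. A minimal sketch of that pattern follows, in C rather than the attached Fortran example; the name "keeper" is illustrative.)

/* Sketch: hold one long-lived object so the duplicated communicator survives
 * between iterations. Not the attached ex321f.F90; names are illustrative. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  KSP            keeper;                 /* never used to solve anything */
  PetscInt       it;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = KSPCreate(PETSC_COMM_WORLD, &keeper);CHKERRQ(ierr);  /* duplicates the comm once */

  for (it = 0; it < 5; ++it) {
    /* ... create Mat/Vec/KSP, solve, destroy, exactly as before; the inner
     * communicator is found and reused, so no further MPI_Comm_dup() calls ... */
  }

  ierr = KSPDestroy(&keeper);CHKERRQ(ierr);  /* reference released only at the end */
  ierr = PetscFinalize();
  return ierr;
}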
In the attached ex321f.F90, I create a dummy KSP before the loop, and indeed the behavior is as you say - no duplications [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex321f -info | grep PetscCommDuplicate [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 I've asked the user to re-run with -info, so then I'll hopefully be able to see whether the duplication is happening as I expect (in which case your insight might provide at least a workaround), and to see if it's choosing a new communicator number each time, somehow. > Am 01.11.2019 um 12:36 schrieb Stefano Zampini : > > I know why your C code does not duplicate the comm at each step. This is because it uses PETSC_VIEWER_STDOUT_WORLD, which basically inserts the duplicated comm into PETSC_COMM_WORLD as attribute. Try removing the KSPView call and you will see the C code behaves as the Fortran one. > > >> On Nov 1, 2019, at 2:16 PM, Stefano Zampini > wrote: >> >> From src/sys/objects/ftn-custom/zstart.c petscinitialize_internal >> >> PETSC_COMM_WORLD = MPI_COMM_WORLD >> >> Which means that PETSC_COMM_WORLD is not a PETSc communicator. 
>> >> The first matrix creation duplicates the PETSC_COMM_WORLD and thus can be reused for the other objects >> When you finally destroy the matrix inside the loop, the ref count of this duplicated comm goes to zero and it is free >> This is why you duplicate at each step >> >> However, the C version of PetscInitialize does the same, so I?m not sure why this happens with Fortran and not with C. (Do you leak objects in the C code?) >> >> >>> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users > wrote: >>> >>> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): >>> >>> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup >>> [lo-a2-058:21425] *** reported by process [4222287873,2] >>> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 >>> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error >>> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, >>> [lo-a2-058:21425] *** and potentially your MPI job) >>> >>> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. >>> >>> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. 
Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: >>> >>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> >>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>>>
>>>
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex321f.F90
Type: application/octet-stream
Size: 10854 bytes

From stefano.zampini at gmail.com Fri Nov 1 06:48:57 2019
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 1 Nov 2019 14:48:57 +0300
Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran?
In-Reply-To: 
References: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com> <4D16D3C4-F23A-425D-B776-C21BE1B4742A@gmail.com>
Message-ID: 

It seems we don't have a Fortran wrapper for PetscCommDuplicate (or at least I cannot find it). Is this an oversight?

If we have a Fortran wrapper for PetscComm{Duplicate,Destroy}, the proper fix will be to call PetscCommDuplicate(PETSC_COMM_WORLD,&user_petsc_comm) after PetscInitialize and PetscCommDestroy(&user_petsc_comm) right before PetscFinalize is called in your app.

> On Nov 1, 2019, at 2:45 PM, Patrick Sanan wrote:
> 
> Ah, really interesting!
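(To illustrate the suggestion above: in C, where PetscCommDuplicate()/PetscCommDestroy() can be called directly, the proposed bracketing looks roughly like the sketch below. The variable name user_petsc_comm is taken from the message above; as noted there, a Fortran wrapper for these calls does not appear to exist.)

/* Sketch of the proposed fix in C: hold one extra reference to the inner
 * PETSc communicator for the life of the program. */
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  MPI_Comm       user_petsc_comm;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* Attaches (or finds) the duplicated communicator on PETSC_COMM_WORLD and
   * bumps its reference count. */
  ierr = PetscCommDuplicate(PETSC_COMM_WORLD, &user_petsc_comm, NULL);CHKERRQ(ierr);

  /* ... application code: create/solve/destroy PETSc objects as often as
   * desired; the same inner communicator is reused throughout ... */

  ierr = PetscCommDestroy(&user_petsc_comm);CHKERRQ(ierr);  /* drop the extra reference */
  ierr = PetscFinalize();
  return ierr;
}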
In the attached ex321f.F90, I create a dummy KSP before the loop, and indeed the behavior is as you say - no duplications > > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex321f -info | grep PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > > I've asked the user to re-run with -info, so then I'll hopefully be able to see whether the duplication is happening as I expect (in which case your insight might provide at least a workaround), and to see if it's choosing a new communicator number each time, somehow. > >> Am 01.11.2019 um 12:36 schrieb Stefano Zampini >: >> >> I know why your C code does not duplicate the comm at each step. This is because it uses PETSC_VIEWER_STDOUT_WORLD, which basically inserts the duplicated comm into PETSC_COMM_WORLD as attribute. Try removing the KSPView call and you will see the C code behaves as the Fortran one. >> >> >>> On Nov 1, 2019, at 2:16 PM, Stefano Zampini > wrote: >>> >>> From src/sys/objects/ftn-custom/zstart.c petscinitialize_internal >>> >>> PETSC_COMM_WORLD = MPI_COMM_WORLD >>> >>> Which means that PETSC_COMM_WORLD is not a PETSc communicator. 
>>> >>> The first matrix creation duplicates the PETSC_COMM_WORLD and thus can be reused for the other objects >>> When you finally destroy the matrix inside the loop, the ref count of this duplicated comm goes to zero and it is free >>> This is why you duplicate at each step >>> >>> However, the C version of PetscInitialize does the same, so I?m not sure why this happens with Fortran and not with C. (Do you leak objects in the C code?) >>> >>> >>>> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users > wrote: >>>> >>>> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): >>>> >>>> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup >>>> [lo-a2-058:21425] *** reported by process [4222287873,2] >>>> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 >>>> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error >>>> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, >>>> [lo-a2-058:21425] *** and potentially your MPI job) >>>> >>>> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. >>>> >>>> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. 
Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: >>>> >>>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> >>>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 
1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Nov 1 10:09:52 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 1 Nov 2019 16:09:52 +0100 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: References: <6CA72589-DD32-4FC8-B7CD-D692401DBE7A@gmail.com> <4D16D3C4-F23A-425D-B776-C21BE1B4742A@gmail.com> Message-ID: I don't see those interfaces, either. If there was a reason that they're non-trivial to implement, we should at least note on the man pages in "Fortran Note:" sections that they don't exist. In this particular instance, we can get by without those interfaces by just creating and destroying the KSP once (the settings are constant), thus hanging onto a reference that way. I'll wait for our -info run to come back and will then confirm that this fixes things. Thanks again, Stefano! Am Fr., 1. Nov. 2019 um 12:49 Uhr schrieb Stefano Zampini < stefano.zampini at gmail.com>: > It seems we don?t have a fortran wrapper for PetscCommDuplicate (or at > least I cannot find it) Is this an oversight? > > If we have a Fortran wrapper for PetscComm{Duplicate~Destroy}, the proper > fix will be to call PetscCommDuplicate(PETSC_COMM_WORLD,&user_petsc_comm) > after PetscInitalize and PetscCommDestroy(&user_petsc_comm) right before > PetscFinalize is called in your app > > On Nov 1, 2019, at 2:45 PM, Patrick Sanan wrote: > > Ah, really interesting! 
In the attached ex321f.F90, I create a dummy KSP > before the loop, and indeed the behavior is as you say - no duplications > > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex321f -info | grep > PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > > I've asked the user to re-run with -info, so then I'll hopefully be able > to see whether the duplication is happening as I expect (in which case your > insight might provide at least a workaround), and to see if it's choosing a > new communicator number each time, somehow. > > Am 01.11.2019 um 12:36 schrieb Stefano Zampini >: > > I know why your C code does not duplicate the comm at each step. This is > because it uses PETSC_VIEWER_STDOUT_WORLD, which basically inserts the > duplicated comm into PETSC_COMM_WORLD as attribute. Try removing the > KSPView call and you will see the C code behaves as the Fortran one. > > > On Nov 1, 2019, at 2:16 PM, Stefano Zampini > wrote: > > From src/sys/objects/ftn-custom/zstart.c petscinitialize_internal > > PETSC_COMM_WORLD = MPI_COMM_WORLD > > Which means that PETSC_COMM_WORLD is not a PETSc communicator. 
> > The first matrix creation duplicates the PETSC_COMM_WORLD and thus can be > reused for the other objects > When you finally destroy the matrix inside the loop, the ref count of this > duplicated comm goes to zero and it is free > This is why you duplicate at each step > > However, the C version of PetscInitialize does the same, so I?m not sure > why this happens with Fortran and not with C. (Do you leak objects in the C > code?) > > > On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > *Context:* I'm trying to track down an error that (only) arises when > running a Fortran 90 code, using PETSc, on a new cluster. The code creates > and destroys a linear system (Mat,Vec, and KSP) at each of (many) > timesteps. The error message from a user looks like this, which leads me to > suspect that MPI_Comm_dup() is being called many times and this is > eventually a problem for this particular MPI implementation (Open MPI > 2.1.0): > > > [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup > [lo-a2-058:21425] *** reported by process [4222287873,2] > [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 > [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error > [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator > will now abort, > [lo-a2-058:21425] *** and potentially your MPI job) > > *Question: *I remember some discussion recently (but can't find the > thread) about not calling MPI_Comm_dup() too many times from > PetscCommDuplicate(), which would allow one to safely use the (admittedly > not optimal) approach used in this application code. Is that a correct > understanding and would the fixes made in that context also apply to > Fortran? I don't fully understand the details of the MPI techniques used, > so thought I'd ask here. > > If I hack a simple build-solve-destroy example to run several loops, I see > a notable difference between C and Fortran examples. With the attached > ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP > tutorials examples ex23.c and ex21f.F90, respectively, I see the following. 
> Note that in the Fortran case, it appears that communicators are actually > duplicated in each loop, but in the C case, this only happens in the first > loop: > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep > PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep > PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 > -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 > -2080374784 > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexlindsay239 at gmail.com Fri Nov 1 10:14:30 2019 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Fri, 1 Nov 2019 10:14:30 -0500 Subject: [petsc-users] VI: RS vs SS In-Reply-To: <7D085275-4E5C-4152-9784-51A703688A0A@mcs.anl.gov> References: <6DA5E815-6FB8-465F-98A5-3BA67F668AFB@mcs.anl.gov> <7D085275-4E5C-4152-9784-51A703688A0A@mcs.anl.gov> Message-ID: No, the matrix is not symmetric because of how we impose some Dirichlet conditions on the boundary. I could easily give you the Jacobian, for one of the "bad" problems. But at least in the case of RSLS, I don't know whether the algorithm is performing badly, or whether the slow convergence is simply a property of the algorithm. Here's a VI monitor history for a representative "bad" solve. 0 SNES VI Function norm 0.229489 Active lower constraints 0/1 upper constraints 0/1 Percent of total 0. Percent of bounded 0. 1 SNES VI Function norm 0.365268 Active lower constraints 83/85 upper constraints 83/85 Percent of total 0.207241 Percent of bounded 0. 2 SNES VI Function norm 0.495088 Active lower constraints 82/84 upper constraints 82/84 Percent of total 0.204744 Percent of bounded 0. 3 SNES VI Function norm 0.478328 Active lower constraints 81/83 upper constraints 81/83 Percent of total 0.202247 Percent of bounded 0. 4 SNES VI Function norm 0.46163 Active lower constraints 80/82 upper constraints 80/82 Percent of total 0.19975 Percent of bounded 0. 5 SNES VI Function norm 0.444996 Active lower constraints 79/81 upper constraints 79/81 Percent of total 0.197253 Percent of bounded 0. 6 SNES VI Function norm 0.428424 Active lower constraints 78/80 upper constraints 78/80 Percent of total 0.194757 Percent of bounded 0. 7 SNES VI Function norm 0.411916 Active lower constraints 77/79 upper constraints 77/79 Percent of total 0.19226 Percent of bounded 0. 8 SNES VI Function norm 0.395472 Active lower constraints 76/78 upper constraints 76/78 Percent of total 0.189763 Percent of bounded 0. 9 SNES VI Function norm 0.379092 Active lower constraints 75/77 upper constraints 75/77 Percent of total 0.187266 Percent of bounded 0. 10 SNES VI Function norm 0.362776 Active lower constraints 74/76 upper constraints 74/76 Percent of total 0.184769 Percent of bounded 0. 11 SNES VI Function norm 0.346525 Active lower constraints 73/75 upper constraints 73/75 Percent of total 0.182272 Percent of bounded 0. 12 SNES VI Function norm 0.330338 Active lower constraints 72/74 upper constraints 72/74 Percent of total 0.179775 Percent of bounded 0. 
13 SNES VI Function norm 0.314217 Active lower constraints 71/73 upper constraints 71/73 Percent of total 0.177278 Percent of bounded 0. 14 SNES VI Function norm 0.298162 Active lower constraints 70/72 upper constraints 70/72 Percent of total 0.174782 Percent of bounded 0. 15 SNES VI Function norm 0.282173 Active lower constraints 69/71 upper constraints 69/71 Percent of total 0.172285 Percent of bounded 0. 16 SNES VI Function norm 0.26625 Active lower constraints 68/70 upper constraints 68/70 Percent of total 0.169788 Percent of bounded 0. 17 SNES VI Function norm 0.250393 Active lower constraints 67/69 upper constraints 67/69 Percent of total 0.167291 Percent of bounded 0. 18 SNES VI Function norm 0.234604 Active lower constraints 66/68 upper constraints 66/68 Percent of total 0.164794 Percent of bounded 0. 19 SNES VI Function norm 0.218882 Active lower constraints 65/67 upper constraints 65/67 Percent of total 0.162297 Percent of bounded 0. 20 SNES VI Function norm 0.203229 Active lower constraints 64/66 upper constraints 64/66 Percent of total 0.1598 Percent of bounded 0. 21 SNES VI Function norm 0.187643 Active lower constraints 63/65 upper constraints 63/65 Percent of total 0.157303 Percent of bounded 0. 22 SNES VI Function norm 0.172126 Active lower constraints 62/64 upper constraints 62/64 Percent of total 0.154806 Percent of bounded 0. 23 SNES VI Function norm 0.156679 Active lower constraints 61/63 upper constraints 61/63 Percent of total 0.15231 Percent of bounded 0. 24 SNES VI Function norm 0.141301 Active lower constraints 60/62 upper constraints 60/62 Percent of total 0.149813 Percent of bounded 0. 25 SNES VI Function norm 0.125993 Active lower constraints 59/61 upper constraints 59/61 Percent of total 0.147316 Percent of bounded 0. 26 SNES VI Function norm 0.110755 Active lower constraints 58/60 upper constraints 58/60 Percent of total 0.144819 Percent of bounded 0. 27 SNES VI Function norm 0.0955886 Active lower constraints 57/59 upper constraints 57/59 Percent of total 0.142322 Percent of bounded 0. 28 SNES VI Function norm 0.0804936 Active lower constraints 56/58 upper constraints 56/58 Percent of total 0.139825 Percent of bounded 0. 29 SNES VI Function norm 0.0654705 Active lower constraints 55/57 upper constraints 55/57 Percent of total 0.137328 Percent of bounded 0. 30 SNES VI Function norm 0.0505198 Active lower constraints 54/56 upper constraints 54/56 Percent of total 0.134831 Percent of bounded 0. 31 SNES VI Function norm 0.0356422 Active lower constraints 53/55 upper constraints 53/55 Percent of total 0.132335 Percent of bounded 0. 32 SNES VI Function norm 0.020838 Active lower constraints 52/54 upper constraints 52/54 Percent of total 0.129838 Percent of bounded 0. 33 SNES VI Function norm 0.0061078 Active lower constraints 51/53 upper constraints 51/53 Percent of total 0.127341 Percent of bounded 0. 34 SNES VI Function norm 2.2664e-12 Active lower constraints 51/52 upper constraints 51/52 Percent of total 0.127341 Percent of bounded 0. I've read that in some cases the VI solver is simply unable to move the constraint set more than one grid cell per non-linear iteration. That looks like what I'm seeing here... On Tue, Oct 29, 2019 at 7:15 AM Munson, Todd wrote: > > Hi, > > Is the matrix for the linear PDE symmetric? If so, then the VI is > equivalent to > finding the stationary points of a bound-constrained quadratic program and > you > may want to use the TAO Newton Trust-Region or Line-Search methods for > bound-constrained optimization problems. 
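(A rough illustration of the reformulation Todd mentions, for the case where M really is symmetric: the box-constrained VI with affine residual Mx + q and bounds l <= x <= u has the same stationary points as the bound-constrained QP min 0.5 x'Mx + q'x. The sketch below uses the TAO API names current around the time of this thread, roughly PETSc 3.12; some routines have since been renamed. The struct, function, and variable names are illustrative, not from MOOSE or the examples discussed.)

/* Sketch, under the symmetry assumption stated above: solve the QP with TAO's
 * bound-constrained Newton solvers (TAOBNTR trust region, TAOBNLS line search). */
#include <petsctao.h>

typedef struct { Mat M; Vec q; } AppCtx;

static PetscErrorCode FormFunctionGradient(Tao tao, Vec X, PetscReal *f, Vec G, void *ctx)
{
  AppCtx        *user = (AppCtx*)ctx;
  PetscScalar    xMx, qx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(user->M, X, G);CHKERRQ(ierr);    /* G = M x (gradient for symmetric M) */
  ierr = VecDot(X, G, &xMx);CHKERRQ(ierr);
  ierr = VecDot(X, user->q, &qx);CHKERRQ(ierr);
  *f   = 0.5*PetscRealPart(xMx) + PetscRealPart(qx);
  ierr = VecAXPY(G, 1.0, user->q);CHKERRQ(ierr);  /* G = M x + q */
  PetscFunctionReturn(0);
}

static PetscErrorCode FormHessian(Tao tao, Vec X, Mat H, Mat Hpre, void *ctx)
{
  PetscFunctionBeginUser;                         /* Hessian is the constant matrix M */
  PetscFunctionReturn(0);
}

static PetscErrorCode SolveBoundedQP(Mat M, Vec q, Vec xl, Vec xu, Vec x)
{
  AppCtx         user = {M, q};
  Tao            tao;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TaoCreate(PetscObjectComm((PetscObject)M), &tao);CHKERRQ(ierr);
  ierr = TaoSetType(tao, TAOBNTR);CHKERRQ(ierr);              /* or TAOBNLS */
  ierr = TaoSetInitialVector(tao, x);CHKERRQ(ierr);
  ierr = TaoSetVariableBounds(tao, xl, xu);CHKERRQ(ierr);
  ierr = TaoSetObjectiveAndGradientRoutine(tao, FormFunctionGradient, &user);CHKERRQ(ierr);
  ierr = TaoSetHessianRoutine(tao, M, M, FormHessian, &user);CHKERRQ(ierr);
  ierr = TaoSetFromOptions(tao);CHKERRQ(ierr);                /* e.g. -tao_monitor */
  ierr = TaoSolve(tao);CHKERRQ(ierr);
  ierr = TaoDestroy(&tao);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}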
> > Alp: are there flags set when a problem is linear with a symmetric > matrix? Maybe > we can do an internal reformulation in those cases to use the optimization > tools. > > Is there an easy way to get the matrix and the constant vector for one of > the > problems that fails or does not perform well? Typically, the TAO RSLS > methods will work well for the types of problems that you have and if > they are not, then I can go about finding out why and making some > improvements. > > Monotone in this case is that your matrix is positive semidefinite; x^TMx > >= 0 for > all x. For M symmetric, this is the same as M having all nonnegative > eigenvalues. > > Todd. > > > On Oct 28, 2019, at 11:14 PM, Alexander Lindsay < > alexlindsay239 at gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 4:52 AM Munson, Todd > wrote: > > > > Hi, > > > > For these problems, how large are they? And are they linear or > nonlinear? > > What I can do is use some fancier tools to help with what is going on > with > > the solvers in certain cases. > > > > For the results cited above: > > > > 100 elements -> 101 dofs > > 1,000 elements -> 1,001 dofs > > 10,000 elements -> 10,001 dofs > > > > The PDE is linear with simple bounds constraints on the variable: 0 <= u > <= 10 > > > > > > For Barry's question, the matrix in the SS solver is a diagonal matrix > plus > > a column scaling of the Jacobian. > > > > Note: semismooth, reduced space and interior point methods mainly work > for > > problems that are strictly monotone. > > > > Dumb question, but monotone in what way? > > > > Thanks for the replies! > > > > Alex > > > > Finding out what is going on with > > your problems with some additional diagnostics might yield some > > insights. > > > > Todd. > > > > > On Oct 24, 2019, at 3:36 AM, Smith, Barry F. > wrote: > > > > > > > > > See bottom > > > > > > > > >> On Oct 14, 2019, at 1:12 PM, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > >> > > >> It might depend on your application, but for my stuff on maximum > principles for advection-diffusion, I found RS to be much better than SS. > Here?s the paper I wrote documenting the performance numbers I came across > > >> > > >> https://www.sciencedirect.com/science/article/pii/S0045782516316176 > > >> > > >> Or the arXiV version: > > >> > > >> https://arxiv.org/pdf/1611.08758.pdf > > >> > > >> > > >> On Mon, Oct 14, 2019 at 1:07 PM Alexander Lindsay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > >> I've been working on mechanical contact in MOOSE for a while, and > it's led to me to think about general inequality constraint enforcement. > I've been playing around with both `vinewtonssls` and `vinewtonrsls`. In > Benson's and Munson's Flexible Complementarity Solvers paper, they were > able to solve 73.7% of their problems with SS and 65.5% with RS which led > them to conclude that the SS method is generally more robust. We have had > at least one instance where a MOOSE user reported an order of magnitude > reduction in non-linear iterations when switching from SS to RS. 
Moreover, > when running the problem described in this issue, I get these results: > > >> > > >> num_elements = 100 > > >> SS nl iterations = 53 > > >> RS nl iterations = 22 > > >> > > >> num_elements = 1000 > > >> SS nl iterations = 123 > > >> RS nl iterations = 140 > > >> > > >> num_elements = 10000 > > >> SS: fails to converge within 50 nl iterations during the second time > step whether using a `basic` or `bt` line search > > >> RS: fails to converge within 50 nl iterations during the second time > step whether using a `basic` or `bt` line search (although I believe > `vinewtonrsls` performs a line-search that is guaranteed to keep the > degrees of freedom within their bounds) > > >> > > >> So depending on the number of elements, it appears that either SS or > RS may be more performant. I guess since I can get different relative > performance with even the same PDE, it would be silly for me to ask for > guidance on when to use which? In the conclusion of Benson's and Munson's > paper, they mention using mesh sequencing for generating initial guesses on > finer meshes. Does anyone know whether there have been any publications > using PETSc/TAO and mesh sequencing for solving large VI problems? > > >> > > >> A related question: what needs to be done to allow SS to run with > `-snes_mf_operator`? RS already appears to support the option. > > > > > > This may not make sense. Is the operator used in the SS solution > process derivable from the function that is being optimized with the > constraints or some strange scaled beast? > > >> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmunson at mcs.anl.gov Fri Nov 1 10:18:59 2019 From: tmunson at mcs.anl.gov (Munson, Todd) Date: Fri, 1 Nov 2019 15:18:59 +0000 Subject: [petsc-users] VI: RS vs SS In-Reply-To: References: <6DA5E815-6FB8-465F-98A5-3BA67F668AFB@mcs.anl.gov> <7D085275-4E5C-4152-9784-51A703688A0A@mcs.anl.gov> Message-ID: Yes, that looks weird. Can you send me directly the linear problem (M, q, l, and u)? I will take a look and run some other diagnostics with some of my other tools. Thanks, Todd. > On Nov 1, 2019, at 10:14 AM, Alexander Lindsay wrote: > > No, the matrix is not symmetric because of how we impose some Dirichlet conditions on the boundary. I could easily give you the Jacobian, for one of the "bad" problems. But at least in the case of RSLS, I don't know whether the algorithm is performing badly, or whether the slow convergence is simply a property of the algorithm. Here's a VI monitor history for a representative "bad" solve. > > 0 SNES VI Function norm 0.229489 Active lower constraints 0/1 upper constraints 0/1 Percent of total 0. Percent of bounded 0. > 1 SNES VI Function norm 0.365268 Active lower constraints 83/85 upper constraints 83/85 Percent of total 0.207241 Percent of bounded 0. > 2 SNES VI Function norm 0.495088 Active lower constraints 82/84 upper constraints 82/84 Percent of total 0.204744 Percent of bounded 0. > 3 SNES VI Function norm 0.478328 Active lower constraints 81/83 upper constraints 81/83 Percent of total 0.202247 Percent of bounded 0. > 4 SNES VI Function norm 0.46163 Active lower constraints 80/82 upper constraints 80/82 Percent of total 0.19975 Percent of bounded 0. > 5 SNES VI Function norm 0.444996 Active lower constraints 79/81 upper constraints 79/81 Percent of total 0.197253 Percent of bounded 0. > 6 SNES VI Function norm 0.428424 Active lower constraints 78/80 upper constraints 78/80 Percent of total 0.194757 Percent of bounded 0. 
> 7 SNES VI Function norm 0.411916 Active lower constraints 77/79 upper constraints 77/79 Percent of total 0.19226 Percent of bounded 0. > 8 SNES VI Function norm 0.395472 Active lower constraints 76/78 upper constraints 76/78 Percent of total 0.189763 Percent of bounded 0. > 9 SNES VI Function norm 0.379092 Active lower constraints 75/77 upper constraints 75/77 Percent of total 0.187266 Percent of bounded 0. > 10 SNES VI Function norm 0.362776 Active lower constraints 74/76 upper constraints 74/76 Percent of total 0.184769 Percent of bounded 0. > 11 SNES VI Function norm 0.346525 Active lower constraints 73/75 upper constraints 73/75 Percent of total 0.182272 Percent of bounded 0. > 12 SNES VI Function norm 0.330338 Active lower constraints 72/74 upper constraints 72/74 Percent of total 0.179775 Percent of bounded 0. > 13 SNES VI Function norm 0.314217 Active lower constraints 71/73 upper constraints 71/73 Percent of total 0.177278 Percent of bounded 0. > 14 SNES VI Function norm 0.298162 Active lower constraints 70/72 upper constraints 70/72 Percent of total 0.174782 Percent of bounded 0. > 15 SNES VI Function norm 0.282173 Active lower constraints 69/71 upper constraints 69/71 Percent of total 0.172285 Percent of bounded 0. > 16 SNES VI Function norm 0.26625 Active lower constraints 68/70 upper constraints 68/70 Percent of total 0.169788 Percent of bounded 0. > 17 SNES VI Function norm 0.250393 Active lower constraints 67/69 upper constraints 67/69 Percent of total 0.167291 Percent of bounded 0. > 18 SNES VI Function norm 0.234604 Active lower constraints 66/68 upper constraints 66/68 Percent of total 0.164794 Percent of bounded 0. > 19 SNES VI Function norm 0.218882 Active lower constraints 65/67 upper constraints 65/67 Percent of total 0.162297 Percent of bounded 0. > 20 SNES VI Function norm 0.203229 Active lower constraints 64/66 upper constraints 64/66 Percent of total 0.1598 Percent of bounded 0. > 21 SNES VI Function norm 0.187643 Active lower constraints 63/65 upper constraints 63/65 Percent of total 0.157303 Percent of bounded 0. > 22 SNES VI Function norm 0.172126 Active lower constraints 62/64 upper constraints 62/64 Percent of total 0.154806 Percent of bounded 0. > 23 SNES VI Function norm 0.156679 Active lower constraints 61/63 upper constraints 61/63 Percent of total 0.15231 Percent of bounded 0. > 24 SNES VI Function norm 0.141301 Active lower constraints 60/62 upper constraints 60/62 Percent of total 0.149813 Percent of bounded 0. > 25 SNES VI Function norm 0.125993 Active lower constraints 59/61 upper constraints 59/61 Percent of total 0.147316 Percent of bounded 0. > 26 SNES VI Function norm 0.110755 Active lower constraints 58/60 upper constraints 58/60 Percent of total 0.144819 Percent of bounded 0. > 27 SNES VI Function norm 0.0955886 Active lower constraints 57/59 upper constraints 57/59 Percent of total 0.142322 Percent of bounded 0. > 28 SNES VI Function norm 0.0804936 Active lower constraints 56/58 upper constraints 56/58 Percent of total 0.139825 Percent of bounded 0. > 29 SNES VI Function norm 0.0654705 Active lower constraints 55/57 upper constraints 55/57 Percent of total 0.137328 Percent of bounded 0. > 30 SNES VI Function norm 0.0505198 Active lower constraints 54/56 upper constraints 54/56 Percent of total 0.134831 Percent of bounded 0. > 31 SNES VI Function norm 0.0356422 Active lower constraints 53/55 upper constraints 53/55 Percent of total 0.132335 Percent of bounded 0. 
> 32 SNES VI Function norm 0.020838 Active lower constraints 52/54 upper constraints 52/54 Percent of total 0.129838 Percent of bounded 0. > 33 SNES VI Function norm 0.0061078 Active lower constraints 51/53 upper constraints 51/53 Percent of total 0.127341 Percent of bounded 0. > 34 SNES VI Function norm 2.2664e-12 Active lower constraints 51/52 upper constraints 51/52 Percent of total 0.127341 Percent of bounded 0. > > I've read that in some cases the VI solver is simply unable to move the constraint set more than one grid cell per non-linear iteration. That looks like what I'm seeing here... > > On Tue, Oct 29, 2019 at 7:15 AM Munson, Todd wrote: > > Hi, > > Is the matrix for the linear PDE symmetric? If so, then the VI is equivalent to > finding the stationary points of a bound-constrained quadratic program and you > may want to use the TAO Newton Trust-Region or Line-Search methods for > bound-constrained optimization problems. > > Alp: are there flags set when a problem is linear with a symmetric matrix? Maybe > we can do an internal reformulation in those cases to use the optimization tools. > > Is there an easy way to get the matrix and the constant vector for one of the > problems that fails or does not perform well? Typically, the TAO RSLS > methods will work well for the types of problems that you have and if > they are not, then I can go about finding out why and making some > improvements. > > Monotone in this case is that your matrix is positive semidefinite; x^TMx >= 0 for > all x. For M symmetric, this is the same as M having all nonnegative eigenvalues. > > Todd. > > > On Oct 28, 2019, at 11:14 PM, Alexander Lindsay wrote: > > > > On Thu, Oct 24, 2019 at 4:52 AM Munson, Todd wrote: > > > > Hi, > > > > For these problems, how large are they? And are they linear or nonlinear? > > What I can do is use some fancier tools to help with what is going on with > > the solvers in certain cases. > > > > For the results cited above: > > > > 100 elements -> 101 dofs > > 1,000 elements -> 1,001 dofs > > 10,000 elements -> 10,001 dofs > > > > The PDE is linear with simple bounds constraints on the variable: 0 <= u <= 10 > > > > > > For Barry's question, the matrix in the SS solver is a diagonal matrix plus > > a column scaling of the Jacobian. > > > > Note: semismooth, reduced space and interior point methods mainly work for > > problems that are strictly monotone. > > > > Dumb question, but monotone in what way? > > > > Thanks for the replies! > > > > Alex > > > > Finding out what is going on with > > your problems with some additional diagnostics might yield some > > insights. > > > > Todd. > > > > > On Oct 24, 2019, at 3:36 AM, Smith, Barry F. wrote: > > > > > > > > > See bottom > > > > > > > > >> On Oct 14, 2019, at 1:12 PM, Justin Chang via petsc-users wrote: > > >> > > >> It might depend on your application, but for my stuff on maximum principles for advection-diffusion, I found RS to be much better than SS. Here?s the paper I wrote documenting the performance numbers I came across > > >> > > >> https://www.sciencedirect.com/science/article/pii/S0045782516316176 > > >> > > >> Or the arXiV version: > > >> > > >> https://arxiv.org/pdf/1611.08758.pdf > > >> > > >> > > >> On Mon, Oct 14, 2019 at 1:07 PM Alexander Lindsay via petsc-users wrote: > > >> I've been working on mechanical contact in MOOSE for a while, and it's led to me to think about general inequality constraint enforcement. I've been playing around with both `vinewtonssls` and `vinewtonrsls`. 
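For anyone reading along who has not used these two solvers, a minimal, self-contained sketch of selecting them is below; the tridiagonal operator and the bounds 0 <= u <= 10 are stand-ins for the actual MOOSE contact problem, which is not reproduced here.

/* Hypothetical sketch of selecting PETSc's VI solvers.  The residual
   F(u) = A u - b with a 1D Laplacian-like A is a stand-in problem. */
#include <petscsnes.h>

static PetscErrorCode FormFunction(SNES snes, Vec U, Vec F, void *ctx)
{
  Mat            A = (Mat)ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(A, U, F);CHKERRQ(ierr);      /* F = A u              */
  ierr = VecShift(F, -1.0);CHKERRQ(ierr);     /* F = A u - b, b = 1   */
  PetscFunctionReturn(0);
}

static PetscErrorCode FormJacobian(SNES snes, Vec U, Mat J, Mat P, void *ctx)
{
  PetscFunctionBeginUser;                     /* Jacobian is the constant A */
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  SNES           snes;
  Mat            A;
  Vec            u, r, xl, xu;
  PetscInt       i, n = 100, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 1, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &u, &r);CHKERRQ(ierr);
  ierr = VecSet(u, 1.0);CHKERRQ(ierr);
  ierr = VecDuplicate(u, &xl);CHKERRQ(ierr); ierr = VecSet(xl, 0.0);CHKERRQ(ierr);
  ierr = VecDuplicate(u, &xu);CHKERRQ(ierr); ierr = VecSet(xu, 10.0);CHKERRQ(ierr);

  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
  ierr = SNESSetType(snes, SNESVINEWTONRSLS);CHKERRQ(ierr);   /* or SNESVINEWTONSSLS      */
  ierr = SNESVISetVariableBounds(snes, xl, xu);CHKERRQ(ierr); /* the bounds 0 <= u <= 10  */
  ierr = SNESSetFunction(snes, r, FormFunction, A);CHKERRQ(ierr);
  ierr = SNESSetJacobian(snes, A, A, FormJacobian, NULL);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);  /* -snes_type vinewtonrsls|vinewtonssls */
  ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr);

  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);  ierr = VecDestroy(&r);CHKERRQ(ierr);
  ierr = VecDestroy(&xl);CHKERRQ(ierr); ierr = VecDestroy(&xu);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running with -snes_vi_monitor produces per-iteration active-set output of the kind shown earlier in this thread.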
In Benson's and Munson's Flexible Complementarity Solvers paper, they were able to solve 73.7% of their problems with SS and 65.5% with RS which led them to conclude that the SS method is generally more robust. We have had at least one instance where a MOOSE user reported an order of magnitude reduction in non-linear iterations when switching from SS to RS. Moreover, when running the problem described in this issue, I get these results: > > >> > > >> num_elements = 100 > > >> SS nl iterations = 53 > > >> RS nl iterations = 22 > > >> > > >> num_elements = 1000 > > >> SS nl iterations = 123 > > >> RS nl iterations = 140 > > >> > > >> num_elements = 10000 > > >> SS: fails to converge within 50 nl iterations during the second time step whether using a `basic` or `bt` line search > > >> RS: fails to converge within 50 nl iterations during the second time step whether using a `basic` or `bt` line search (although I believe `vinewtonrsls` performs a line-search that is guaranteed to keep the degrees of freedom within their bounds) > > >> > > >> So depending on the number of elements, it appears that either SS or RS may be more performant. I guess since I can get different relative performance with even the same PDE, it would be silly for me to ask for guidance on when to use which? In the conclusion of Benson's and Munson's paper, they mention using mesh sequencing for generating initial guesses on finer meshes. Does anyone know whether there have been any publications using PETSc/TAO and mesh sequencing for solving large VI problems? > > >> > > >> A related question: what needs to be done to allow SS to run with `-snes_mf_operator`? RS already appears to support the option. > > > > > > This may not make sense. Is the operator used in the SS solution process derivable from the function that is being optimized with the constraints or some strange scaled beast? > > >> > > > > > > From bsmith at mcs.anl.gov Fri Nov 1 10:24:22 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 1 Nov 2019 15:24:22 +0000 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: References: Message-ID: <51DE3BAE-2B7F-4273-95AB-9B0B32D4A483@anl.gov> Certain OpenMPI versions have bugs where even when you properly duplicate and then free communicators it eventually "runs out of communicators". This is a definitely a bug and was fixed in later OpenMPI versions. We wasted a lot of time tracking down this bug in the past. By now it is an old version of OpenMPI; the OpenMPI site https://www.open-mpi.org/software/ompi/v4.0/ lists the buggy versions as retired. So the question is should PETSc attempt to change its behavior or add functionality or hacks to work around this bug? My answer is NO. This is a "NEW" cluster! A "NEW" cluster is not running OpenMPI 2.1 by definition of new. The cluster manager needs to remove the buggy version of OpenMPI from their system. If the cluster manager is incapable of doing the most elementary part of the their job (removing buggy code) then the application person is stuck having to put hacks into their code to work around the bugs on their cluster; it cannot be PETSc's responsibility to distorted itself due to ancient bugs in other software. Barry Note that this OpenMPI bug does not affect very many MPI or PETSc codes. It only affects those codes that completely correctly call duplicate and free many times. This is why PETSc configure doesn't blacklist the OpenMPI version (though perhaps it should). 
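To make the failure mode concrete, a bare-MPI sketch of the pattern being described, that is, a code that duplicates and frees communicators correctly but many times, might look like the following; the loop count is arbitrary, chosen only to exceed the roughly 65534 context ids visible in the error message quoted in this thread.

/* Hypothetical reproducer sketch: each cycle duplicates and then frees a
   communicator, so nothing is leaked.  A correct MPI handles this fine; the
   buggy Open MPI releases eventually fail inside MPI_Comm_dup anyway. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm dup;
  int      i, rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (i = 0; i < 100000; i++) {
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);
    /* ... in the application this span is the lifetime of a Mat/Vec/KSP ... */
    MPI_Comm_free(&dup);              /* freed correctly: no communicator is leaked */
  }
  if (!rank) printf("completed %d dup/free cycles\n", i);
  MPI_Finalize();
  return 0;
}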
> On Nov 1, 2019, at 5:41 AM, Patrick Sanan via petsc-users wrote: > > Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): > > [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup > [lo-a2-058:21425] *** reported by process [4222287873,2] > [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 > [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error > [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, > [lo-a2-058:21425] *** and potentially your MPI job) > > Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. > > If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using 
internal PETSc communicator 1140850688 -2080374784 > > [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 > > > > From patrick.sanan at gmail.com Fri Nov 1 10:54:04 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 1 Nov 2019 16:54:04 +0100 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: <51DE3BAE-2B7F-4273-95AB-9B0B32D4A483@anl.gov> References: <51DE3BAE-2B7F-4273-95AB-9B0B32D4A483@anl.gov> Message-ID: Thanks, Barry. I should have realized that was an ancient version. The cluster does have Open MPI 4.0.1 so I'll see if we can't use that instead. (I'm sure that the old version is there just to provide continuity - the weird thing is that the previous, quite similar, cluster used Open MPI 1.6.5 and that seemed to work fine with this application :D ) > Am 01.11.2019 um 16:24 schrieb Smith, Barry F. : > > > Certain OpenMPI versions have bugs where even when you properly duplicate and then free communicators it eventually "runs out of communicators". This is a definitely a bug and was fixed in later OpenMPI versions. We wasted a lot of time tracking down this bug in the past. By now it is an old version of OpenMPI; the OpenMPI site https://www.open-mpi.org/software/ompi/v4.0/ lists the buggy versions as retired. 
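When several Open MPI modules coexist on a cluster, a small stand-alone check like the sketch below (plain MPI-3, no PETSc needed) can confirm which library version an executable is actually linked against before digging further.

/* Print the MPI library version string reported by the linked MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  char version[MPI_MAX_LIBRARY_VERSION_STRING];
  int  len, rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_library_version(version, &len);
  if (!rank) printf("%s\n", version);   /* e.g. the Open MPI release string */
  MPI_Finalize();
  return 0;
}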
> > So the question is should PETSc attempt to change its behavior or add functionality or hacks to work around this bug? > > My answer is NO. This is a "NEW" cluster! A "NEW" cluster is not running OpenMPI 2.1 by definition of new. The cluster manager needs to remove the buggy version of OpenMPI from their system. If the cluster manager is incapable of doing the most elementary part of the their job (removing buggy code) then the application person is stuck having to put hacks into their code to work around the bugs on their cluster; it cannot be PETSc's responsibility to distorted itself due to ancient bugs in other software. > > Barry > > Note that this OpenMPI bug does not affect very many MPI or PETSc codes. It only affects those codes that completely correctly call duplicate and free many times. This is why PETSc configure doesn't blacklist the OpenMPI version (though perhaps it should). > > > >> On Nov 1, 2019, at 5:41 AM, Patrick Sanan via petsc-users wrote: >> >> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): >> >> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup >> [lo-a2-058:21425] *** reported by process [4222287873,2] >> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 >> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error >> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, >> [lo-a2-058:21425] *** and potentially your MPI job) >> >> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. >> >> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. 
Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: >> >> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> >> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using 
internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >> >> >> >> > From bsmith at mcs.anl.gov Fri Nov 1 13:39:17 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 1 Nov 2019 18:39:17 +0000 Subject: [petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran? In-Reply-To: References: <51DE3BAE-2B7F-4273-95AB-9B0B32D4A483@anl.gov> Message-ID: <1BC6A575-727A-4108-BDFC-86EAA82BF680@mcs.anl.gov> > On Nov 1, 2019, at 10:54 AM, Patrick Sanan wrote: > > Thanks, Barry. I should have realized that was an ancient version. The cluster does have Open MPI 4.0.1 so I'll see if we can't use that instead. (I'm sure that the old version is there just to provide continuity - the weird thing is that the previous, quite similar, cluster used Open MPI 1.6.5 and that seemed to work fine with this application :D ) Yes, the bug was introduced into OpenMPI at some point and then removed at a later point, so it is actually completely reasonable that the older OpenMPI worked fine. Barry > >> Am 01.11.2019 um 16:24 schrieb Smith, Barry F. : >> >> >> Certain OpenMPI versions have bugs where even when you properly duplicate and then free communicators it eventually "runs out of communicators". This is a definitely a bug and was fixed in later OpenMPI versions. We wasted a lot of time tracking down this bug in the past. By now it is an old version of OpenMPI; the OpenMPI site https://www.open-mpi.org/software/ompi/v4.0/ lists the buggy versions as retired. >> >> So the question is should PETSc attempt to change its behavior or add functionality or hacks to work around this bug? >> >> My answer is NO. This is a "NEW" cluster! A "NEW" cluster is not running OpenMPI 2.1 by definition of new. The cluster manager needs to remove the buggy version of OpenMPI from their system. If the cluster manager is incapable of doing the most elementary part of the their job (removing buggy code) then the application person is stuck having to put hacks into their code to work around the bugs on their cluster; it cannot be PETSc's responsibility to distorted itself due to ancient bugs in other software. >> >> Barry >> >> Note that this OpenMPI bug does not affect very many MPI or PETSc codes. It only affects those codes that completely correctly call duplicate and free many times. This is why PETSc configure doesn't blacklist the OpenMPI version (though perhaps it should). >> >> >> >>> On Nov 1, 2019, at 5:41 AM, Patrick Sanan via petsc-users wrote: >>> >>> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat,Vec, and KSP) at each of (many) timesteps. 
The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and this is eventually a problem for this particular MPI implementation (Open MPI 2.1.0): >>> >>> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup >>> [lo-a2-058:21425] *** reported by process [4222287873,2] >>> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533 >>> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error >>> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, >>> [lo-a2-058:21425] *** and potentially your MPI job) >>> >>> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so thought I'd ask here. >>> >>> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the following. Note that in the Fortran case, it appears that communicators are actually duplicated in each loop, but in the C case, this only happens in the first loop: >>> >>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> >>> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 
-2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784 >>> >>> >>> >>> >> > From sajidsyed2021 at u.northwestern.edu Fri Nov 1 14:06:33 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Fri, 1 Nov 2019 14:06:33 -0500 Subject: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad Message-ID: Hi PETSc-developers, I'm unable to debug a crash with VecDestroy that seems to depend only on whether or not a VecLoad was performed on a vector that was generated by duplicating one generated by MatCreateVecsFFTW. I'm attaching two examples ex1.c and ex2.c. The first one just creates vectors aligned as per FFTW layout, duplicates one of them and destroys all at the end. A bug related to this was fixed sometime between the 3.11 release and 3.12 release. I've tested this code with the versions 3.11.1 and 3.12.1 and as expected it runs with no issues for 3.12.1 and fails with 3.11.1. Now, the second one just adds a few lines which load a vector from memory to the duplicated vector before destroying all. For some reason, this code fails for both 3.11.1 and 3.12.1 versions. I'm lost as to what may cause this error and would appreciate any help in how to debug this. Thanks in advance for the help! PS: I've attached the two codes, ex1.c/ex2.c, the log files for both make and run and finally a bash script that was run to compile/log and control the version of petsc used. 
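Since the attachments are not reproduced in the archive, the sketch below is only a reconstruction of the sequence being described, not the attached ex2.c itself. It assumes a complex-scalar PETSc build configured with FFTW and HDF5, and the file name "vec.h5" and dataset name "w" are placeholders.

/* Reconstruction sketch: duplicate a vector laid out by MatCreateVecsFFTW,
   VecLoad into the duplicate from an HDF5 file, then destroy everything. */
#include <petscmat.h>
#include <petscviewerhdf5.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y, z, w;
  PetscViewer    viewer;
  PetscInt       dim[2] = {8, 8};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = MatCreateFFT(PETSC_COMM_WORLD, 2, dim, MATFFTW, &A);CHKERRQ(ierr);
  ierr = MatCreateVecsFFTW(A, &x, &y, &z);CHKERRQ(ierr);  /* FFTW-aligned layouts      */
  ierr = VecDuplicate(x, &w);CHKERRQ(ierr);               /* duplicate of an FFTW vec  */

  ierr = PetscObjectSetName((PetscObject)w, "w");CHKERRQ(ierr);
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "vec.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = VecLoad(w, viewer);CHKERRQ(ierr);                /* the step said to make the difference */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = VecDestroy(&w);CHKERRQ(ierr);                    /* reported to fail after the VecLoad   */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&z);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}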
-- Sajid Ali Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1_make_3_12_1 Type: application/octet-stream Size: 3604 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1_make_3_11_1 Type: application/octet-stream Size: 3768 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1.c Type: application/octet-stream Size: 1809 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1_log_3_12_1 Type: application/octet-stream Size: 351 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1_log_3_11_1 Type: application/octet-stream Size: 433628 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2.c Type: application/octet-stream Size: 2186 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2_make_3_11_1 Type: application/octet-stream Size: 3768 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2_make_3_12_1 Type: application/octet-stream Size: 3604 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2_log_3_12_1 Type: application/octet-stream Size: 433628 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.sh Type: application/octet-stream Size: 514 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2_log_3_11_1 Type: application/octet-stream Size: 434412 bytes Desc: not available URL: From jczhang at mcs.anl.gov Fri Nov 1 16:50:10 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Fri, 1 Nov 2019 21:50:10 +0000 Subject: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad In-Reply-To: References: Message-ID: I know nothing about Vec FFTW, but if you can provide hdf5 files in your test, I will see if I can reproduce it. --Junchao Zhang On Fri, Nov 1, 2019 at 2:08 PM Sajid Ali via petsc-users > wrote: Hi PETSc-developers, I'm unable to debug a crash with VecDestroy that seems to depend only on whether or not a VecLoad was performed on a vector that was generated by duplicating one generated by MatCreateVecsFFTW. I'm attaching two examples ex1.c and ex2.c. The first one just creates vectors aligned as per FFTW layout, duplicates one of them and destroys all at the end. A bug related to this was fixed sometime between the 3.11 release and 3.12 release. I've tested this code with the versions 3.11.1 and 3.12.1 and as expected it runs with no issues for 3.12.1 and fails with 3.11.1. Now, the second one just adds a few lines which load a vector from memory to the duplicated vector before destroying all. For some reason, this code fails for both 3.11.1 and 3.12.1 versions. I'm lost as to what may cause this error and would appreciate any help in how to debug this. Thanks in advance for the help! 
PS: I've attached the two codes, ex1.c/ex2.c, the log files for both make and run and finally a bash script that was run to compile/log and control the version of petsc used. -- Sajid Ali Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 1 18:10:38 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 1 Nov 2019 23:10:38 +0000 Subject: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad In-Reply-To: References: Message-ID: <9495B365-A1AE-4840-8E2D-45CD01DE3D41@anl.gov> > On Nov 1, 2019, at 4:50 PM, Zhang, Junchao via petsc-users wrote: > > I know nothing about Vec FFTW, You are lucky :-) > but if you can provide hdf5 files in your test, I will see if I can reproduce it. > --Junchao Zhang > > > On Fri, Nov 1, 2019 at 2:08 PM Sajid Ali via petsc-users wrote: > Hi PETSc-developers, > > I'm unable to debug a crash with VecDestroy that seems to depend only on whether or not a VecLoad was performed on a vector that was generated by duplicating one generated by MatCreateVecsFFTW. > > I'm attaching two examples ex1.c and ex2.c. The first one just creates vectors aligned as per FFTW layout, duplicates one of them and destroys all at the end. A bug related to this was fixed sometime between the 3.11 release and 3.12 release. I've tested this code with the versions 3.11.1 and 3.12.1 and as expected it runs with no issues for 3.12.1 and fails with 3.11.1. > > Now, the second one just adds a few lines which load a vector from memory to the duplicated vector before destroying all. For some reason, this code fails for both 3.11.1 and 3.12.1 versions. I'm lost as to what may cause this error and would appreciate any help in how to debug this. Thanks in advance for the help! > > PS: I've attached the two codes, ex1.c/ex2.c, the log files for both make and run and finally a bash script that was run to compile/log and control the version of petsc used. > > > -- > Sajid Ali > Applied Physics > Northwestern University > s-sajid-ali.github.io From sajidsyed2021 at u.northwestern.edu Fri Nov 1 18:49:56 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Fri, 1 Nov 2019 18:49:56 -0500 Subject: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad In-Reply-To: <9495B365-A1AE-4840-8E2D-45CD01DE3D41@anl.gov> References: <9495B365-A1AE-4840-8E2D-45CD01DE3D41@anl.gov> Message-ID: Hi Junchao/Barry, It doesn't really matter what the h5 file contains, so I'm attaching a lightly edited script of src/vec/vec/examples/tutorials/ex10.c which should produce a vector to be used as input for the above test case. (I'm working with ` --with-scalar-type=complex`). Now that I think of it, fixing this bug is not important, I can workaround the issue by creating a new vector with VecCreateMPI and accept the small loss in performance of VecPointwiseMult due to misaligned layouts. If it's a small fix it may be worth the time, but fixing this is not a big priority right now. If it's a complicated fix, this issue can serve as a note to future users. Thank You, Sajid Ali Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex10.c Type: application/octet-stream Size: 1568 bytes Desc: not available URL: From juaneah at gmail.com Sun Nov 3 23:41:50 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Sun, 3 Nov 2019 23:41:50 -0600 Subject: [petsc-users] doubts on VecScatterCreate Message-ID: Hi everyone, thanks in advance. I have three parallel vectors: A, B and C. A and B have different sizes, and C must be contain these two vectors (MatLab notation C=[A;B]). I need to do some operations on C then put back the proper portion of C on A and B, then I do some computations on A and B y put again on C, and the loop repeats. For these propose I use Scatters: C is created as a parallel vector with size of (sizeA + sizeB) with petsc_decide for parallel layout. The vectors have been distributed on the same amount of processes. For the specific case with order [A;B] VecGetOwnershipRange(A,&start,&end); ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is redundant VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); VecGetSize(A,&sizeA) VecGetOwnershipRange(B,&start,&end); ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); //shifts the index location VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); Then I can use VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); and VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); and the same with B. I used MPI_COMM SELF and I got the same results. *The situation is: My results look good for the portion of B, but no for the portion of A, there is something that I'm doing wrong with the scattering?* Best regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Sun Nov 3 23:49:38 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Sun, 3 Nov 2019 23:49:38 -0600 Subject: [petsc-users] doubts on VecScatterCreate In-Reply-To: References: Message-ID: *I mean, the portion for A look like messy distribution, as if scatter was done wrong* El dom., 3 de nov. de 2019 a la(s) 23:41, Emmanuel Ayala (juaneah at gmail.com) escribi?: > Hi everyone, thanks in advance. > > I have three parallel vectors: A, B and C. A and B have different sizes, > and C must be contain these two vectors (MatLab notation C=[A;B]). I need > to do some operations on C then put back the proper portion of C on A and > B, then I do some computations on A and B y put again on C, and the loop > repeats. > > For these propose I use Scatters: > > C is created as a parallel vector with size of (sizeA + sizeB) with > petsc_decide for parallel layout. The vectors have been distributed on the > same amount of processes. 
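For reference, a self-contained version of this C = [A;B] pattern, with arbitrary sizes and values, is sketched below; it follows the same ISCreateStride/VecScatterCreate construction quoted here, and, as Barry's reply further down indicates, the pattern itself works, so a complete failing code is what is needed to go further.

/* Sketch: scatter A and B forward into C = [A;B], then back again.  The
   reverse scatters should return exactly the original A and B. */
#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            A, B, C;
  VecScatter     scatter1, scatter2;
  IS             isA, isB, isC1, isC2;
  PetscInt       nA = 7, nB = 5, sizeA, start, end;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, nA, &A);CHKERRQ(ierr);
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, nB, &B);CHKERRQ(ierr);
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, nA + nB, &C);CHKERRQ(ierr);
  ierr = VecSet(A, 1.0);CHKERRQ(ierr);
  ierr = VecSet(B, 2.0);CHKERRQ(ierr);
  ierr = VecGetSize(A, &sizeA);CHKERRQ(ierr);

  /* A -> first block of C: target indices equal A's global indices */
  ierr = VecGetOwnershipRange(A, &start, &end);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, end - start, start, 1, &isA);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, end - start, start, 1, &isC1);CHKERRQ(ierr);
  ierr = VecScatterCreate(A, isA, C, isC1, &scatter1);CHKERRQ(ierr);

  /* B -> second block of C: target indices shifted by sizeA */
  ierr = VecGetOwnershipRange(B, &start, &end);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, end - start, start, 1, &isB);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_WORLD, end - start, start + sizeA, 1, &isC2);CHKERRQ(ierr);
  ierr = VecScatterCreate(B, isB, C, isC2, &scatter2);CHKERRQ(ierr);

  /* forward: C = [A;B] */
  ierr = VecScatterBegin(scatter1, A, C, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter1, A, C, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterBegin(scatter2, B, C, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter2, B, C, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

  /* reverse: pull the two blocks of C back into A and B */
  ierr = VecScatterBegin(scatter1, C, A, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter1, C, A, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterBegin(scatter2, C, B, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter2, C, B, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecView(C, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = ISDestroy(&isA);CHKERRQ(ierr);  ierr = ISDestroy(&isB);CHKERRQ(ierr);
  ierr = ISDestroy(&isC1);CHKERRQ(ierr); ierr = ISDestroy(&isC2);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scatter1);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scatter2);CHKERRQ(ierr);
  ierr = VecDestroy(&A);CHKERRQ(ierr); ierr = VecDestroy(&B);CHKERRQ(ierr); ierr = VecDestroy(&C);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}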
> > For the specific case with order [A;B] > > VecGetOwnershipRange(A,&start,&end); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is > redundant > VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); > > VecGetSize(A,&sizeA) > VecGetOwnershipRange(B,&start,&end); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); > ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); > //shifts the index location > VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); > > Then I can use > VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > > and > VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > > and the same with B. > I used MPI_COMM SELF and I got the same results. > > *The situation is: My results look good for the portion of B, but no for > the portion of A, there is something that I'm doing wrong with the > scattering?* > > Best regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 4 08:47:47 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 4 Nov 2019 14:47:47 +0000 Subject: [petsc-users] doubts on VecScatterCreate In-Reply-To: References: Message-ID: <35D3FEA6-6B71-44F4-B746-23461B581E9C@anl.gov> It works for me. Please send a complete code that fails. > On Nov 3, 2019, at 11:41 PM, Emmanuel Ayala via petsc-users wrote: > > Hi everyone, thanks in advance. > > I have three parallel vectors: A, B and C. A and B have different sizes, and C must be contain these two vectors (MatLab notation C=[A;B]). I need to do some operations on C then put back the proper portion of C on A and B, then I do some computations on A and B y put again on C, and the loop repeats. > > For these propose I use Scatters: > > C is created as a parallel vector with size of (sizeA + sizeB) with petsc_decide for parallel layout. The vectors have been distributed on the same amount of processes. > > For the specific case with order [A;B] > > VecGetOwnershipRange(A,&start,&end); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is redundant > VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); > > VecGetSize(A,&sizeA) > VecGetOwnershipRange(B,&start,&end); > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); > ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); //shifts the index location > VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); > > Then I can use > VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > > and > VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > > and the same with B. > I used MPI_COMM SELF and I got the same results. > > The situation is: My results look good for the portion of B, but no for the portion of A, there is something that I'm doing wrong with the scattering? > > Best regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex1.c Type: application/octet-stream Size: 1481 bytes Desc: ex1.c URL: From perceval.desforges at polytechnique.edu Mon Nov 4 11:33:54 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Mon, 04 Nov 2019 18:33:54 +0100 Subject: [petsc-users] SLEPC no speedup in parallel Message-ID: <82480440c0cf3973ca6e935413279be3@polytechnique.edu> Dear petsc and slepc developpers, I am using slepc to solve an eigenvalue problem. Since I need all the eigenvalues in a certain interval, I use the spectrum slicing technique with mumps. However I do not understand: when I run my code with more than one processor, there is no speedup at all, and it even slows down, and I don't understand why. I wanted to test further and I ran the same code without spectrum slicing, and asking for about the same amount of eigenvalues. The calculation was much slower (about 10 times slower), but using more than one processor sped it up. Is this normal behavior or am I doing something wrong? Thanks, Best regards, Perceval, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Nov 4 11:45:36 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 4 Nov 2019 18:45:36 +0100 Subject: [petsc-users] SLEPC no speedup in parallel In-Reply-To: <82480440c0cf3973ca6e935413279be3@polytechnique.edu> References: <82480440c0cf3973ca6e935413279be3@polytechnique.edu> Message-ID: <710EDF59-3233-470E-8E1F-22E3B504556B@dsic.upv.es> Did you follow the instructions in section 3.4.5 of the SLEPc users manual? Send the output of -eps_view Jose > El 4 nov 2019, a las 18:33, Perceval Desforges via petsc-users escribi?: > > Dear petsc and slepc developpers, > > I am using slepc to solve an eigenvalue problem. Since I need all the eigenvalues in a certain interval, I use the spectrum slicing technique with mumps. However I do not understand: when I run my code with more than one processor, there is no speedup at all, and it even slows down, and I don't understand why. > > I wanted to test further and I ran the same code without spectrum slicing, and asking for about the same amount of eigenvalues. The calculation was much slower (about 10 times slower), but using more than one processor sped it up. > > Is this normal behavior or am I doing something wrong? > > Thanks, > > Best regards, > > Perceval, > > > From alexlindsay239 at gmail.com Mon Nov 4 12:44:14 2019 From: alexlindsay239 at gmail.com (Alexander Lindsay) Date: Mon, 4 Nov 2019 12:44:14 -0600 Subject: [petsc-users] VI: RS vs SS In-Reply-To: References: <6DA5E815-6FB8-465F-98A5-3BA67F668AFB@mcs.anl.gov> <7D085275-4E5C-4152-9784-51A703688A0A@mcs.anl.gov> Message-ID: I'm not too familiar with the M and q notation. However, I've attached A and b for the unconstrained linear problem in PETSc binary format (don't know if they'll go through on this list...). l and u are 0 and 10 respectively. On Fri, Nov 1, 2019 at 10:19 AM Munson, Todd wrote: > > Yes, that looks weird. Can you send me directly the linear problem (M, q, > l, and u)? I > will take a look and run some other diagnostics with some of my other > tools. > > Thanks, Todd. > > > On Nov 1, 2019, at 10:14 AM, Alexander Lindsay > wrote: > > > > No, the matrix is not symmetric because of how we impose some Dirichlet > conditions on the boundary. I could easily give you the Jacobian, for one > of the "bad" problems. 
But at least in the case of RSLS, I don't know > whether the algorithm is performing badly, or whether the slow convergence > is simply a property of the algorithm. Here's a VI monitor history for a > representative "bad" solve. > > > > 0 SNES VI Function norm 0.229489 Active lower constraints 0/1 upper > constraints 0/1 Percent of total 0. Percent of bounded 0. > > 1 SNES VI Function norm 0.365268 Active lower constraints 83/85 upper > constraints 83/85 Percent of total 0.207241 Percent of bounded 0. > > 2 SNES VI Function norm 0.495088 Active lower constraints 82/84 upper > constraints 82/84 Percent of total 0.204744 Percent of bounded 0. > > 3 SNES VI Function norm 0.478328 Active lower constraints 81/83 upper > constraints 81/83 Percent of total 0.202247 Percent of bounded 0. > > 4 SNES VI Function norm 0.46163 Active lower constraints 80/82 upper > constraints 80/82 Percent of total 0.19975 Percent of bounded 0. > > 5 SNES VI Function norm 0.444996 Active lower constraints 79/81 upper > constraints 79/81 Percent of total 0.197253 Percent of bounded 0. > > 6 SNES VI Function norm 0.428424 Active lower constraints 78/80 upper > constraints 78/80 Percent of total 0.194757 Percent of bounded 0. > > 7 SNES VI Function norm 0.411916 Active lower constraints 77/79 upper > constraints 77/79 Percent of total 0.19226 Percent of bounded 0. > > 8 SNES VI Function norm 0.395472 Active lower constraints 76/78 upper > constraints 76/78 Percent of total 0.189763 Percent of bounded 0. > > 9 SNES VI Function norm 0.379092 Active lower constraints 75/77 upper > constraints 75/77 Percent of total 0.187266 Percent of bounded 0. > > 10 SNES VI Function norm 0.362776 Active lower constraints 74/76 upper > constraints 74/76 Percent of total 0.184769 Percent of bounded 0. > > 11 SNES VI Function norm 0.346525 Active lower constraints 73/75 upper > constraints 73/75 Percent of total 0.182272 Percent of bounded 0. > > 12 SNES VI Function norm 0.330338 Active lower constraints 72/74 upper > constraints 72/74 Percent of total 0.179775 Percent of bounded 0. > > 13 SNES VI Function norm 0.314217 Active lower constraints 71/73 upper > constraints 71/73 Percent of total 0.177278 Percent of bounded 0. > > 14 SNES VI Function norm 0.298162 Active lower constraints 70/72 upper > constraints 70/72 Percent of total 0.174782 Percent of bounded 0. > > 15 SNES VI Function norm 0.282173 Active lower constraints 69/71 upper > constraints 69/71 Percent of total 0.172285 Percent of bounded 0. > > 16 SNES VI Function norm 0.26625 Active lower constraints 68/70 upper > constraints 68/70 Percent of total 0.169788 Percent of bounded 0. > > 17 SNES VI Function norm 0.250393 Active lower constraints 67/69 upper > constraints 67/69 Percent of total 0.167291 Percent of bounded 0. > > 18 SNES VI Function norm 0.234604 Active lower constraints 66/68 upper > constraints 66/68 Percent of total 0.164794 Percent of bounded 0. > > 19 SNES VI Function norm 0.218882 Active lower constraints 65/67 upper > constraints 65/67 Percent of total 0.162297 Percent of bounded 0. > > 20 SNES VI Function norm 0.203229 Active lower constraints 64/66 upper > constraints 64/66 Percent of total 0.1598 Percent of bounded 0. > > 21 SNES VI Function norm 0.187643 Active lower constraints 63/65 upper > constraints 63/65 Percent of total 0.157303 Percent of bounded 0. > > 22 SNES VI Function norm 0.172126 Active lower constraints 62/64 upper > constraints 62/64 Percent of total 0.154806 Percent of bounded 0. 
> > 23 SNES VI Function norm 0.156679 Active lower constraints 61/63 upper > constraints 61/63 Percent of total 0.15231 Percent of bounded 0. > > 24 SNES VI Function norm 0.141301 Active lower constraints 60/62 upper > constraints 60/62 Percent of total 0.149813 Percent of bounded 0. > > 25 SNES VI Function norm 0.125993 Active lower constraints 59/61 upper > constraints 59/61 Percent of total 0.147316 Percent of bounded 0. > > 26 SNES VI Function norm 0.110755 Active lower constraints 58/60 upper > constraints 58/60 Percent of total 0.144819 Percent of bounded 0. > > 27 SNES VI Function norm 0.0955886 Active lower constraints 57/59 upper > constraints 57/59 Percent of total 0.142322 Percent of bounded 0. > > 28 SNES VI Function norm 0.0804936 Active lower constraints 56/58 upper > constraints 56/58 Percent of total 0.139825 Percent of bounded 0. > > 29 SNES VI Function norm 0.0654705 Active lower constraints 55/57 upper > constraints 55/57 Percent of total 0.137328 Percent of bounded 0. > > 30 SNES VI Function norm 0.0505198 Active lower constraints 54/56 upper > constraints 54/56 Percent of total 0.134831 Percent of bounded 0. > > 31 SNES VI Function norm 0.0356422 Active lower constraints 53/55 upper > constraints 53/55 Percent of total 0.132335 Percent of bounded 0. > > 32 SNES VI Function norm 0.020838 Active lower constraints 52/54 upper > constraints 52/54 Percent of total 0.129838 Percent of bounded 0. > > 33 SNES VI Function norm 0.0061078 Active lower constraints 51/53 upper > constraints 51/53 Percent of total 0.127341 Percent of bounded 0. > > 34 SNES VI Function norm 2.2664e-12 Active lower constraints 51/52 > upper constraints 51/52 Percent of total 0.127341 Percent of bounded 0. > > > > I've read that in some cases the VI solver is simply unable to move the > constraint set more than one grid cell per non-linear iteration. That looks > like what I'm seeing here... > > > > On Tue, Oct 29, 2019 at 7:15 AM Munson, Todd > wrote: > > > > Hi, > > > > Is the matrix for the linear PDE symmetric? If so, then the VI is > equivalent to > > finding the stationary points of a bound-constrained quadratic program > and you > > may want to use the TAO Newton Trust-Region or Line-Search methods for > > bound-constrained optimization problems. > > > > Alp: are there flags set when a problem is linear with a symmetric > matrix? Maybe > > we can do an internal reformulation in those cases to use the > optimization tools. > > > > Is there an easy way to get the matrix and the constant vector for one > of the > > problems that fails or does not perform well? Typically, the TAO RSLS > > methods will work well for the types of problems that you have and if > > they are not, then I can go about finding out why and making some > > improvements. > > > > Monotone in this case is that your matrix is positive semidefinite; > x^TMx >= 0 for > > all x. For M symmetric, this is the same as M having all nonnegative > eigenvalues. > > > > Todd. > > > > > On Oct 28, 2019, at 11:14 PM, Alexander Lindsay < > alexlindsay239 at gmail.com> wrote: > > > > > > On Thu, Oct 24, 2019 at 4:52 AM Munson, Todd > wrote: > > > > > > Hi, > > > > > > For these problems, how large are they? And are they linear or > nonlinear? > > > What I can do is use some fancier tools to help with what is going on > with > > > the solvers in certain cases. 
> > > > > > For the results cited above: > > > > > > 100 elements -> 101 dofs > > > 1,000 elements -> 1,001 dofs > > > 10,000 elements -> 10,001 dofs > > > > > > The PDE is linear with simple bounds constraints on the variable: 0 <= > u <= 10 > > > > > > > > > For Barry's question, the matrix in the SS solver is a diagonal matrix > plus > > > a column scaling of the Jacobian. > > > > > > Note: semismooth, reduced space and interior point methods mainly work > for > > > problems that are strictly monotone. > > > > > > Dumb question, but monotone in what way? > > > > > > Thanks for the replies! > > > > > > Alex > > > > > > Finding out what is going on with > > > your problems with some additional diagnostics might yield some > > > insights. > > > > > > Todd. > > > > > > > On Oct 24, 2019, at 3:36 AM, Smith, Barry F. > wrote: > > > > > > > > > > > > See bottom > > > > > > > > > > > >> On Oct 14, 2019, at 1:12 PM, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > >> > > > >> It might depend on your application, but for my stuff on maximum > principles for advection-diffusion, I found RS to be much better than SS. > Here?s the paper I wrote documenting the performance numbers I came across > > > >> > > > >> https://www.sciencedirect.com/science/article/pii/S0045782516316176 > > > >> > > > >> Or the arXiV version: > > > >> > > > >> https://arxiv.org/pdf/1611.08758.pdf > > > >> > > > >> > > > >> On Mon, Oct 14, 2019 at 1:07 PM Alexander Lindsay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > >> I've been working on mechanical contact in MOOSE for a while, and > it's led to me to think about general inequality constraint enforcement. > I've been playing around with both `vinewtonssls` and `vinewtonrsls`. In > Benson's and Munson's Flexible Complementarity Solvers paper, they were > able to solve 73.7% of their problems with SS and 65.5% with RS which led > them to conclude that the SS method is generally more robust. We have had > at least one instance where a MOOSE user reported an order of magnitude > reduction in non-linear iterations when switching from SS to RS. Moreover, > when running the problem described in this issue, I get these results: > > > >> > > > >> num_elements = 100 > > > >> SS nl iterations = 53 > > > >> RS nl iterations = 22 > > > >> > > > >> num_elements = 1000 > > > >> SS nl iterations = 123 > > > >> RS nl iterations = 140 > > > >> > > > >> num_elements = 10000 > > > >> SS: fails to converge within 50 nl iterations during the second > time step whether using a `basic` or `bt` line search > > > >> RS: fails to converge within 50 nl iterations during the second > time step whether using a `basic` or `bt` line search (although I believe > `vinewtonrsls` performs a line-search that is guaranteed to keep the > degrees of freedom within their bounds) > > > >> > > > >> So depending on the number of elements, it appears that either SS > or RS may be more performant. I guess since I can get different relative > performance with even the same PDE, it would be silly for me to ask for > guidance on when to use which? In the conclusion of Benson's and Munson's > paper, they mention using mesh sequencing for generating initial guesses on > finer meshes. Does anyone know whether there have been any publications > using PETSc/TAO and mesh sequencing for solving large VI problems? > > > >> > > > >> A related question: what needs to be done to allow SS to run with > `-snes_mf_operator`? RS already appears to support the option. 
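In outline, the bound-constrained setup being compared in this sub-thread looks roughly like the fragment below. This is a hypothetical sketch, not the actual MOOSE code: snes is assumed to be a SNES whose residual and Jacobian callbacks are already set, x is its solution vector, and the bounds 0 and 10 are the ones quoted above. The two flavours are then chosen at run time with -snes_type vinewtonrsls or -snes_type vinewtonssls.

  Vec xl,xu;
  ierr = VecDuplicate(x,&xl);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&xu);CHKERRQ(ierr);
  ierr = VecSet(xl,0.0);CHKERRQ(ierr);                      /* lower bound l = 0  */
  ierr = VecSet(xu,10.0);CHKERRQ(ierr);                     /* upper bound u = 10 */
  ierr = SNESSetType(snes,SNESVINEWTONRSLS);CHKERRQ(ierr);  /* or SNESVINEWTONSSLS */
  ierr = SNESVISetVariableBounds(snes,xl,xu);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);            /* -snes_type can still switch RS <-> SS */
  ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);
  ierr = VecDestroy(&xl);CHKERRQ(ierr);
  ierr = VecDestroy(&xu);CHKERRQ(ierr);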
> > > > > > > > This may not make sense. Is the operator used in the SS solution > process derivable from the function that is being optimized with the > constraints or some strange scaled beast? > > > >> > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: A.dat Type: application/octet-stream Size: 32032 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: b.dat Type: application/octet-stream Size: 6416 bytes Desc: not available URL: From aph at email.arizona.edu Mon Nov 4 14:14:43 2019 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Mon, 4 Nov 2019 13:14:43 -0700 Subject: [petsc-users] --with-64-bit-indices=1 Message-ID: Hello, I ran into an issue while using Mumps from Petsc. I got the following error (see below please). Somebody suggested that I compile Petsc with --with-64-bit-indices=1. Will that suffice? Also I compiled my own version of Petsc on Cray Onyx (HPCMP) but although I compiled --with-debugging=0, Petsc was very very slow (compared to the version of Petsc available from the Cray admins). Do you have a list of flags that I should compile Petsc with for Cray supercomputers? Thanks, Anthony INFOG(1)=-51. I saw in the mumps manual that: An external ordering (Metis/ParMetis, SCOTCH/PT-SCOTCH, PORD), with 32-bit default integers, is invoked to processing a graph of size larger than 2^31-1. INFO(2) holds the size required to store the graph as a number of integer values; -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 4 14:46:10 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 4 Nov 2019 20:46:10 +0000 Subject: [petsc-users] --with-64-bit-indices=1 In-Reply-To: References: Message-ID: > On Nov 4, 2019, at 2:14 PM, Anthony Paul Haas via petsc-users wrote: > > Hello, > > I ran into an issue while using Mumps from Petsc. I got the following error (see below please). Somebody suggested that I compile Petsc with --with-64-bit-indices=1. Will that suffice? Currently PETSc and MUMPS do not work together with --with-64-bit-indices=1. > Also I compiled my own version of Petsc on Cray Onyx (HPCMP) but although I compiled --with-debugging=0, Petsc was very very slow (compared to the version of Petsc available from the Cray admins). Do you have a list of flags that I should compile Petsc with for Cray supercomputers? No idea why it would be particularly slower. No way to know what compiler options they used. You also have a choice of different compilers on Cray, perhaps that makes a difference. > > Thanks, > > Anthony > > INFOG(1)=-51. I saw in the mumps manual that: > > An external ordering (Metis/ParMetis, SCOTCH/PT-SCOTCH, PORD), with 32-bit default > integers, is invoked to processing a graph of size larger than 2^31-1. INFO(2) holds the size > required to store the graph as a number of integer values; This is strange. Since PETSc cannot when using 32 bit indices produce such a large graph I cannot explain how this message was generated. 
Perhaps there was an integer overflow > > From fdkong.jd at gmail.com Mon Nov 4 17:28:17 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 4 Nov 2019 16:28:17 -0700 Subject: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson In-Reply-To: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> References: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> Message-ID: Thanks Jose, I think I understand now. Another question: what is the right way to setup a linear preconditioning matrix for the inner linear solver of JD? I was trying to do something like this: /* Create eigensolver context */ ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr); /* Set operators. In this case, it is a standard eigenvalue problem */ ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr); ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); ierr = EPSGetST(eps,&st);CHKERRQ(ierr); ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); /* Set solver parameters at runtime */ ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Solve the eigensystem - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = EPSSolve(eps);CHKERRQ(ierr); But did not work. A complete example is attached. I could try to dig into the code, but you may already know the answer. On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman wrote: > Yes, it is confusing. Here is the explanation: when you use a target, the > preconditioner is built from matrix A-sigma*B. By default, instead of > TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat > this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner > is built from matrix B. The thing is that in a standard eigenproblem we > have B=I, and hence there is no point in using a preconditioner, that is > why we set PCNONE. > > Jose > > > > El 22 oct 2019, a las 19:57, Fande Kong via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > Hi All, > > > > It looks like the preconditioner is hard-coded in the Jacobi-Davidson > solver. I could not select a preconditioner rather than the default setting. > > > > For example, I was trying to select LU, but PC NONE was still used. I > ran standard example 2 in slepc/src/eps/examples/tutorials, and had the > following results. 
> > > > > > Thanks, > > > > Fande > > > > > > ./ex2 -eps_type jd -st_ksp_type gmres -st_pc_type lu -eps_view > > > > 2-D Laplacian Eigenproblem, N=100 (10x10 grid) > > > > EPS Object: 1 MPI processes > > type: jd > > search subspace is orthogonalized > > block size=1 > > type of the initial subspace: non-Krylov > > size of the subspace after restarting: 6 > > number of vectors after restarting from the previous iteration: 1 > > threshold for changing the target in the correction equation (fix): > 0.01 > > problem type: symmetric eigenvalue problem > > selected portion of the spectrum: largest eigenvalues in magnitude > > number of eigenvalues (nev): 1 > > number of column vectors (ncv): 17 > > maximum dimension of projected problem (mpd): 17 > > maximum number of iterations: 1700 > > tolerance: 1e-08 > > convergence test: relative to the eigenvalue > > BV Object: 1 MPI processes > > type: svec > > 17 columns of global length 100 > > vector orthogonalization method: classical Gram-Schmidt > > orthogonalization refinement: if needed (eta: 0.7071) > > block orthogonalization method: GS > > doing matmult as a single matrix-matrix product > > DS Object: 1 MPI processes > > type: hep > > solving the problem with: Implicit QR method (_steqr) > > ST Object: 1 MPI processes > > type: precond > > shift: 1.79769e+308 > > number of matrices: 1 > > KSP Object: (st_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=90, initial guess is zero > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (st_) 1 MPI processes > > type: none > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: shell > > rows=100, cols=100 > > Solution method: jd > > > > Number of requested eigenvalues: 1 > > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 20 > > ---------------------- -------------------- > > k ||Ax-kx||/||kx|| > > ---------------------- -------------------- > > 7.837972 7.71944e-10 > > ---------------------- -------------------- > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex3.c Type: application/octet-stream Size: 7372 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon Nov 4 19:38:30 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 5 Nov 2019 01:38:30 +0000 Subject: [petsc-users] --with-64-bit-indices=1 In-Reply-To: References: Message-ID: <0249846E-62A3-459D-A9F2-2717F1A622F6@mcs.anl.gov> You should upgrade to the lastest PETSc and that will also upgrade to the latest MUMPS. There may be better error checking and bounds checking now. Fundamentally until there is solid support for MUMPS with 64 bit indices you are stuck. SuperLU_DIST does fully support 64 bits but I have heard it is slower. Barry > On Nov 4, 2019, at 4:19 PM, Anthony Paul Haas wrote: > > Hi Barry, > > Thanks for your answer. Integer overflow seems to make sense. I am trying to do a direct inversion with Mumps LU. The code works for smaller grids but this case is pretty large (mesh is 12,001x 301). I am also attaching the output of the code in case that could provide more info. Do you know how I should proceed? 
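For anyone else hitting the MUMPS INFOG(1)=-51 limit: the only 64-bit-clean alternative mentioned above is SuperLU_DIST. As a hedged sketch only (these are the standard PETSc 3.11-era option names, not a recipe tested on this particular problem or on Cray), the configure line and runtime options would look like:

  ./configure --with-debugging=0 COPTFLAGS='-O2' CXXOPTFLAGS='-O2' FOPTFLAGS='-O2' \
              --with-64-bit-indices=1 --download-superlu_dist

  -pc_type lu -pc_factor_mat_solver_type superlu_dist

(Older PETSc versions spell the last option -pc_factor_mat_solver_package.)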
> > Thanks, > > Anthony > > On Mon, Nov 4, 2019 at 1:46 PM Smith, Barry F. wrote: > > > > > > On Nov 4, 2019, at 2:14 PM, Anthony Paul Haas via petsc-users wrote: > > > > Hello, > > > > I ran into an issue while using Mumps from Petsc. I got the following error (see below please). Somebody suggested that I compile Petsc with --with-64-bit-indices=1. Will that suffice? > > Currently PETSc and MUMPS do not work together with --with-64-bit-indices=1. > > > Also I compiled my own version of Petsc on Cray Onyx (HPCMP) but although I compiled --with-debugging=0, Petsc was very very slow (compared to the version of Petsc available from the Cray admins). Do you have a list of flags that I should compile Petsc with for Cray supercomputers? > > No idea why it would be particularly slower. No way to know what compiler options they used. > > You also have a choice of different compilers on Cray, perhaps that makes a difference. > > > > > Thanks, > > > > Anthony > > > > INFOG(1)=-51. I saw in the mumps manual that: > > > > An external ordering (Metis/ParMetis, SCOTCH/PT-SCOTCH, PORD), with 32-bit default > > integers, is invoked to processing a graph of size larger than 2^31-1. INFO(2) holds the size > > required to store the graph as a number of integer values; > > This is strange. Since PETSc cannot when using 32 bit indices produce such a large graph I cannot explain how this message was generated. Perhaps there was an integer overflow > > > > > > > > From perceval.desforges at polytechnique.edu Tue Nov 5 07:55:58 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Tue, 05 Nov 2019 14:55:58 +0100 Subject: [petsc-users] SLEPC no speedup in parallel In-Reply-To: <710EDF59-3233-470E-8E1F-22E3B504556B@dsic.upv.es> References: <82480440c0cf3973ca6e935413279be3@polytechnique.edu> <710EDF59-3233-470E-8E1F-22E3B504556B@dsic.upv.es> Message-ID: Hello, After carefully looking at the example in the tutorial suggested in section 3.4.5 of the manual, I managed to determine that the problem was caused by me calling EPSSetFromOptions(eps) after EPSKrylovSchurSetPartitions(eps,size) and not before. It now works fine. Sorry! Best regards, Perceval, Le 2019-11-04 18:45, Jose E. Roman a ?crit : > Did you follow the instructions in section 3.4.5 of the SLEPc users manual? > Send the output of -eps_view > > Jose > >> El 4 nov 2019, a las 18:33, Perceval Desforges via petsc-users escribi?: >> >> Dear petsc and slepc developpers, >> >> I am using slepc to solve an eigenvalue problem. Since I need all the eigenvalues in a certain interval, I use the spectrum slicing technique with mumps. However I do not understand: when I run my code with more than one processor, there is no speedup at all, and it even slows down, and I don't understand why. >> >> I wanted to test further and I ran the same code without spectrum slicing, and asking for about the same amount of eigenvalues. The calculation was much slower (about 10 times slower), but using more than one processor sped it up. >> >> Is this normal behavior or am I doing something wrong? >> >> Thanks, >> >> Best regards, >> >> Perceval, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Nov 5 10:07:18 2019 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Tue, 5 Nov 2019 17:07:18 +0100 Subject: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson In-Reply-To: References: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> Message-ID: Currently, the function that passes the preconditioner matrix is specific of STPRECOND, so you have to add ierr = STSetType(st,STPRECOND);CHKERRQ(ierr); before ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); otherwise this latter call is ignored. We may be changing a little bit the way in which ST is initialized, and maybe we modify this as well. It is not decided yet. Jose > El 5 nov 2019, a las 0:28, Fande Kong escribi?: > > Thanks Jose, > > I think I understand now. Another question: what is the right way to setup a linear preconditioning matrix for the inner linear solver of JD? > > I was trying to do something like this: > > /* > Create eigensolver context > */ > ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr); > > /* > Set operators. In this case, it is a standard eigenvalue problem > */ > ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr); > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); > ierr = EPSGetST(eps,&st);CHKERRQ(ierr); > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > > /* > Set solver parameters at runtime > */ > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > Solve the eigensystem > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > But did not work. A complete example is attached. I could try to dig into the code, but you may already know the answer. > > > On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman wrote: > Yes, it is confusing. Here is the explanation: when you use a target, the preconditioner is built from matrix A-sigma*B. By default, instead of TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner is built from matrix B. The thing is that in a standard eigenproblem we have B=I, and hence there is no point in using a preconditioner, that is why we set PCNONE. > > Jose > > > > El 22 oct 2019, a las 19:57, Fande Kong via petsc-users escribi?: > > > > Hi All, > > > > It looks like the preconditioner is hard-coded in the Jacobi-Davidson solver. I could not select a preconditioner rather than the default setting. > > > > For example, I was trying to select LU, but PC NONE was still used. I ran standard example 2 in slepc/src/eps/examples/tutorials, and had the following results. 
> > > > > > Thanks, > > > > Fande > > > > > > ./ex2 -eps_type jd -st_ksp_type gmres -st_pc_type lu -eps_view > > > > 2-D Laplacian Eigenproblem, N=100 (10x10 grid) > > > > EPS Object: 1 MPI processes > > type: jd > > search subspace is orthogonalized > > block size=1 > > type of the initial subspace: non-Krylov > > size of the subspace after restarting: 6 > > number of vectors after restarting from the previous iteration: 1 > > threshold for changing the target in the correction equation (fix): 0.01 > > problem type: symmetric eigenvalue problem > > selected portion of the spectrum: largest eigenvalues in magnitude > > number of eigenvalues (nev): 1 > > number of column vectors (ncv): 17 > > maximum dimension of projected problem (mpd): 17 > > maximum number of iterations: 1700 > > tolerance: 1e-08 > > convergence test: relative to the eigenvalue > > BV Object: 1 MPI processes > > type: svec > > 17 columns of global length 100 > > vector orthogonalization method: classical Gram-Schmidt > > orthogonalization refinement: if needed (eta: 0.7071) > > block orthogonalization method: GS > > doing matmult as a single matrix-matrix product > > DS Object: 1 MPI processes > > type: hep > > solving the problem with: Implicit QR method (_steqr) > > ST Object: 1 MPI processes > > type: precond > > shift: 1.79769e+308 > > number of matrices: 1 > > KSP Object: (st_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=90, initial guess is zero > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (st_) 1 MPI processes > > type: none > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: shell > > rows=100, cols=100 > > Solution method: jd > > > > Number of requested eigenvalues: 1 > > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; iterations 20 > > ---------------------- -------------------- > > k ||Ax-kx||/||kx|| > > ---------------------- -------------------- > > 7.837972 7.71944e-10 > > ---------------------- -------------------- > > > > > > > > From fdkong.jd at gmail.com Tue Nov 5 11:13:58 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Tue, 5 Nov 2019 10:13:58 -0700 Subject: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson In-Reply-To: References: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> Message-ID: How about I want to determine the ST type on runtime? mpirun -n 1 ./ex3 -eps_type jd -st_ksp_type gmres -st_pc_type none -eps_view -eps_target 0 -eps_monitor -st_ksp_monitor ST is indeed STPrecond, but the passed preconditioning matrix is still ignored. EPS Object: 1 MPI processes type: jd search subspace is orthogonalized block size=1 type of the initial subspace: non-Krylov size of the subspace after restarting: 6 number of vectors after restarting from the previous iteration: 1 threshold for changing the target in the correction equation (fix): 0.01 problem type: symmetric eigenvalue problem selected portion of the spectrum: closest to target: 0. 
(in magnitude) number of eigenvalues (nev): 1 number of column vectors (ncv): 17 maximum dimension of projected problem (mpd): 17 maximum number of iterations: 1700 tolerance: 1e-08 convergence test: relative to the eigenvalue BV Object: 1 MPI processes type: svec 17 columns of global length 100 vector orthogonalization method: classical Gram-Schmidt orthogonalization refinement: if needed (eta: 0.7071) block orthogonalization method: GS doing matmult as a single matrix-matrix product DS Object: 1 MPI processes type: hep solving the problem with: Implicit QR method (_steqr) ST Object: 1 MPI processes type: precond shift: 0. number of matrices: 1 KSP Object: (st_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=90, initial guess is zero tolerances: relative=0.0001, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (st_) 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: 1 MPI processes type: shell rows=100, cols=100 Solution method: jd Preconding matrix should be a SeqAIJ not shell. Fande, On Tue, Nov 5, 2019 at 9:07 AM Jose E. Roman wrote: > Currently, the function that passes the preconditioner matrix is specific > of STPRECOND, so you have to add > ierr = STSetType(st,STPRECOND);CHKERRQ(ierr); > before > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > otherwise this latter call is ignored. > > We may be changing a little bit the way in which ST is initialized, and > maybe we modify this as well. It is not decided yet. > > Jose > > > > El 5 nov 2019, a las 0:28, Fande Kong escribi?: > > > > Thanks Jose, > > > > I think I understand now. Another question: what is the right way to > setup a linear preconditioning matrix for the inner linear solver of JD? > > > > I was trying to do something like this: > > > > /* > > Create eigensolver context > > */ > > ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr); > > > > /* > > Set operators. In this case, it is a standard eigenvalue problem > > */ > > ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr); > > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); > > ierr = EPSGetST(eps,&st);CHKERRQ(ierr); > > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > > > > /* > > Set solver parameters at runtime > > */ > > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > > > /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Solve the eigensystem > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > */ > > > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > > > > But did not work. A complete example is attached. I could try to dig > into the code, but you may already know the answer. > > > > > > On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman > wrote: > > Yes, it is confusing. Here is the explanation: when you use a target, > the preconditioner is built from matrix A-sigma*B. By default, instead of > TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat > this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner > is built from matrix B. The thing is that in a standard eigenproblem we > have B=I, and hence there is no point in using a preconditioner, that is > why we set PCNONE. 
> > > > Jose > > > > > > > El 22 oct 2019, a las 19:57, Fande Kong via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > > > Hi All, > > > > > > It looks like the preconditioner is hard-coded in the Jacobi-Davidson > solver. I could not select a preconditioner rather than the default setting. > > > > > > For example, I was trying to select LU, but PC NONE was still used. I > ran standard example 2 in slepc/src/eps/examples/tutorials, and had the > following results. > > > > > > > > > Thanks, > > > > > > Fande > > > > > > > > > ./ex2 -eps_type jd -st_ksp_type gmres -st_pc_type lu -eps_view > > > > > > 2-D Laplacian Eigenproblem, N=100 (10x10 grid) > > > > > > EPS Object: 1 MPI processes > > > type: jd > > > search subspace is orthogonalized > > > block size=1 > > > type of the initial subspace: non-Krylov > > > size of the subspace after restarting: 6 > > > number of vectors after restarting from the previous iteration: 1 > > > threshold for changing the target in the correction equation > (fix): 0.01 > > > problem type: symmetric eigenvalue problem > > > selected portion of the spectrum: largest eigenvalues in magnitude > > > number of eigenvalues (nev): 1 > > > number of column vectors (ncv): 17 > > > maximum dimension of projected problem (mpd): 17 > > > maximum number of iterations: 1700 > > > tolerance: 1e-08 > > > convergence test: relative to the eigenvalue > > > BV Object: 1 MPI processes > > > type: svec > > > 17 columns of global length 100 > > > vector orthogonalization method: classical Gram-Schmidt > > > orthogonalization refinement: if needed (eta: 0.7071) > > > block orthogonalization method: GS > > > doing matmult as a single matrix-matrix product > > > DS Object: 1 MPI processes > > > type: hep > > > solving the problem with: Implicit QR method (_steqr) > > > ST Object: 1 MPI processes > > > type: precond > > > shift: 1.79769e+308 > > > number of matrices: 1 > > > KSP Object: (st_) 1 MPI processes > > > type: gmres > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > happy breakdown tolerance 1e-30 > > > maximum iterations=90, initial guess is zero > > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (st_) 1 MPI processes > > > type: none > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: shell > > > rows=100, cols=100 > > > Solution method: jd > > > > > > Number of requested eigenvalues: 1 > > > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 20 > > > ---------------------- -------------------- > > > k ||Ax-kx||/||kx|| > > > ---------------------- -------------------- > > > 7.837972 7.71944e-10 > > > ---------------------- -------------------- > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jczhang at mcs.anl.gov Tue Nov 5 11:32:59 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Tue, 5 Nov 2019 17:32:59 +0000 Subject: [petsc-users] VecDuplicate for FFTW-Vec causes VecDestroy to fail conditionally on VecLoad In-Reply-To: References: <9495B365-A1AE-4840-8E2D-45CD01DE3D41@anl.gov> Message-ID: Fixed in https://gitlab.com/petsc/petsc/merge_requests/2262 --Junchao Zhang On Fri, Nov 1, 2019 at 6:51 PM Sajid Ali > wrote: Hi Junchao/Barry, It doesn't really matter what the h5 file contains, so I'm attaching a lightly edited script of src/vec/vec/examples/tutorials/ex10.c which should produce a vector to be used as input for the above test case. (I'm working with ` --with-scalar-type=complex`). Now that I think of it, fixing this bug is not important, I can workaround the issue by creating a new vector with VecCreateMPI and accept the small loss in performance of VecPointwiseMult due to misaligned layouts. If it's a small fix it may be worth the time, but fixing this is not a big priority right now. If it's a complicated fix, this issue can serve as a note to future users. Thank You, Sajid Ali Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Nov 5 11:33:13 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 5 Nov 2019 18:33:13 +0100 Subject: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson In-Reply-To: References: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> Message-ID: <00026233-9A46-4FB0-92C1-DEF642B0682F@dsic.upv.es> JD sets STPRECOND at EPSSetUp(), if it was not set before. So I guess you need to add -st_type precond on the command line, so that it is set at EPSSetFromOptions(). Jose > El 5 nov 2019, a las 18:13, Fande Kong escribi?: > > How about I want to determine the ST type on runtime? > > mpirun -n 1 ./ex3 -eps_type jd -st_ksp_type gmres -st_pc_type none -eps_view -eps_target 0 -eps_monitor -st_ksp_monitor > > ST is indeed STPrecond, but the passed preconditioning matrix is still ignored. > > EPS Object: 1 MPI processes > type: jd > search subspace is orthogonalized > block size=1 > type of the initial subspace: non-Krylov > size of the subspace after restarting: 6 > number of vectors after restarting from the previous iteration: 1 > threshold for changing the target in the correction equation (fix): 0.01 > problem type: symmetric eigenvalue problem > selected portion of the spectrum: closest to target: 0. (in magnitude) > number of eigenvalues (nev): 1 > number of column vectors (ncv): 17 > maximum dimension of projected problem (mpd): 17 > maximum number of iterations: 1700 > tolerance: 1e-08 > convergence test: relative to the eigenvalue > BV Object: 1 MPI processes > type: svec > 17 columns of global length 100 > vector orthogonalization method: classical Gram-Schmidt > orthogonalization refinement: if needed (eta: 0.7071) > block orthogonalization method: GS > doing matmult as a single matrix-matrix product > DS Object: 1 MPI processes > type: hep > solving the problem with: Implicit QR method (_steqr) > ST Object: 1 MPI processes > type: precond > shift: 0. > number of matrices: 1 > KSP Object: (st_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=90, initial guess is zero > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (st_) 1 MPI processes > type: none > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: shell > rows=100, cols=100 > Solution method: jd > > > Preconding matrix should be a SeqAIJ not shell. > > > Fande, > > On Tue, Nov 5, 2019 at 9:07 AM Jose E. Roman wrote: > Currently, the function that passes the preconditioner matrix is specific of STPRECOND, so you have to add > ierr = STSetType(st,STPRECOND);CHKERRQ(ierr); > before > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > otherwise this latter call is ignored. > > We may be changing a little bit the way in which ST is initialized, and maybe we modify this as well. It is not decided yet. > > Jose > > > > El 5 nov 2019, a las 0:28, Fande Kong escribi?: > > > > Thanks Jose, > > > > I think I understand now. Another question: what is the right way to setup a linear preconditioning matrix for the inner linear solver of JD? > > > > I was trying to do something like this: > > > > /* > > Create eigensolver context > > */ > > ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr); > > > > /* > > Set operators. In this case, it is a standard eigenvalue problem > > */ > > ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr); > > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); > > ierr = EPSGetST(eps,&st);CHKERRQ(ierr); > > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > > > > /* > > Set solver parameters at runtime > > */ > > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > > > /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Solve the eigensystem > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ > > > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > > > > But did not work. A complete example is attached. I could try to dig into the code, but you may already know the answer. > > > > > > On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman wrote: > > Yes, it is confusing. Here is the explanation: when you use a target, the preconditioner is built from matrix A-sigma*B. By default, instead of TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner is built from matrix B. The thing is that in a standard eigenproblem we have B=I, and hence there is no point in using a preconditioner, that is why we set PCNONE. > > > > Jose > > > > > > > El 22 oct 2019, a las 19:57, Fande Kong via petsc-users escribi?: > > > > > > Hi All, > > > > > > It looks like the preconditioner is hard-coded in the Jacobi-Davidson solver. I could not select a preconditioner rather than the default setting. > > > > > > For example, I was trying to select LU, but PC NONE was still used. I ran standard example 2 in slepc/src/eps/examples/tutorials, and had the following results. 
> > > > > > > > > Thanks, > > > > > > Fande > > > > > > > > > ./ex2 -eps_type jd -st_ksp_type gmres -st_pc_type lu -eps_view > > > > > > 2-D Laplacian Eigenproblem, N=100 (10x10 grid) > > > > > > EPS Object: 1 MPI processes > > > type: jd > > > search subspace is orthogonalized > > > block size=1 > > > type of the initial subspace: non-Krylov > > > size of the subspace after restarting: 6 > > > number of vectors after restarting from the previous iteration: 1 > > > threshold for changing the target in the correction equation (fix): 0.01 > > > problem type: symmetric eigenvalue problem > > > selected portion of the spectrum: largest eigenvalues in magnitude > > > number of eigenvalues (nev): 1 > > > number of column vectors (ncv): 17 > > > maximum dimension of projected problem (mpd): 17 > > > maximum number of iterations: 1700 > > > tolerance: 1e-08 > > > convergence test: relative to the eigenvalue > > > BV Object: 1 MPI processes > > > type: svec > > > 17 columns of global length 100 > > > vector orthogonalization method: classical Gram-Schmidt > > > orthogonalization refinement: if needed (eta: 0.7071) > > > block orthogonalization method: GS > > > doing matmult as a single matrix-matrix product > > > DS Object: 1 MPI processes > > > type: hep > > > solving the problem with: Implicit QR method (_steqr) > > > ST Object: 1 MPI processes > > > type: precond > > > shift: 1.79769e+308 > > > number of matrices: 1 > > > KSP Object: (st_) 1 MPI processes > > > type: gmres > > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > happy breakdown tolerance 1e-30 > > > maximum iterations=90, initial guess is zero > > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (st_) 1 MPI processes > > > type: none > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: shell > > > rows=100, cols=100 > > > Solution method: jd > > > > > > Number of requested eigenvalues: 1 > > > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; iterations 20 > > > ---------------------- -------------------- > > > k ||Ax-kx||/||kx|| > > > ---------------------- -------------------- > > > 7.837972 7.71944e-10 > > > ---------------------- -------------------- > > > > > > > > > > > > > > From fdkong.jd at gmail.com Tue Nov 5 11:55:50 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Tue, 5 Nov 2019 10:55:50 -0700 Subject: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson In-Reply-To: <00026233-9A46-4FB0-92C1-DEF642B0682F@dsic.upv.es> References: <1CC5E48C-7709-44E0-84F9-7FBD46297069@dsic.upv.es> <00026233-9A46-4FB0-92C1-DEF642B0682F@dsic.upv.es> Message-ID: OK, I figured it out! I need to add the code : ierr = EPSGetST(eps,&st);CHKERRQ(ierr); ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); after ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); It might be a good idea to document this if we intend to do so. Fande, On Tue, Nov 5, 2019 at 10:33 AM Jose E. Roman wrote: > JD sets STPRECOND at EPSSetUp(), if it was not set before. So I guess you > need to add -st_type precond on the command line, so that it is set at > EPSSetFromOptions(). > > Jose > > > > El 5 nov 2019, a las 18:13, Fande Kong escribi?: > > > > How about I want to determine the ST type on runtime? 
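To collect the resolution of this sub-thread in one place, here is a sketch of a call sequence that hands a separate preconditioning matrix B to the Jacobi-Davidson inner KSP. A and B are assumed to be already-assembled Mats; run with -eps_type jd and, if STSetType(st,STPRECOND) is not called explicitly, also -st_type precond so the ST is already STPRECOND when the matrix is attached:

  EPS eps;
  ST  st;

  ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr);   /* standard eigenproblem */
  ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);        /* -eps_type jd -st_type precond ... */
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr);    /* after EPSSetFromOptions(), so it is not ignored */
  ierr = EPSSolve(eps);CHKERRQ(ierr);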
> > > > mpirun -n 1 ./ex3 -eps_type jd -st_ksp_type gmres -st_pc_type none > -eps_view -eps_target 0 -eps_monitor -st_ksp_monitor > > > > ST is indeed STPrecond, but the passed preconditioning matrix is still > ignored. > > > > EPS Object: 1 MPI processes > > type: jd > > search subspace is orthogonalized > > block size=1 > > type of the initial subspace: non-Krylov > > size of the subspace after restarting: 6 > > number of vectors after restarting from the previous iteration: 1 > > threshold for changing the target in the correction equation (fix): > 0.01 > > problem type: symmetric eigenvalue problem > > selected portion of the spectrum: closest to target: 0. (in magnitude) > > number of eigenvalues (nev): 1 > > number of column vectors (ncv): 17 > > maximum dimension of projected problem (mpd): 17 > > maximum number of iterations: 1700 > > tolerance: 1e-08 > > convergence test: relative to the eigenvalue > > BV Object: 1 MPI processes > > type: svec > > 17 columns of global length 100 > > vector orthogonalization method: classical Gram-Schmidt > > orthogonalization refinement: if needed (eta: 0.7071) > > block orthogonalization method: GS > > doing matmult as a single matrix-matrix product > > DS Object: 1 MPI processes > > type: hep > > solving the problem with: Implicit QR method (_steqr) > > ST Object: 1 MPI processes > > type: precond > > shift: 0. > > number of matrices: 1 > > KSP Object: (st_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=90, initial guess is zero > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (st_) 1 MPI processes > > type: none > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: shell > > rows=100, cols=100 > > Solution method: jd > > > > > > Preconding matrix should be a SeqAIJ not shell. > > > > > > Fande, > > > > On Tue, Nov 5, 2019 at 9:07 AM Jose E. Roman wrote: > > Currently, the function that passes the preconditioner matrix is > specific of STPRECOND, so you have to add > > ierr = STSetType(st,STPRECOND);CHKERRQ(ierr); > > before > > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > > otherwise this latter call is ignored. > > > > We may be changing a little bit the way in which ST is initialized, and > maybe we modify this as well. It is not decided yet. > > > > Jose > > > > > > > El 5 nov 2019, a las 0:28, Fande Kong escribi?: > > > > > > Thanks Jose, > > > > > > I think I understand now. Another question: what is the right way to > setup a linear preconditioning matrix for the inner linear solver of JD? > > > > > > I was trying to do something like this: > > > > > > /* > > > Create eigensolver context > > > */ > > > ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr); > > > > > > /* > > > Set operators. 
In this case, it is a standard eigenvalue problem > > > */ > > > ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr); > > > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); > > > ierr = EPSGetST(eps,&st);CHKERRQ(ierr); > > > ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); > > > > > > /* > > > Set solver parameters at runtime > > > */ > > > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > > > > > /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > > > Solve the eigensystem > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - */ > > > > > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > > > > > > > But did not work. A complete example is attached. I could try to dig > into the code, but you may already know the answer. > > > > > > > > > On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman > wrote: > > > Yes, it is confusing. Here is the explanation: when you use a target, > the preconditioner is built from matrix A-sigma*B. By default, instead of > TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat > this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner > is built from matrix B. The thing is that in a standard eigenproblem we > have B=I, and hence there is no point in using a preconditioner, that is > why we set PCNONE. > > > > > > Jose > > > > > > > > > > El 22 oct 2019, a las 19:57, Fande Kong via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > > > > > Hi All, > > > > > > > > It looks like the preconditioner is hard-coded in the > Jacobi-Davidson solver. I could not select a preconditioner rather than the > default setting. > > > > > > > > For example, I was trying to select LU, but PC NONE was still used. > I ran standard example 2 in slepc/src/eps/examples/tutorials, and had the > following results. 
> > > > > > > > > > > > Thanks, > > > > > > > > Fande > > > > > > > > > > > > ./ex2 -eps_type jd -st_ksp_type gmres -st_pc_type lu -eps_view > > > > > > > > 2-D Laplacian Eigenproblem, N=100 (10x10 grid) > > > > > > > > EPS Object: 1 MPI processes > > > > type: jd > > > > search subspace is orthogonalized > > > > block size=1 > > > > type of the initial subspace: non-Krylov > > > > size of the subspace after restarting: 6 > > > > number of vectors after restarting from the previous iteration: 1 > > > > threshold for changing the target in the correction equation > (fix): 0.01 > > > > problem type: symmetric eigenvalue problem > > > > selected portion of the spectrum: largest eigenvalues in magnitude > > > > number of eigenvalues (nev): 1 > > > > number of column vectors (ncv): 17 > > > > maximum dimension of projected problem (mpd): 17 > > > > maximum number of iterations: 1700 > > > > tolerance: 1e-08 > > > > convergence test: relative to the eigenvalue > > > > BV Object: 1 MPI processes > > > > type: svec > > > > 17 columns of global length 100 > > > > vector orthogonalization method: classical Gram-Schmidt > > > > orthogonalization refinement: if needed (eta: 0.7071) > > > > block orthogonalization method: GS > > > > doing matmult as a single matrix-matrix product > > > > DS Object: 1 MPI processes > > > > type: hep > > > > solving the problem with: Implicit QR method (_steqr) > > > > ST Object: 1 MPI processes > > > > type: precond > > > > shift: 1.79769e+308 > > > > number of matrices: 1 > > > > KSP Object: (st_) 1 MPI processes > > > > type: gmres > > > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > > happy breakdown tolerance 1e-30 > > > > maximum iterations=90, initial guess is zero > > > > tolerances: relative=0.0001, absolute=1e-50, divergence=10000. > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: (st_) 1 MPI processes > > > > type: none > > > > linear system matrix = precond matrix: > > > > Mat Object: 1 MPI processes > > > > type: shell > > > > rows=100, cols=100 > > > > Solution method: jd > > > > > > > > Number of requested eigenvalues: 1 > > > > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 20 > > > > ---------------------- -------------------- > > > > k ||Ax-kx||/||kx|| > > > > ---------------------- -------------------- > > > > 7.837972 7.71944e-10 > > > > ---------------------- -------------------- > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Nov 5 15:09:28 2019 From: hgbk2008 at gmail.com (hg) Date: Tue, 5 Nov 2019 22:09:28 +0100 Subject: [petsc-users] solve problem with pastix Message-ID: Hello I got crashed when using Pastix as solver for KSP. The error message looks like: .... 
NUMBER of BUBBLE 1 COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 ** End of Partition & Distribution phase ** Time to analyze 0.225 s Number of nonzeros in factorized matrix 708784076 Fill-in 12.2337 Number of operations (LU) 2.80185e+12 Prediction Time to factorize (AMD 6180 MKL) 394 s 0 : SolverMatrix size (without coefficients) 32.4 MB 0 : Number of nonzeros (local block structure) 365309391 Numerical Factorization (LU) : 0 : Internal CSC size 1.08 GB Time to fill internal csc 6.66 s --- Sopalin : Allocation de la structure globale --- --- Fin Sopalin Init --- --- Initialisation des tableaux globaux --- sched_setaffinity: Invalid argument [node083:165071] *** Process received signal *** [node083:165071] Signal: Aborted (6) [node083:165071] Signal code: (-6) [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core) --- Sopalin : Local structure allocation --- /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] Does anyone have an idea what is the problem and how to fix it? The PETSc parameters I used are as below: -pc_type lu -pc_factor_mat_solver_package pastix -mat_pastix_verbose 2 -mat_pastix_threadnbr 1 Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 5 15:49:52 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 5 Nov 2019 16:49:52 -0500 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: Message-ID: On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users wrote: > Hello > > I got crashed when using Pastix as solver for KSP. The error message looks > like: > > .... 
> NUMBER of BUBBLE 1 > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > ** End of Partition & Distribution phase ** > Time to analyze 0.225 s > Number of nonzeros in factorized matrix 708784076 > Fill-in 12.2337 > Number of operations (LU) 2.80185e+12 > Prediction Time to factorize (AMD 6180 MKL) 394 s > 0 : SolverMatrix size (without coefficients) 32.4 MB > 0 : Number of nonzeros (local block structure) 365309391 > Numerical Factorization (LU) : > 0 : Internal CSC size 1.08 GB > Time to fill internal csc 6.66 s > --- Sopalin : Allocation de la structure globale --- > --- Fin Sopalin Init --- > --- Initialisation des tableaux globaux --- > sched_setaffinity: Invalid argument > [node083:165071] *** Process received signal *** > [node083:165071] Signal: Aborted (6) > [node083:165071] Signal code: (-6) > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > [node083:165071] [ 3] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 > communication, 0 out-of-core) > --- Sopalin : Local structure allocation --- > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > [node083:165071] [ 5] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > [node083:165071] [ 6] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > [node083:165071] [ 7] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > [node083:165071] [ 8] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > [node083:165071] [ 9] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > [node083:165071] [10] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > [node083:165071] [11] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > [node083:165071] [12] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > [node083:165071] [13] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > [node083:165071] [14] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > [node083:165071] [15] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > Does anyone have an idea what is the problem and how to fix it? The PETSc > parameters I used are as below: > It looks like PasTix is having trouble setting the thread affinity: sched_setaffinity: Invalid argument so it may be your build of PasTix. Thanks, Matt > -pc_type lu > -pc_factor_mat_solver_package pastix > -mat_pastix_verbose 2 > -mat_pastix_threadnbr 1 > > Giang > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
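One hedged suggestion, not confirmed anywhere in this thread: sched_setaffinity failing with EINVAL is sometimes a conflict between the MPI launcher's own core binding and the threads PaStiX tries to pin, so disabling the launcher's binding is a cheap experiment. With Open MPI that would be something like

  mpirun --bind-to none -n 4 ./your_app -pc_type lu -pc_factor_mat_solver_package pastix -mat_pastix_threadnbr 1

where ./your_app and the process count are placeholders; --bind-to none is a standard Open MPI option, and the PETSc options are the ones already used above.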
URL: From hgbk2008 at gmail.com Tue Nov 5 16:32:08 2019 From: hgbk2008 at gmail.com (hg) Date: Tue, 5 Nov 2019 23:32:08 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: Message-ID: Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1 Giang On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley wrote: > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users > wrote: > >> Hello >> >> I got crashed when using Pastix as solver for KSP. The error message >> looks like: >> >> .... >> NUMBER of BUBBLE 1 >> COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >> ** End of Partition & Distribution phase ** >> Time to analyze 0.225 s >> Number of nonzeros in factorized matrix 708784076 >> Fill-in 12.2337 >> Number of operations (LU) 2.80185e+12 >> Prediction Time to factorize (AMD 6180 MKL) 394 s >> 0 : SolverMatrix size (without coefficients) 32.4 MB >> 0 : Number of nonzeros (local block structure) 365309391 >> Numerical Factorization (LU) : >> 0 : Internal CSC size 1.08 GB >> Time to fill internal csc 6.66 s >> --- Sopalin : Allocation de la structure globale --- >> --- Fin Sopalin Init --- >> --- Initialisation des tableaux globaux --- >> sched_setaffinity: Invalid argument >> [node083:165071] *** Process received signal *** >> [node083:165071] Signal: Aborted (6) >> [node083:165071] Signal code: (-6) >> [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >> [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >> [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >> [node083:165071] [ 3] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >> [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >> communication, 0 out-of-core) >> --- Sopalin : Local structure allocation --- >> >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >> [node083:165071] [ 5] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >> [node083:165071] [ 6] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >> [node083:165071] [ 7] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >> [node083:165071] [ 8] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >> [node083:165071] [ 9] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >> [node083:165071] [10] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >> [node083:165071] [11] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >> [node083:165071] [12] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >> [node083:165071] [13] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >> [node083:165071] [14] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >> [node083:165071] [15] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >> >> Does anyone have an idea what is the problem and how to fix it? 
The PETSc >> parameters I used are as below: >> > > It looks like PasTix is having trouble setting the thread affinity: > > sched_setaffinity: Invalid argument > > so it may be your build of PasTix. > > Thanks, > > Matt > > >> -pc_type lu >> -pc_factor_mat_solver_package pastix >> -mat_pastix_verbose 2 >> -mat_pastix_threadnbr 1 >> >> Giang >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 5 19:01:09 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 5 Nov 2019 20:01:09 -0500 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: Message-ID: I have no idea. That is a good question for the PasTix list. Thanks, Matt On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and also > OMP_NUM_THREADS to 1 > > Giang > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley wrote: > >> On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hello >>> >>> I got crashed when using Pastix as solver for KSP. The error message >>> looks like: >>> >>> .... >>> NUMBER of BUBBLE 1 >>> COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >>> ** End of Partition & Distribution phase ** >>> Time to analyze 0.225 s >>> Number of nonzeros in factorized matrix 708784076 >>> Fill-in 12.2337 >>> Number of operations (LU) 2.80185e+12 >>> Prediction Time to factorize (AMD 6180 MKL) 394 s >>> 0 : SolverMatrix size (without coefficients) 32.4 MB >>> 0 : Number of nonzeros (local block structure) 365309391 >>> Numerical Factorization (LU) : >>> 0 : Internal CSC size 1.08 GB >>> Time to fill internal csc 6.66 s >>> --- Sopalin : Allocation de la structure globale --- >>> --- Fin Sopalin Init --- >>> --- Initialisation des tableaux globaux --- >>> sched_setaffinity: Invalid argument >>> [node083:165071] *** Process received signal *** >>> [node083:165071] Signal: Aborted (6) >>> [node083:165071] Signal code: (-6) >>> [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >>> [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >>> [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >>> [node083:165071] [ 3] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >>> [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >>> communication, 0 out-of-core) >>> --- Sopalin : Local structure allocation --- >>> >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >>> [node083:165071] [ 5] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >>> [node083:165071] [ 6] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >>> [node083:165071] [ 7] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >>> [node083:165071] [ 8] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >>> [node083:165071] [ 9] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >>> [node083:165071] [10] >>> 
/sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >>> [node083:165071] [11] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >>> [node083:165071] [12] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >>> [node083:165071] [13] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >>> [node083:165071] [14] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >>> [node083:165071] [15] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >>> >>> Does anyone have an idea what is the problem and how to fix it? The >>> PETSc parameters I used are as below: >>> >> >> It looks like PasTix is having trouble setting the thread affinity: >> >> sched_setaffinity: Invalid argument >> >> so it may be your build of PasTix. >> >> Thanks, >> >> Matt >> >> >>> -pc_type lu >>> -pc_factor_mat_solver_package pastix >>> -mat_pastix_verbose 2 >>> -mat_pastix_threadnbr 1 >>> >>> Giang >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 5 21:36:59 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 6 Nov 2019 03:36:59 +0000 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: Message-ID: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> Google finds this https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users wrote: > > I have no idea. That is a good question for the PasTix list. > > Thanks, > > Matt > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1 > > Giang > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley wrote: > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users wrote: > Hello > > I got crashed when using Pastix as solver for KSP. The error message looks like: > > .... 
> NUMBER of BUBBLE 1 > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > ** End of Partition & Distribution phase ** > Time to analyze 0.225 s > Number of nonzeros in factorized matrix 708784076 > Fill-in 12.2337 > Number of operations (LU) 2.80185e+12 > Prediction Time to factorize (AMD 6180 MKL) 394 s > 0 : SolverMatrix size (without coefficients) 32.4 MB > 0 : Number of nonzeros (local block structure) 365309391 > Numerical Factorization (LU) : > 0 : Internal CSC size 1.08 GB > Time to fill internal csc 6.66 s > --- Sopalin : Allocation de la structure globale --- > --- Fin Sopalin Init --- > --- Initialisation des tableaux globaux --- > sched_setaffinity: Invalid argument > [node083:165071] *** Process received signal *** > [node083:165071] Signal: Aborted (6) > [node083:165071] Signal code: (-6) > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core) > --- Sopalin : Local structure allocation --- > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > Does anyone have an idea what is the problem and how to fix it? The PETSc parameters I used are as below: > > It looks like PasTix is having trouble setting the thread affinity: > > sched_setaffinity: Invalid argument > > so it may be your build of PasTix. > > Thanks, > > Matt > > -pc_type lu > -pc_factor_mat_solver_package pastix > -mat_pastix_verbose 2 > -mat_pastix_threadnbr 1 > > Giang > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From hgbk2008 at gmail.com Wed Nov 6 03:12:53 2019 From: hgbk2008 at gmail.com (hg) Date: Wed, 6 Nov 2019 10:12:53 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> Message-ID: sched_setaffinity: Invalid argument only happens when I launch the job with sbatch. Running without scheduler is fine. I think this has something to do with pastix. Giang On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. wrote: > > Google finds this > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > > > > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > I have no idea. That is a good question for the PasTix list. > > > > Thanks, > > > > Matt > > > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and > also OMP_NUM_THREADS to 1 > > > > Giang > > > > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley > wrote: > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello > > > > I got crashed when using Pastix as solver for KSP. The error message > looks like: > > > > .... > > NUMBER of BUBBLE 1 > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > > ** End of Partition & Distribution phase ** > > Time to analyze 0.225 s > > Number of nonzeros in factorized matrix 708784076 > > Fill-in 12.2337 > > Number of operations (LU) 2.80185e+12 > > Prediction Time to factorize (AMD 6180 MKL) 394 s > > 0 : SolverMatrix size (without coefficients) 32.4 MB > > 0 : Number of nonzeros (local block structure) 365309391 > > Numerical Factorization (LU) : > > 0 : Internal CSC size 1.08 GB > > Time to fill internal csc 6.66 s > > --- Sopalin : Allocation de la structure globale --- > > --- Fin Sopalin Init --- > > --- Initialisation des tableaux globaux --- > > sched_setaffinity: Invalid argument > > [node083:165071] *** Process received signal *** > > [node083:165071] Signal: Aborted (6) > > [node083:165071] Signal code: (-6) > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > > [node083:165071] [ 3] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 > communication, 0 out-of-core) > > --- Sopalin : Local structure allocation --- > > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > > [node083:165071] [ 5] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > > [node083:165071] [ 6] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > > [node083:165071] [ 7] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > > [node083:165071] [ 8] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > > [node083:165071] [ 9] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > > [node083:165071] [10] > 
/sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > > [node083:165071] [11] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > > [node083:165071] [12] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > > [node083:165071] [13] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > > [node083:165071] [14] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > > [node083:165071] [15] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > > > Does anyone have an idea what is the problem and how to fix it? The > PETSc parameters I used are as below: > > > > It looks like PasTix is having trouble setting the thread affinity: > > > > sched_setaffinity: Invalid argument > > > > so it may be your build of PasTix. > > > > Thanks, > > > > Matt > > > > -pc_type lu > > -pc_factor_mat_solver_package pastix > > -mat_pastix_verbose 2 > > -mat_pastix_threadnbr 1 > > > > Giang > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Wed Nov 6 03:40:06 2019 From: hgbk2008 at gmail.com (hg) Date: Wed, 6 Nov 2019 10:40:06 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> Message-ID: Look into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c I saw something like: #ifdef HAVE_OLD_SCHED_SETAFFINITY if(sched_setaffinity(0,&mask) < 0) #else /* HAVE_OLD_SCHED_SETAFFINITY */ if(sched_setaffinity(0,sizeof(mask),&mask) < 0) #endif /* HAVE_OLD_SCHED_SETAFFINITY */ { perror("sched_setaffinity"); EXIT(MOD_SOPALIN, INTERNAL_ERR); } Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY during compilation? May I know how to trigger re-compilation of external packages with petsc? I may go in there and check what's going on. Giang On Wed, Nov 6, 2019 at 10:12 AM hg wrote: > sched_setaffinity: Invalid argument only happens when I launch the job > with sbatch. Running without scheduler is fine. I think this has something > to do with pastix. > > Giang > > > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. wrote: > >> >> Google finds this >> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 >> >> >> >> > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > I have no idea. That is a good question for the PasTix list. >> > >> > Thanks, >> > >> > Matt >> > >> > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: >> > Should thread affinity be invoked? 
I set -mat_pastix_threadnbr 1 and >> also OMP_NUM_THREADS to 1 >> > >> > Giang >> > >> > >> > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley >> wrote: >> > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > Hello >> > >> > I got crashed when using Pastix as solver for KSP. The error message >> looks like: >> > >> > .... >> > NUMBER of BUBBLE 1 >> > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >> > ** End of Partition & Distribution phase ** >> > Time to analyze 0.225 s >> > Number of nonzeros in factorized matrix 708784076 >> > Fill-in 12.2337 >> > Number of operations (LU) 2.80185e+12 >> > Prediction Time to factorize (AMD 6180 MKL) 394 s >> > 0 : SolverMatrix size (without coefficients) 32.4 MB >> > 0 : Number of nonzeros (local block structure) 365309391 >> > Numerical Factorization (LU) : >> > 0 : Internal CSC size 1.08 GB >> > Time to fill internal csc 6.66 s >> > --- Sopalin : Allocation de la structure globale --- >> > --- Fin Sopalin Init --- >> > --- Initialisation des tableaux globaux --- >> > sched_setaffinity: Invalid argument >> > [node083:165071] *** Process received signal *** >> > [node083:165071] Signal: Aborted (6) >> > [node083:165071] Signal code: (-6) >> > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >> > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >> > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >> > [node083:165071] [ 3] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >> > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >> communication, 0 out-of-core) >> > --- Sopalin : Local structure allocation --- >> > >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >> > [node083:165071] [ 5] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >> > [node083:165071] [ 6] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >> > [node083:165071] [ 7] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >> > [node083:165071] [ 8] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >> > [node083:165071] [ 9] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >> > [node083:165071] [10] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >> > [node083:165071] [11] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >> > [node083:165071] [12] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >> > [node083:165071] [13] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >> > [node083:165071] [14] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >> > [node083:165071] [15] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >> > >> > Does anyone have an idea what is the problem and how to fix it? The >> PETSc parameters I used are as below: >> > >> > It looks like PasTix is having trouble setting the thread affinity: >> > >> > sched_setaffinity: Invalid argument >> > >> > so it may be your build of PasTix. 
>> > >> > Thanks, >> > >> > Matt >> > >> > -pc_type lu >> > -pc_factor_mat_solver_package pastix >> > -mat_pastix_verbose 2 >> > -mat_pastix_threadnbr 1 >> > >> > Giang >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Nov 6 04:02:58 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 6 Nov 2019 05:02:58 -0500 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> Message-ID: On Wed, Nov 6, 2019 at 4:40 AM hg wrote: > Look into > arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c > I saw something like: > > #ifdef HAVE_OLD_SCHED_SETAFFINITY > if(sched_setaffinity(0,&mask) < 0) > #else /* HAVE_OLD_SCHED_SETAFFINITY */ > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ > { > perror("sched_setaffinity"); > EXIT(MOD_SOPALIN, INTERNAL_ERR); > } > > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY during > compilation? > > May I know how to trigger re-compilation of external packages with petsc? > I may go in there and check what's going on. > If we built it during configure, then you can just go to $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ and rebuild/install it to test. If you want configure to do it, you have to delete $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix and reconfigure. Thanks, Matt > Giang > > > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: > >> sched_setaffinity: Invalid argument only happens when I launch the job >> with sbatch. Running without scheduler is fine. I think this has something >> to do with pastix. >> >> Giang >> >> >> On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. >> wrote: >> >>> >>> Google finds this >>> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 >>> >>> >>> >>> > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > >>> > I have no idea. That is a good question for the PasTix list. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: >>> > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and >>> also OMP_NUM_THREADS to 1 >>> > >>> > Giang >>> > >>> > >>> > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley >>> wrote: >>> > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > Hello >>> > >>> > I got crashed when using Pastix as solver for KSP. The error message >>> looks like: >>> > >>> > .... 
>>> > NUMBER of BUBBLE 1 >>> > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >>> > ** End of Partition & Distribution phase ** >>> > Time to analyze 0.225 s >>> > Number of nonzeros in factorized matrix 708784076 >>> > Fill-in 12.2337 >>> > Number of operations (LU) 2.80185e+12 >>> > Prediction Time to factorize (AMD 6180 MKL) 394 s >>> > 0 : SolverMatrix size (without coefficients) 32.4 MB >>> > 0 : Number of nonzeros (local block structure) 365309391 >>> > Numerical Factorization (LU) : >>> > 0 : Internal CSC size 1.08 GB >>> > Time to fill internal csc 6.66 s >>> > --- Sopalin : Allocation de la structure globale --- >>> > --- Fin Sopalin Init --- >>> > --- Initialisation des tableaux globaux --- >>> > sched_setaffinity: Invalid argument >>> > [node083:165071] *** Process received signal *** >>> > [node083:165071] Signal: Aborted (6) >>> > [node083:165071] Signal code: (-6) >>> > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >>> > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >>> > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >>> > [node083:165071] [ 3] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >>> > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >>> communication, 0 out-of-core) >>> > --- Sopalin : Local structure allocation --- >>> > >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >>> > [node083:165071] [ 5] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >>> > [node083:165071] [ 6] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >>> > [node083:165071] [ 7] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >>> > [node083:165071] [ 8] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >>> > [node083:165071] [ 9] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >>> > [node083:165071] [10] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >>> > [node083:165071] [11] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >>> > [node083:165071] [12] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >>> > [node083:165071] [13] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >>> > [node083:165071] [14] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >>> > [node083:165071] [15] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >>> > >>> > Does anyone have an idea what is the problem and how to fix it? The >>> PETSc parameters I used are as below: >>> > >>> > It looks like PasTix is having trouble setting the thread affinity: >>> > >>> > sched_setaffinity: Invalid argument >>> > >>> > so it may be your build of PasTix. 
>>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > -pc_type lu >>> > -pc_factor_mat_solver_package pastix >>> > -mat_pastix_verbose 2 >>> > -mat_pastix_threadnbr 1 >>> > >>> > Giang >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Nov 6 09:52:20 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 6 Nov 2019 15:52:20 +0000 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> Message-ID: <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> You can also just look at configure.log where it will show the calling sequence of how PETSc configured and built Pastix. The recipe is in config/BuildSystem/config/packages/PaStiX.py we don't monkey with low level things like the affinity of external packages. My guess is that your cluster system has inconsistent parts related to this, that one tool works and another does not indicates they are inconsistent with respect to each other in what they expect. Barry > On Nov 6, 2019, at 4:02 AM, Matthew Knepley wrote: > > On Wed, Nov 6, 2019 at 4:40 AM hg wrote: > Look into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c I saw something like: > > #ifdef HAVE_OLD_SCHED_SETAFFINITY > if(sched_setaffinity(0,&mask) < 0) > #else /* HAVE_OLD_SCHED_SETAFFINITY */ > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ > { > perror("sched_setaffinity"); > EXIT(MOD_SOPALIN, INTERNAL_ERR); > } > > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY during compilation? > > May I know how to trigger re-compilation of external packages with petsc? I may go in there and check what's going on. > > If we built it during configure, then you can just go to > > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ > > and rebuild/install it to test. If you want configure to do it, you have to delete > > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix > > and reconfigure. > > Thanks, > > Matt > > Giang > > > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: > sched_setaffinity: Invalid argument only happens when I launch the job with sbatch. Running without scheduler is fine. I think this has something to do with pastix. > > Giang > > > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. wrote: > > Google finds this https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > > > > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users wrote: > > > > I have no idea. That is a good question for the PasTix list. > > > > Thanks, > > > > Matt > > > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > > Should thread affinity be invoked? 
I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1 > > > > Giang > > > > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley wrote: > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users wrote: > > Hello > > > > I got crashed when using Pastix as solver for KSP. The error message looks like: > > > > .... > > NUMBER of BUBBLE 1 > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > > ** End of Partition & Distribution phase ** > > Time to analyze 0.225 s > > Number of nonzeros in factorized matrix 708784076 > > Fill-in 12.2337 > > Number of operations (LU) 2.80185e+12 > > Prediction Time to factorize (AMD 6180 MKL) 394 s > > 0 : SolverMatrix size (without coefficients) 32.4 MB > > 0 : Number of nonzeros (local block structure) 365309391 > > Numerical Factorization (LU) : > > 0 : Internal CSC size 1.08 GB > > Time to fill internal csc 6.66 s > > --- Sopalin : Allocation de la structure globale --- > > --- Fin Sopalin Init --- > > --- Initialisation des tableaux globaux --- > > sched_setaffinity: Invalid argument > > [node083:165071] *** Process received signal *** > > [node083:165071] Signal: Aborted (6) > > [node083:165071] Signal code: (-6) > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core) > > --- Sopalin : Local structure allocation --- > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > > [node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > > > Does anyone have an idea what is the problem and how to fix it? The PETSc parameters I used are as below: > > > > It looks like PasTix is having trouble setting the thread affinity: > > > > sched_setaffinity: Invalid argument > > > > so it may be your build of PasTix. 
> > > > Thanks, > > > > Matt > > > > -pc_type lu > > -pc_factor_mat_solver_package pastix > > -mat_pastix_verbose 2 > > -mat_pastix_threadnbr 1 > > > > Giang > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From hgbk2008 at gmail.com Wed Nov 6 17:18:18 2019 From: hgbk2008 at gmail.com (hg) Date: Thu, 7 Nov 2019 00:18:18 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> Message-ID: Hi Barry Maybe you're right, sched_setaffinity returns EINVAL in my case, Probably the scheduler does not allow the process to bind to thread on its own. Giang On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. wrote: > > You can also just look at configure.log where it will show the calling > sequence of how PETSc configured and built Pastix. The recipe is in > config/BuildSystem/config/packages/PaStiX.py we don't monkey with low level > things like the affinity of external packages. My guess is that your > cluster system has inconsistent parts related to this, that one tool works > and another does not indicates they are inconsistent with respect to each > other in what they expect. > > Barry > > > > > > On Nov 6, 2019, at 4:02 AM, Matthew Knepley wrote: > > > > On Wed, Nov 6, 2019 at 4:40 AM hg wrote: > > Look into > arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c > I saw something like: > > > > #ifdef HAVE_OLD_SCHED_SETAFFINITY > > if(sched_setaffinity(0,&mask) < 0) > > #else /* HAVE_OLD_SCHED_SETAFFINITY */ > > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) > > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ > > { > > perror("sched_setaffinity"); > > EXIT(MOD_SOPALIN, INTERNAL_ERR); > > } > > > > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY > during compilation? > > > > May I know how to trigger re-compilation of external packages with > petsc? I may go in there and check what's going on. > > > > If we built it during configure, then you can just go to > > > > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ > > > > and rebuild/install it to test. If you want configure to do it, you have > to delete > > > > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix > > > > and reconfigure. > > > > Thanks, > > > > Matt > > > > Giang > > > > > > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: > > sched_setaffinity: Invalid argument only happens when I launch the job > with sbatch. Running without scheduler is fine. I think this has something > to do with pastix. > > > > Giang > > > > > > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. 
> wrote: > > > > Google finds this > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > > > > > > > > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > I have no idea. That is a good question for the PasTix list. > > > > > > Thanks, > > > > > > Matt > > > > > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > > > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and > also OMP_NUM_THREADS to 1 > > > > > > Giang > > > > > > > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley > wrote: > > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hello > > > > > > I got crashed when using Pastix as solver for KSP. The error message > looks like: > > > > > > .... > > > NUMBER of BUBBLE 1 > > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > > > ** End of Partition & Distribution phase ** > > > Time to analyze 0.225 s > > > Number of nonzeros in factorized matrix 708784076 > > > Fill-in 12.2337 > > > Number of operations (LU) 2.80185e+12 > > > Prediction Time to factorize (AMD 6180 MKL) 394 s > > > 0 : SolverMatrix size (without coefficients) 32.4 MB > > > 0 : Number of nonzeros (local block structure) 365309391 > > > Numerical Factorization (LU) : > > > 0 : Internal CSC size 1.08 GB > > > Time to fill internal csc 6.66 s > > > --- Sopalin : Allocation de la structure globale --- > > > --- Fin Sopalin Init --- > > > --- Initialisation des tableaux globaux --- > > > sched_setaffinity: Invalid argument > > > [node083:165071] *** Process received signal *** > > > [node083:165071] Signal: Aborted (6) > > > [node083:165071] Signal code: (-6) > > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > > > [node083:165071] [ 3] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 > communication, 0 out-of-core) > > > --- Sopalin : Local structure allocation --- > > > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > > > [node083:165071] [ 5] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > > > [node083:165071] [ 6] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > > > [node083:165071] [ 7] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > > > [node083:165071] [ 8] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > > > [node083:165071] [ 9] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > > > [node083:165071] [10] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > > > [node083:165071] [11] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > > > [node083:165071] [12] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > > > [node083:165071] [13] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > > > [node083:165071] [14] > 
/sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > > > [node083:165071] [15] > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > > > > > Does anyone have an idea what is the problem and how to fix it? The > PETSc parameters I used are as below: > > > > > > It looks like PasTix is having trouble setting the thread affinity: > > > > > > sched_setaffinity: Invalid argument > > > > > > so it may be your build of PasTix. > > > > > > Thanks, > > > > > > Matt > > > > > > -pc_type lu > > > -pc_factor_mat_solver_package pastix > > > -mat_pastix_verbose 2 > > > -mat_pastix_threadnbr 1 > > > > > > Giang > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.a.hack at utwente.nl Thu Nov 7 05:44:18 2019 From: s.a.hack at utwente.nl (s.a.hack at utwente.nl) Date: Thu, 7 Nov 2019 11:44:18 +0000 Subject: [petsc-users] nondeterministic behavior of MUMPS when filtering out zero rows and columns Message-ID: Hi, I am doing calculations with version 3.12.0 of PETSc. Using the finite-element method, I solve the Maxwell equations on the interior of a 3D domain, coupled with boundary condition auxiliary equations on the boundary of the domain. The auxiliary equations employ auxiliary variables g. For ease of implementation of element matrix assembly, the auxiliary variables g are defined on the entire domain. However, only the basis functions for g with nonzero value at the boundary give nonzero entries in the system matrix. The element matrices hence have the structure [ A B; C D] at the boundary. In the interior the element matrices have the structure [A 0; 0 0]. The degrees of freedom in the system matrix can be ordered by element [u_e1 g_e1 u_e2 g_e2 ?] or by parallel process [u_p1 g_p1 u_p2 g_p2 ?]. To solve the system matrix, I need to filter out zero rows and columns: error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows); CHKERRABORT(PETSC_COMM_WORLD, error); error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows, MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix); CHKERRABORT(PETSC_COMM_WORLD, error); I solve the system matrix in parallel on multiple nodes connected with InfiniBand. The problem is that the MUMPS solver frequently (nondeterministically) hangs during KSPSolve() (after KSPSetUp() is completed). Running with the options -ksp_view and -info the last printed statement is: [0] VecScatterCreate_SF(): Using StarForest for vector scatter In the calculations where the program does not hang, the calculated solution is correct. The problem doesn?t occur for calculations on a single node, or for calculations with the SuperLU solver (but SuperLU will not allow calculations that are as large). 
The problem also doesn?t seem to occur for small problems. The problem doesn?t occur either when I put ones on the diagonal, but this is computationally expensive: error = MatFindZeroRows(stiffnessMatrix, &zeroRows); CHKERRABORT(PETSC_COMM_WORLD, error); error = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry, PETSC_IGNORE, PETSC_IGNORE); CHKERRABORT(PETSC_COMM_WORLD, error); Would you have any ideas on what I could check? Best regards, Sjoerd -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Nov 7 09:28:26 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 7 Nov 2019 15:28:26 +0000 Subject: [petsc-users] nondeterministic behavior of MUMPS when filtering out zero rows and columns In-Reply-To: References: Message-ID: Run your code with option '-ksp_error_if_not_converged' to get more info. Hong On Thu, Nov 7, 2019 at 5:45 AM s.a.hack--- via petsc-users > wrote: Hi, I am doing calculations with version 3.12.0 of PETSc. Using the finite-element method, I solve the Maxwell equations on the interior of a 3D domain, coupled with boundary condition auxiliary equations on the boundary of the domain. The auxiliary equations employ auxiliary variables g. For ease of implementation of element matrix assembly, the auxiliary variables g are defined on the entire domain. However, only the basis functions for g with nonzero value at the boundary give nonzero entries in the system matrix. The element matrices hence have the structure [ A B; C D] at the boundary. In the interior the element matrices have the structure [A 0; 0 0]. The degrees of freedom in the system matrix can be ordered by element [u_e1 g_e1 u_e2 g_e2 ?] or by parallel process [u_p1 g_p1 u_p2 g_p2 ?]. To solve the system matrix, I need to filter out zero rows and columns: error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows); CHKERRABORT(PETSC_COMM_WORLD, error); error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows, MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix); CHKERRABORT(PETSC_COMM_WORLD, error); I solve the system matrix in parallel on multiple nodes connected with InfiniBand. The problem is that the MUMPS solver frequently (nondeterministically) hangs during KSPSolve() (after KSPSetUp() is completed). Running with the options -ksp_view and -info the last printed statement is: [0] VecScatterCreate_SF(): Using StarForest for vector scatter In the calculations where the program does not hang, the calculated solution is correct. The problem doesn?t occur for calculations on a single node, or for calculations with the SuperLU solver (but SuperLU will not allow calculations that are as large). The problem also doesn?t seem to occur for small problems. The problem doesn?t occur either when I put ones on the diagonal, but this is computationally expensive: error = MatFindZeroRows(stiffnessMatrix, &zeroRows); CHKERRABORT(PETSC_COMM_WORLD, error); error = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry, PETSC_IGNORE, PETSC_IGNORE); CHKERRABORT(PETSC_COMM_WORLD, error); Would you have any ideas on what I could check? Best regards, Sjoerd -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Nov 7 10:40:05 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 Nov 2019 11:40:05 -0500 Subject: [petsc-users] nondeterministic behavior of MUMPS when filtering out zero rows and columns In-Reply-To: References: Message-ID: On Thu, Nov 7, 2019 at 6:44 AM s.a.hack--- via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > > > I am doing calculations with version 3.12.0 of PETSc. > > Using the finite-element method, I solve the Maxwell equations on the > interior of a 3D domain, coupled with boundary condition auxiliary > equations on the boundary of the domain. The auxiliary equations employ > auxiliary variables g. > > > > For ease of implementation of element matrix assembly, the auxiliary > variables g are defined on the entire domain. However, only the basis > functions for g with nonzero value at the boundary give nonzero entries in > the system matrix. > > > > The element matrices hence have the structure > > [ A B; C D] > > at the boundary. > > > > In the interior the element matrices have the structure > > [A 0; 0 0]. > > > > The degrees of freedom in the system matrix can be ordered by element > [u_e1 g_e1 u_e2 g_e2 ?] or by parallel process [u_p1 g_p1 u_p2 g_p2 ?]. > > > > To solve the system matrix, I need to filter out zero rows and columns: > > error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows); > > CHKERRABORT(PETSC_COMM_WORLD, error); > > error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows, > MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix); > > CHKERRABORT(PETSC_COMM_WORLD, error); > > > > I solve the system matrix in parallel on multiple nodes connected with > InfiniBand. > > The problem is that the MUMPS solver frequently (nondeterministically) > hangs during KSPSolve() (after KSPSetUp() is completed). > > Running with the options -ksp_view and -info the last printed statement is: > > [0] VecScatterCreate_SF(): Using StarForest for vector scatter > There is a bug in some older MPI implementations. You can try using -vec_assembly_legacy -matstash_legacy to see if you avoid the bug. > In the calculations where the program does not hang, the calculated > solution is correct. > > > > The problem doesn?t occur for calculations on a single node, or for > calculations with the SuperLU solver (but SuperLU will not allow > calculations that are as large). > SuperLU_dist can do large problems. Use --download-superlu_dist > The problem also doesn?t seem to occur for small problems. > > The problem doesn?t occur either when I put ones on the diagonal, but this > is computationally expensive: > > error = MatFindZeroRows(stiffnessMatrix, &zeroRows); > > CHKERRABORT(PETSC_COMM_WORLD, error); > > error = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry, > PETSC_IGNORE, PETSC_IGNORE); > > CHKERRABORT(PETSC_COMM_WORLD, error); > The two function calls above are expensive? I can you run it with -log_view and send the timing? Thanks, Matt > > > Would you have any ideas on what I could check? > > > > Best regards, > > Sjoerd > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
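The suggestions above are all run-time options, so they can be tried without touching the assembly code. For reference, a minimal sketch of the pattern this thread is about — compress away the all-zero rows/columns and hand the reduced operator to a KSP configured from the command line — could look as follows. This is not the poster's actual code: the function name SolveReducedSystem and the right-hand-side handling through VecGetSubVector() are assumptions about how the reduced system is driven, the variable names follow the snippets quoted above, and CHKERRQ is used in place of CHKERRABORT.

#include <petscksp.h>

/* Solve the compressed system obtained by keeping only the nonzero rows/columns of
   stiffnessMatrix. The factorization package is left to the command line, e.g.
   -pc_type lu plus the MUMPS or SuperLU_dist choice and the legacy assembly
   options suggested above. */
PetscErrorCode SolveReducedSystem(Mat stiffnessMatrix, Vec rhs)
{
  PetscErrorCode error;
  IS             nonzeroRows;
  Mat            stiffnessMatrixSubMatrix;
  Vec            rhsSub, solutionSub;
  KSP            ksp;

  PetscFunctionBeginUser;
  error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows);CHKERRQ(error);
  error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows, MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix);CHKERRQ(error);

  /* restrict the right-hand side to the rows that were kept */
  error = VecGetSubVector(rhs, nonzeroRows, &rhsSub);CHKERRQ(error);
  error = VecDuplicate(rhsSub, &solutionSub);CHKERRQ(error);

  error = KSPCreate(PetscObjectComm((PetscObject)stiffnessMatrix), &ksp);CHKERRQ(error);
  error = KSPSetOperators(ksp, stiffnessMatrixSubMatrix, stiffnessMatrixSubMatrix);CHKERRQ(error);
  error = KSPSetFromOptions(ksp);CHKERRQ(error);   /* picks up the -pc_* / -ksp_* options */
  error = KSPSolve(ksp, rhsSub, solutionSub);CHKERRQ(error);

  /* scatter solutionSub back into a full-length solution vector here as needed */

  error = VecRestoreSubVector(rhs, nonzeroRows, &rhsSub);CHKERRQ(error);
  error = VecDestroy(&solutionSub);CHKERRQ(error);
  error = KSPDestroy(&ksp);CHKERRQ(error);
  error = MatDestroy(&stiffnessMatrixSubMatrix);CHKERRQ(error);
  error = ISDestroy(&nonzeroRows);CHKERRQ(error);
  PetscFunctionReturn(0);
}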
URL: From Alexander.vonRamm at lrz.de Thu Nov 7 11:11:51 2019 From: Alexander.vonRamm at lrz.de (von Ramm, Alexander) Date: Thu, 7 Nov 2019 17:11:51 +0000 Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? Message-ID: <4f6def37121145cbb9ca32e28495f4c6@lrz.de> Hello together, I'm trying to figure out how to create a DMNetwork, but the proper way to set the parameters eludes me (also there is some discrepancy between the manual https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf (page 166) and the online documenation https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMNetwork/DMNetworkSetSizes.html ). Currently I'm trying to set a up a simple Network with 8 nodes and 7 edges and distribute it over 2 processors. In the call DMNetworkSetSizes does Nsubnet need to be 1 (one global network, without any further subnetworks) or 2 (one subnetwork per processor) (my guess would be the former). My current attempt looks like the following: int main( int argc, char *argv[]) { PetscInitialize(&argc, &argv, NULL, NULL); DM dm; PetscInt NSubnet = 1; PetscInt nV[1] = {8}; PetscInt nE[1] = {7}; PetscInt NsubnetCouple = 0; PetscInt nec[0]; DMNetworkCreate(PETSC_COMM_WORLD, &dm); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, nec); PetscInt *edgeList; PetscMalloc1(14, &edgeList); edgeList[0] = 0; edgeList[1] = 4; edgeList[2] = 1; edgeList[3] = 4; edgeList[4] = 2; edgeList[5] = 5; edgeList[6] = 3; edgeList[7] = 5; edgeList[8] = 4; edgeList[9] = 6; edgeList[10] = 5; edgeList[11] = 6; edgeList[12] = 6; edgeList[13] = 7; PetscInt *edges[1]; edges[0] = edgeList; DMNetworkSetEdgeList(dm, edges, NULL); DMNetworkLayoutSetUp(dm); return 0; } Except from the Online Documenation I wasn't able to find any information/example where newest version Petsc was used when setting up a DMNetwork. (I found a few examples using older versions, however I could not figure out how these could solved using the newest version). If some could explain to me the the correct interpretation of the parameters of DMNetworkSetSizes ? Any pointers to examples using the newest API would also be much appreciated. Thanks and best Regards, Alex From hzhang at mcs.anl.gov Thu Nov 7 13:53:55 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 7 Nov 2019 19:53:55 +0000 Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? In-Reply-To: <4f6def37121145cbb9ca32e28495f4c6@lrz.de> References: <4f6def37121145cbb9ca32e28495f4c6@lrz.de> Message-ID: Alex, DMNetwork is under active development, thus our manual might not be updated. I'll check it. I'm attaching a paper which might be used as a manual for the latest DMNetwork. For your case, Nsubnet = 1, which is determined by application, not number of processes. There are several examples using DMNetwork in petsc, e.g., under ~petsc, $ git grep DMNetworkCreate * ... 
src/ksp/ksp/examples/tutorials/network/ex1.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&dmnetwork);CHKERRQ(ierr); src/ksp/ksp/examples/tutorials/network/ex1_nest.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/ksp/ksp/examples/tutorials/network/ex2.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/snes/examples/tutorials/network/ex1.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/snes/examples/tutorials/network/power/power.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/snes/examples/tutorials/network/power/power2.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/snes/examples/tutorials/network/water/water.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/ts/examples/tutorials/network/wash/pipes1.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); src/ts/examples/tutorials/power_grid/stability_9bus/ex9busdmnetwork.c: ierr = DMNetworkCreate(PETSC_COMM_WORLD,&networkdm);CHKERRQ(ierr); Hong On Thu, Nov 7, 2019 at 11:12 AM von Ramm, Alexander via petsc-users > wrote: Hello together, I'm trying to figure out how to create a DMNetwork, but the proper way to set the parameters eludes me (also there is some discrepancy between the manual https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf (page 166) and the online documenation https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMNetwork/DMNetworkSetSizes.html ). Currently I'm trying to set a up a simple Network with 8 nodes and 7 edges and distribute it over 2 processors. In the call DMNetworkSetSizes does Nsubnet need to be 1 (one global network, without any further subnetworks) or 2 (one subnetwork per processor) (my guess would be the former). My current attempt looks like the following: int main( int argc, char *argv[]) { PetscInitialize(&argc, &argv, NULL, NULL); DM dm; PetscInt NSubnet = 1; PetscInt nV[1] = {8}; PetscInt nE[1] = {7}; PetscInt NsubnetCouple = 0; PetscInt nec[0]; DMNetworkCreate(PETSC_COMM_WORLD, &dm); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, nec); PetscInt *edgeList; PetscMalloc1(14, &edgeList); edgeList[0] = 0; edgeList[1] = 4; edgeList[2] = 1; edgeList[3] = 4; edgeList[4] = 2; edgeList[5] = 5; edgeList[6] = 3; edgeList[7] = 5; edgeList[8] = 4; edgeList[9] = 6; edgeList[10] = 5; edgeList[11] = 6; edgeList[12] = 6; edgeList[13] = 7; PetscInt *edges[1]; edges[0] = edgeList; DMNetworkSetEdgeList(dm, edges, NULL); DMNetworkLayoutSetUp(dm); return 0; } Except from the Online Documenation I wasn't able to find any information/example where newest version Petsc was used when setting up a DMNetwork. (I found a few examples using older versions, however I could not figure out how these could solved using the newest version). If some could explain to me the the correct interpretation of the parameters of DMNetworkSetSizes ? Any pointers to examples using the newest API would also be much appreciated. Thanks and best Regards, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dmnetwork-2019.pdf Type: application/pdf Size: 1415801 bytes Desc: dmnetwork-2019.pdf URL: From shrirang.abhyankar at pnnl.gov Thu Nov 7 14:47:17 2019 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Thu, 7 Nov 2019 20:47:17 +0000 Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? 
In-Reply-To: <4f6def37121145cbb9ca32e28495f4c6@lrz.de> References: <4f6def37121145cbb9ca32e28495f4c6@lrz.de> Message-ID: <6FE8D7D5-16FE-4172-A40D-5395FB416E72@pnnl.gov> From: petsc-users on behalf of "von Ramm, Alexander via petsc-users" Reply-To: "von Ramm, Alexander" Date: Thursday, November 7, 2019 at 11:12 AM To: "petsc-users at mcs.anl.gov" Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? Hello together, I'm trying to figure out how to create a DMNetwork, but the proper way to set the parameters eludes me (also there is some discrepancy between the manual https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf (page 166) and the online documenation https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMNetwork/DMNetworkSetSizes.html ). Thanks for pointing out the discrepancy. We?ll update the user manual. Currently I'm trying to set a up a simple Network with 8 nodes and 7 edges and distribute it over 2 processors. In the call DMNetworkSetSizes does Nsubnet need to be 1 (one global network, without any further subnetworks) or 2 (one subnetwork per processor) (my guess would be the former). Yes, Nsubnet = 1 for your application since you just have a single network, i.e., no subnetworks. My current attempt looks like the following: int main( int argc, char *argv[]) { PetscInitialize(&argc, &argv, NULL, NULL); DM dm; PetscInt NSubnet = 1; PetscInt nV[1] = {8}; PetscInt nE[1] = {7}; PetscInt NsubnetCouple = 0; PetscInt nec[0]; You do not need nec. You can simply set it to NULL. DMNetworkCreate(PETSC_COMM_WORLD, &dm); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, nec); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, NULL); PetscInt *edgeList; PetscMalloc1(14, &edgeList); edgeList[0] = 0; edgeList[1] = 4; edgeList[2] = 1; edgeList[3] = 4; edgeList[4] = 2; edgeList[5] = 5; edgeList[6] = 3; edgeList[7] = 5; edgeList[8] = 4; edgeList[9] = 6; edgeList[10] = 5; edgeList[11] = 6; edgeList[12] = 6; edgeList[13] = 7; PetscInt *edges[1]; edges[0] = edgeList; DMNetworkSetEdgeList(dm, edges, NULL); DMNetworkLayoutSetUp(dm); return 0; } Except from the Online Documenation I wasn't able to find any information/example where newest version Petsc was used when setting up a DMNetwork. (I found a few examples using older versions, however I could not figure out how these could solved using the newest version). If some could explain to me the the correct interpretation of the parameters of DMNetworkSetSizes ? Any pointers to examples using the newest API would also be much appreciated. Most of the examples with DMNetwork have a single network (Nsubnet = 1), except src/snes/examples/tutorials/network/ex1.c. This is a water + electric network simulation that has two subnetworks. Shri Thanks and best Regards, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 8 00:05:10 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 8 Nov 2019 06:05:10 +0000 Subject: [petsc-users] nondeterministic behavior of MUMPS when filtering out zero rows and columns In-Reply-To: References: Message-ID: Make sure you have the latest PETSc and MUMPS installed; they have fixed bugs in MUMPs over time. Hanging locations are best found with a debugger; there is really no other way. If you have a parallel debugger like DDT use it. If you don't you can use the PETSc option -start_in_debugger to have PETSc start a line debugger in an xterm for each process. 
Type cont in each window and when it "hangs" do control C in the windows and type bt It will show the traceback where it is hanging on each process. Send us the output. Barry Another approach that avoids the debugger is to send to one of the MPI processes a signal, term would be a good one to use. If you are luck that process with catch the signal and print a traceback of where it is when the signal occurred. If you are super lucky you can send the signal to several processes and get several tracebacks. > On Nov 7, 2019, at 5:44 AM, s.a.hack--- via petsc-users wrote: > > Hi, > > I am doing calculations with version 3.12.0 of PETSc. > Using the finite-element method, I solve the Maxwell equations on the interior of a 3D domain, coupled with boundary condition auxiliary equations on the boundary of the domain. The auxiliary equations employ auxiliary variables g. > > For ease of implementation of element matrix assembly, the auxiliary variables g are defined on the entire domain. However, only the basis functions for g with nonzero value at the boundary give nonzero entries in the system matrix. > > The element matrices hence have the structure > [ A B; C D] > at the boundary. > > In the interior the element matrices have the structure > [A 0; 0 0]. > > The degrees of freedom in the system matrix can be ordered by element [u_e1 g_e1 u_e2 g_e2 ?] or by parallel process [u_p1 g_p1 u_p2 g_p2 ?]. > > To solve the system matrix, I need to filter out zero rows and columns: > error = MatFindNonzeroRows(stiffnessMatrix, &nonzeroRows); > CHKERRABORT(PETSC_COMM_WORLD, error); > error = MatCreateSubMatrix(stiffnessMatrix, nonzeroRows, nonzeroRows, MAT_INITIAL_MATRIX, &stiffnessMatrixSubMatrix); > CHKERRABORT(PETSC_COMM_WORLD, error); > > I solve the system matrix in parallel on multiple nodes connected with InfiniBand. > The problem is that the MUMPS solver frequently (nondeterministically) hangs during KSPSolve() (after KSPSetUp() is completed). > Running with the options -ksp_view and -info the last printed statement is: > [0] VecScatterCreate_SF(): Using StarForest for vector scatter > In the calculations where the program does not hang, the calculated solution is correct. > > The problem doesn?t occur for calculations on a single node, or for calculations with the SuperLU solver (but SuperLU will not allow calculations that are as large). > The problem also doesn?t seem to occur for small problems. > The problem doesn?t occur either when I put ones on the diagonal, but this is computationally expensive: > error = MatFindZeroRows(stiffnessMatrix, &zeroRows); > CHKERRABORT(PETSC_COMM_WORLD, error); > error = MatZeroRowsColumnsIS(stiffnessMatrix, zeroRows, diagEntry, PETSC_IGNORE, PETSC_IGNORE); > CHKERRABORT(PETSC_COMM_WORLD, error); > > Would you have any ideas on what I could check? > > Best regards, > Sjoerd From Alexander.vonRamm at lrz.de Fri Nov 8 01:06:22 2019 From: Alexander.vonRamm at lrz.de (von Ramm, Alexander) Date: Fri, 8 Nov 2019 07:06:22 +0000 Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? In-Reply-To: <6FE8D7D5-16FE-4172-A40D-5395FB416E72@pnnl.gov> References: <4f6def37121145cbb9ca32e28495f4c6@lrz.de>, <6FE8D7D5-16FE-4172-A40D-5395FB416E72@pnnl.gov> Message-ID: Hi Shri, Hi Hong, thanks a lot for the information. This already helps a lot. 
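For reference, a minimal sketch that folds the suggestions above into the original example (Nsubnet = 1, NULL for the coupling-edge array, error checking added) could look like the following; component registration, DMSetUp, and distribution are left out, so this only builds the layout:

#include <petscdmnetwork.h>

int main(int argc, char *argv[])
{
  DM             dm;
  PetscInt       nV[1] = {8}, nE[1] = {7};
  /* same 8-vertex / 7-edge list as above, written as (from,to) pairs */
  PetscInt       edgeList[14] = {0,4, 1,4, 2,5, 3,5, 4,6, 5,6, 6,7};
  PetscInt       *edges[1];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = DMNetworkCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr);
  ierr = DMNetworkSetSizes(dm, 1, nV, nE, 0, NULL);CHKERRQ(ierr);
  edges[0] = edgeList;
  ierr = DMNetworkSetEdgeList(dm, edges, NULL);CHKERRQ(ierr);
  ierr = DMNetworkLayoutSetUp(dm);CHKERRQ(ierr);
  /* ... register components, set up and distribute the DM, solve ... */
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

The tutorials listed earlier in the thread (e.g. src/snes/examples/tutorials/network/power/power.c) show how the component registration and distribution steps continue from this point.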
Best, Alex ________________________________________ From: Abhyankar, Shrirang G Sent: Thursday, November 7, 2019 9:47:17 PM To: von Ramm, Alexander; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? From: petsc-users on behalf of "von Ramm, Alexander via petsc-users" Reply-To: "von Ramm, Alexander" Date: Thursday, November 7, 2019 at 11:12 AM To: "petsc-users at mcs.anl.gov" Subject: [petsc-users] DMNetwork - how to interprete the arguments of DMNetworkSetSizes ? Hello together, I'm trying to figure out how to create a DMNetwork, but the proper way to set the parameters eludes me (also there is some discrepancy between the manual https://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf (page 166) and the online documenation https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMNetwork/DMNetworkSetSizes.html ). Thanks for pointing out the discrepancy. We?ll update the user manual. Currently I'm trying to set a up a simple Network with 8 nodes and 7 edges and distribute it over 2 processors. In the call DMNetworkSetSizes does Nsubnet need to be 1 (one global network, without any further subnetworks) or 2 (one subnetwork per processor) (my guess would be the former). Yes, Nsubnet = 1 for your application since you just have a single network, i.e., no subnetworks. My current attempt looks like the following: int main( int argc, char *argv[]) { PetscInitialize(&argc, &argv, NULL, NULL); DM dm; PetscInt NSubnet = 1; PetscInt nV[1] = {8}; PetscInt nE[1] = {7}; PetscInt NsubnetCouple = 0; PetscInt nec[0]; You do not need nec. You can simply set it to NULL. DMNetworkCreate(PETSC_COMM_WORLD, &dm); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, nec); DMNetworkSetSizes(dm, NSubnet, nV, nE, NsubnetCouple, NULL); PetscInt *edgeList; PetscMalloc1(14, &edgeList); edgeList[0] = 0; edgeList[1] = 4; edgeList[2] = 1; edgeList[3] = 4; edgeList[4] = 2; edgeList[5] = 5; edgeList[6] = 3; edgeList[7] = 5; edgeList[8] = 4; edgeList[9] = 6; edgeList[10] = 5; edgeList[11] = 6; edgeList[12] = 6; edgeList[13] = 7; PetscInt *edges[1]; edges[0] = edgeList; DMNetworkSetEdgeList(dm, edges, NULL); DMNetworkLayoutSetUp(dm); return 0; } Except from the Online Documenation I wasn't able to find any information/example where newest version Petsc was used when setting up a DMNetwork. (I found a few examples using older versions, however I could not figure out how these could solved using the newest version). If some could explain to me the the correct interpretation of the parameters of DMNetworkSetSizes ? Any pointers to examples using the newest API would also be much appreciated. Most of the examples with DMNetwork have a single network (Nsubnet = 1), except src/snes/examples/tutorials/network/ex1.c. This is a water + electric network simulation that has two subnetworks. Shri Thanks and best Regards, Alex From juaneah at gmail.com Fri Nov 8 01:22:28 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Fri, 8 Nov 2019 01:22:28 -0600 Subject: [petsc-users] doubts on VecScatterCreate In-Reply-To: <35D3FEA6-6B71-44F4-B746-23461B581E9C@anl.gov> References: <35D3FEA6-6B71-44F4-B746-23461B581E9C@anl.gov> Message-ID: Hi, Thank you very much for the help and the quickly answer. After check my code the problem was previous to scattering. 
*The scatter routines work perfectly.* Just to undesrtand, in the line: ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); if I use MPI_COMM_WORLD, it means that all the processes have a copy of the (current local process) index set? If I use MPI_COMM_SELF, it means that only the local process have information about the index set? Kind regards El lun., 4 de nov. de 2019 a la(s) 08:47, Smith, Barry F. ( bsmith at mcs.anl.gov) escribi?: > > It works for me. Please send a complete code that fails. > > > > > > On Nov 3, 2019, at 11:41 PM, Emmanuel Ayala via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi everyone, thanks in advance. > > > > I have three parallel vectors: A, B and C. A and B have different sizes, > and C must be contain these two vectors (MatLab notation C=[A;B]). I need > to do some operations on C then put back the proper portion of C on A and > B, then I do some computations on A and B y put again on C, and the loop > repeats. > > > > For these propose I use Scatters: > > > > C is created as a parallel vector with size of (sizeA + sizeB) with > petsc_decide for parallel layout. The vectors have been distributed on the > same amount of processes. > > > > For the specific case with order [A;B] > > > > VecGetOwnershipRange(A,&start,&end); > > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); > > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is > redundant > > VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); > > > > VecGetSize(A,&sizeA) > > VecGetOwnershipRange(B,&start,&end); > > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); > > ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); > //shifts the index location > > VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); > > > > Then I can use > > VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > > VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); > > > > and > > VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > > VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); > > > > and the same with B. > > I used MPI_COMM SELF and I got the same results. > > > > The situation is: My results look good for the portion of B, but no for > the portion of A, there is something that I'm doing wrong with the > scattering? > > > > Best regards. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 8 04:23:34 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 8 Nov 2019 05:23:34 -0500 Subject: [petsc-users] doubts on VecScatterCreate In-Reply-To: References: <35D3FEA6-6B71-44F4-B746-23461B581E9C@anl.gov> Message-ID: On Fri, Nov 8, 2019 at 2:23 AM Emmanuel Ayala via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > Thank you very much for the help and the quickly answer. > > After check my code the problem was previous to scattering. *The scatter > routines work perfectly.* > > Just to undesrtand, in the line: > > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); > > if I use MPI_COMM_WORLD, it means that all the processes have a copy of > the (current local process) index set? > If I use MPI_COMM_SELF, it means that only the local process have > information about the index set? > No, each process has whatever you give it in this call. 
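A small self-contained sketch (the local size of 5 per rank is arbitrary): each rank's IS holds exactly the indices that rank passed in, and creating it on PETSC_COMM_SELF would leave those local contents unchanged.

#include <petscvec.h>
#include <petscis.h>

int main(int argc, char **argv)
{
  Vec            A;
  IS             is;
  PetscInt       start, end;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = VecCreateMPI(PETSC_COMM_WORLD, 5, PETSC_DECIDE, &A);CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(A, &start, &end);CHKERRQ(ierr);
  /* this rank stores start, start+1, ..., end-1 -- its own range only */
  ierr = ISCreateStride(PETSC_COMM_WORLD, end - start, start, 1, &is);CHKERRQ(ierr);
  ierr = ISView(is, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = ISDestroy(&is);CHKERRQ(ierr);
  ierr = VecDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
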
The comm determines what happens with collective calls, like https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISGetTotalIndices.html Thanks, Matt > Kind regards > > > > El lun., 4 de nov. de 2019 a la(s) 08:47, Smith, Barry F. ( > bsmith at mcs.anl.gov) escribi?: > >> >> It works for me. Please send a complete code that fails. >> >> >> >> >> > On Nov 3, 2019, at 11:41 PM, Emmanuel Ayala via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Hi everyone, thanks in advance. >> > >> > I have three parallel vectors: A, B and C. A and B have different >> sizes, and C must be contain these two vectors (MatLab notation C=[A;B]). I >> need to do some operations on C then put back the proper portion of C on A >> and B, then I do some computations on A and B y put again on C, and the >> loop repeats. >> > >> > For these propose I use Scatters: >> > >> > C is created as a parallel vector with size of (sizeA + sizeB) with >> petsc_decide for parallel layout. The vectors have been distributed on the >> same amount of processes. >> > >> > For the specific case with order [A;B] >> > >> > VecGetOwnershipRange(A,&start,&end); >> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); >> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is >> redundant >> > VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); >> > >> > VecGetSize(A,&sizeA) >> > VecGetOwnershipRange(B,&start,&end); >> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); >> > ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); >> //shifts the index location >> > VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); >> > >> > Then I can use >> > VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); >> > VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); >> > >> > and >> > VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); >> > VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); >> > >> > and the same with B. >> > I used MPI_COMM SELF and I got the same results. >> > >> > The situation is: My results look good for the portion of B, but no for >> the portion of A, there is something that I'm doing wrong with the >> scattering? >> > >> > Best regards. >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Fri Nov 8 11:11:20 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Fri, 8 Nov 2019 11:11:20 -0600 Subject: [petsc-users] doubts on VecScatterCreate In-Reply-To: References: <35D3FEA6-6B71-44F4-B746-23461B581E9C@anl.gov> Message-ID: Ok, thanks for the clarification. Kind regards. El vie., 8 de nov. de 2019 a la(s) 04:23, Matthew Knepley (knepley at gmail.com) escribi?: > On Fri, Nov 8, 2019 at 2:23 AM Emmanuel Ayala via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> Thank you very much for the help and the quickly answer. >> >> After check my code the problem was previous to scattering. *The scatter >> routines work perfectly.* >> >> Just to undesrtand, in the line: >> >> ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); >> >> if I use MPI_COMM_WORLD, it means that all the processes have a copy of >> the (current local process) index set? 
>> If I use MPI_COMM_SELF, it means that only the local process have >> information about the index set? >> > > No, each process has whatever you give it in this call. The comm > determines what happens with collective calls, like > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISGetTotalIndices.html > > Thanks, > > Matt > > >> Kind regards >> >> >> >> El lun., 4 de nov. de 2019 a la(s) 08:47, Smith, Barry F. ( >> bsmith at mcs.anl.gov) escribi?: >> >>> >>> It works for me. Please send a complete code that fails. >>> >>> >>> >>> >>> > On Nov 3, 2019, at 11:41 PM, Emmanuel Ayala via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > >>> > Hi everyone, thanks in advance. >>> > >>> > I have three parallel vectors: A, B and C. A and B have different >>> sizes, and C must be contain these two vectors (MatLab notation C=[A;B]). I >>> need to do some operations on C then put back the proper portion of C on A >>> and B, then I do some computations on A and B y put again on C, and the >>> loop repeats. >>> > >>> > For these propose I use Scatters: >>> > >>> > C is created as a parallel vector with size of (sizeA + sizeB) with >>> petsc_decide for parallel layout. The vectors have been distributed on the >>> same amount of processes. >>> > >>> > For the specific case with order [A;B] >>> > >>> > VecGetOwnershipRange(A,&start,&end); >>> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromA); >>> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_toC1);// this is >>> redundant >>> > VecScatterCreate(A,is_fromA,C,is_toC1,&scatter1); >>> > >>> > VecGetSize(A,&sizeA) >>> > VecGetOwnershipRange(B,&start,&end); >>> > ISCreateStride(MPI_COMM_WORLD,(end-start),start,1,&is_fromB); >>> > ISCreateStride(MPI_COMM_WORLD,(end-start),(start+sizeA),1,&is_toC2); >>> //shifts the index location >>> > VecScatterCreate(B,is_fromB,C,is_toC2,&scatter2); >>> > >>> > Then I can use >>> > VecScatterBegin(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); >>> > VecScatterEnd(scatter1,A,C,INSERT_VALUES,SCATTER_FORWARD); >>> > >>> > and >>> > VecScatterBegin(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); >>> > VecScatterEnd(scatter1,C,A,INSERT_VALUES,SCATTER_REVERSE); >>> > >>> > and the same with B. >>> > I used MPI_COMM SELF and I got the same results. >>> > >>> > The situation is: My results look good for the portion of B, but no >>> for the portion of A, there is something that I'm doing wrong with the >>> scattering? >>> > >>> > Best regards. >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Mon Nov 11 08:20:59 2019 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 11 Nov 2019 09:20:59 -0500 Subject: [petsc-users] Line search Ended due to ynorm, nondeterministic stagnation Message-ID: Symptom: running the same code on the same core count multiple times, 80% of the time it converges out no problem. 20% of the time SNES seems to reset to the previous step, showing "Line search: Ended due to ynorm < stol*xnorm" Sample output below. Everything is healthy until TS step 7, which starts with a norm of 1.181275684011e-04. 
After that point, every KSP converges to RTOL and I see the line search message, and the subsequent timesteps all seem like they don't use the previous update, because they have exactly the same residuals. Running the same thing again, it happily proceeds through and converges out without the stagnation. Does this sound familiar to anyone? 5 TS dt 30. time 150. 0 SNES Function norm 2.654593713313e-03 0 KSP Residual norm 2.654593713313e-03 ... 41 KSP Residual norm 2.515907124549e-04 Linear solve converged due to CONVERGED_RTOL iterations 41 Line search: gnorm after quadratic fit 2.445531672458e-03 Line search: Quadratically determined step, lambda=1.5043681801723077e-01 1 SNES Function norm 2.445531672458e-03 0 KSP Residual norm 2.445531672458e-03 ... 40 KSP Residual norm 2.281075491298e-04 Linear solve converged due to CONVERGED_RTOL iterations 40 Line search: gnorm after quadratic fit 2.158781959371e-03 Line search: Quadratically determined step, lambda=2.1593140643569805e-01 2 SNES Function norm 2.158781959371e-03 0 KSP Residual norm 2.158781959371e-03 40 KSP Residual norm 1.902363860750e-04 Linear solve converged due to CONVERGED_RTOL iterations 40 Line search: gnorm after quadratic fit 1.727564041943e-03 Line search: Quadratically determined step, lambda=3.7179194957755984e-01 3 SNES Function norm 1.727564041943e-03 0 KSP Residual norm 1.727564041943e-03 ... 38 KSP Residual norm 1.572811703375e-04 Linear solve converged due to CONVERGED_RTOL iterations 38 Line search: Using full step: fnorm 1.727564041943e-03 gnorm 1.074289112389e-03 4 SNES Function norm 1.074289112389e-03 0 KSP Residual norm 1.074289112389e-03 ... 18 KSP Residual norm 1.047939125382e-04 Linear solve converged due to CONVERGED_RTOL iterations 18 Line search: Using full step: fnorm 1.074289112389e-03 gnorm 1.049627557624e-04 5 SNES Function norm 1.049627557624e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 5 TSAdapt none beuler 0: step 5 accepted t=150 + 3.000e+01 dt=3.000e+01 6 TS dt 30. time 180. 0 SNES Function norm 1.181275684011e-04 0 KSP Residual norm 1.181275684011e-04 ... 44 KSP Residual norm 1.107203720458e-05 Linear solve converged due to CONVERGED_RTOL iterations 44 Line search: Ended due to ynorm < stol*xnorm (1.034132241459e-04 < 2.190637006857e-04). 1 SNES Function norm 1.181275684011e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt none beuler 0: step 6 accepted t=180 + 3.000e+01 dt=3.000e+01 7 TS dt 30. time 210. 0 SNES Function norm 1.181275684011e-04 0 KSP Residual norm 1.181275684011e-04 ... 44 KSP Residual norm 1.110103600083e-05 Linear solve converged due to CONVERGED_RTOL iterations 44 Line search: Ended due to ynorm < stol*xnorm (1.047067861804e-04 < 2.190637006857e-04). 1 SNES Function norm 1.181275684011e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt none beuler 0: step 7 accepted t=210 + 3.000e+01 dt=3.000e+01 8 TS dt 30. time 240. 0 SNES Function norm 1.181275684011e-04 0 KSP Residual norm 1.181275684011e-04 ... 44 KSP Residual norm 1.102709658960e-05 Linear solve converged due to CONVERGED_RTOL iterations 44 Line search: Ended due to ynorm < stol*xnorm (1.035351833216e-04 < 2.190637006857e-04). 
1 SNES Function norm 1.181275684011e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt none beuler 0: step 8 accepted t=240 + 3.000e+01 dt=3.000e+01 9 2.700e+02 3.000e+01 1.72412e-09 2.28859e-07 2.39889e-07 9.69858e-08 4.71512e-06 3.30532e-02 1.83203e+00 -3.38502e-01 -3.09701e+00 1.96301e-02 3.53675e-02 3.96395e+01 9 TS dt 30. time 270. 0 SNES Function norm 1.181275684011e-04 0 KSP Residual norm 1.181275684011e-04 ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 11 16:49:12 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 11 Nov 2019 22:49:12 +0000 Subject: [petsc-users] Line search Ended due to ynorm, nondeterministic stagnation In-Reply-To: References: Message-ID: <3B6F0D75-3509-453C-B2B2-A6BE7ADF9D8C@anl.gov> Mark, What are you using for KSP rtol ? It looks like 1.e-1 from > 0 KSP Residual norm 2.654593713313e-03 > ... > 41 KSP Residual norm 2.515907124549e-04 What about SNES stol, are you setting that? > Line search: Ended due to ynorm < stol*xnorm (1.047067861804e-04 < 2.190637006857e-04). Any idea of the order of magnitude of the solution? These numbers are much larger than one normally sees in this situation. It looks like the linear solve is just not accurate enough so a descent direction is not generated and hence the line search has to give up. The problem comes up only in some runs because the linear solve is right on the cusp of good enough to find a descent direction and depending on the order of operations in the linear solve the solution sometimes is a descent direction and sometimes is not producing inconsistent behavior. I would just make the linear solver more accurate, pay the cost of a bit more time in trade off for removing the problems with failure Barry > On Nov 11, 2019, at 8:20 AM, Mark Lohry via petsc-users wrote: > > Symptom: running the same code on the same core count multiple times, 80% of the time it converges out no problem. 20% of the time SNES seems to reset to the previous step, showing > "Line search: Ended due to ynorm < stol*xnorm" > > Sample output below. Everything is healthy until TS step 7, which starts with a norm of 1.181275684011e-04. After that point, every KSP converges to RTOL and I see the line search message, and the subsequent timesteps all seem like they don't use the previous update, because they have exactly the same residuals. > > Running the same thing again, it happily proceeds through and converges out without the stagnation. > > Does this sound familiar to anyone? > > > 5 TS dt 30. time 150. > 0 SNES Function norm 2.654593713313e-03 > 0 KSP Residual norm 2.654593713313e-03 > ... > 41 KSP Residual norm 2.515907124549e-04 > Linear solve converged due to CONVERGED_RTOL iterations 41 > Line search: gnorm after quadratic fit 2.445531672458e-03 > Line search: Quadratically determined step, lambda=1.5043681801723077e-01 > 1 SNES Function norm 2.445531672458e-03 > 0 KSP Residual norm 2.445531672458e-03 > ... 
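(If so, one thing to try is simply tightening that tolerance, either with -ksp_rtol 1e-4 on the command line or in code. A fragment, assuming ts is the application's TS and with 1e-4 chosen only as an illustration:

SNES           snes;
KSP            ksp;
PetscErrorCode ierr;

ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
/* rtol = 1e-4; leave abstol, dtol, and max iterations at their defaults */
ierr = KSPSetTolerances(ksp, 1.e-4, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
)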
> 40 KSP Residual norm 2.281075491298e-04 > Linear solve converged due to CONVERGED_RTOL iterations 40 > Line search: gnorm after quadratic fit 2.158781959371e-03 > Line search: Quadratically determined step, lambda=2.1593140643569805e-01 > 2 SNES Function norm 2.158781959371e-03 > 0 KSP Residual norm 2.158781959371e-03 > 40 KSP Residual norm 1.902363860750e-04 > Linear solve converged due to CONVERGED_RTOL iterations 40 > Line search: gnorm after quadratic fit 1.727564041943e-03 > Line search: Quadratically determined step, lambda=3.7179194957755984e-01 > 3 SNES Function norm 1.727564041943e-03 > 0 KSP Residual norm 1.727564041943e-03 > ... > 38 KSP Residual norm 1.572811703375e-04 > Linear solve converged due to CONVERGED_RTOL iterations 38 > Line search: Using full step: fnorm 1.727564041943e-03 gnorm 1.074289112389e-03 > 4 SNES Function norm 1.074289112389e-03 > 0 KSP Residual norm 1.074289112389e-03 > ... > 18 KSP Residual norm 1.047939125382e-04 > Linear solve converged due to CONVERGED_RTOL iterations 18 > Line search: Using full step: fnorm 1.074289112389e-03 gnorm 1.049627557624e-04 > 5 SNES Function norm 1.049627557624e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 5 > TSAdapt none beuler 0: step 5 accepted t=150 + 3.000e+01 dt=3.000e+01 > 6 TS dt 30. time 180. > 0 SNES Function norm 1.181275684011e-04 > 0 KSP Residual norm 1.181275684011e-04 > ... > > 44 KSP Residual norm 1.107203720458e-05 > Linear solve converged due to CONVERGED_RTOL iterations 44 > Line search: Ended due to ynorm < stol*xnorm (1.034132241459e-04 < 2.190637006857e-04). > 1 SNES Function norm 1.181275684011e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt none beuler 0: step 6 accepted t=180 + 3.000e+01 dt=3.000e+01 > 7 TS dt 30. time 210. > 0 SNES Function norm 1.181275684011e-04 > 0 KSP Residual norm 1.181275684011e-04 > ... > 44 KSP Residual norm 1.110103600083e-05 > Linear solve converged due to CONVERGED_RTOL iterations 44 > Line search: Ended due to ynorm < stol*xnorm (1.047067861804e-04 < 2.190637006857e-04). > 1 SNES Function norm 1.181275684011e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt none beuler 0: step 7 accepted t=210 + 3.000e+01 dt=3.000e+01 > 8 TS dt 30. time 240. > 0 SNES Function norm 1.181275684011e-04 > 0 KSP Residual norm 1.181275684011e-04 > ... > 44 KSP Residual norm 1.102709658960e-05 > Linear solve converged due to CONVERGED_RTOL iterations 44 > Line search: Ended due to ynorm < stol*xnorm (1.035351833216e-04 < 2.190637006857e-04). > 1 SNES Function norm 1.181275684011e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt none beuler 0: step 8 accepted t=240 + 3.000e+01 dt=3.000e+01 > 9 2.700e+02 3.000e+01 1.72412e-09 2.28859e-07 2.39889e-07 9.69858e-08 4.71512e-06 3.30532e-02 1.83203e+00 -3.38502e-01 -3.09701e+00 1.96301e-02 3.53675e-02 3.96395e+01 > 9 TS dt 30. time 270. > 0 SNES Function norm 1.181275684011e-04 > 0 KSP Residual norm 1.181275684011e-04 > ... > > From gideon.simpson at gmail.com Mon Nov 11 19:00:32 2019 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 11 Nov 2019 20:00:32 -0500 Subject: [petsc-users] ts behavior question Message-ID: I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: 1. I have to explicitly provide the Jacobian 2. 
When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. 3. My code works without declaring const when I'm using an explicit scheme. In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. Can someone clarify what is expected/preferred? -- gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 11 23:33:50 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 12 Nov 2019 05:33:50 +0000 Subject: [petsc-users] ts behavior question In-Reply-To: References: Message-ID: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users wrote: > > I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: > 1. I have to explicitly provide the Jacobian Yes > 2. When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. Presumably you call VecGetArray() instead? > > > 3. My code works without declaring const when I'm using an explicit scheme. > > In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. The use with DMDA provides automatic routines for computing the needed Jacobians using finite differencing of your provided function and coloring of the Jacobian. This results in reasonably efficient computation of Jacobians that work in most (almost all) cases. > > Can someone clarify what is expected/preferred? You should always use VecGetArrayRead() for vectors you are accessing but NOT changing the values in. There is no reason not and it provides the potential for higher performance. The algebraic solvers have additional checks to prevent peopled from inadvertently changing the entries in x (which would produce bugs). Presumably this results in generating an error when you call VecGetArray(). At least some of the TS explicit calls do not have such checks. They could be added and should be added. https://gitlab.com/petsc/petsc/issues/493 Thanks for pointing out the inconsistency Barry > > -- > gideon From gideon.simpson at gmail.com Tue Nov 12 09:26:21 2019 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Nov 2019 10:26:21 -0500 Subject: [petsc-users] ts behavior question In-Reply-To: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> Message-ID: So, in principle, should we actually be using DMDAVecGetArrayRead in this context? I seem to be able to get away with DMDAVecGetArray with all time steppers. On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. wrote: > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > I noticed that when I am solving a problem with the ts and I am *not* > using a da, if I want to use an implicit time stepping routine: > > 1. I have to explicitly provide the Jacobian > > Yes > > > 2. When I do provide the Jacobian, if I want to access the elements of > x(t) to construct f(t,x), I need to use a const PetscScalar and a > VecGetArrayRead to get it to work. > > Presumably you call VecGetArray() instead? > > > > > > 3. 
My code works without declaring const when I'm using an explicit > scheme. > > > > In contrast, if I solve a problem using a da, my code works, I can use > implicit schemes without having to provide the Jacobian, and I don't have > to use const anywhere. > > The use with DMDA provides automatic routines for computing the needed > Jacobians using finite differencing of your provided function and coloring > of the Jacobian. This results in reasonably efficient computation of > Jacobians that work in most (almost all) cases. > > > > Can someone clarify what is expected/preferred? > > You should always use VecGetArrayRead() for vectors you are accessing > but NOT changing the values in. There is no reason not and it provides the > potential for higher performance. > > The algebraic solvers have additional checks to prevent peopled from > inadvertently changing the entries in x (which would produce bugs). > Presumably this results in generating an error when you call VecGetArray(). > At least some of the TS explicit calls do not have such checks. They could > be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > Thanks for pointing out the inconsistency > > Barry > > > > > -- > > gideon > > -- gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 12 09:43:43 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 12 Nov 2019 15:43:43 +0000 Subject: [petsc-users] ts behavior question In-Reply-To: References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> Message-ID: <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> For any vector you only read you should use the read version. Sometimes the vector may not be locked and hence the other routine can be used but that may change as we add more locks and improve the code. So best to do it right > On Nov 12, 2019, at 9:26 AM, Gideon Simpson wrote: > > So, in principle, should we actually be using DMDAVecGetArrayRead in this context? I seem to be able to get away with DMDAVecGetArray with all time steppers. I am not sure why DMDAVecGetArray would work if VecGetArray did not work. Internally it calls VecGetArray() that will do the check. If you call it on local ghosted vectors it doesn't check if the vector is locked since the ghosted version is a copy of the true locked vector. Barry > > On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. wrote: > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users wrote: > > > > I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: > > 1. I have to explicitly provide the Jacobian > > Yes > > > 2. When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. > > Presumably you call VecGetArray() instead? > > > > > > 3. My code works without declaring const when I'm using an explicit scheme. > > > > In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. > > The use with DMDA provides automatic routines for computing the needed Jacobians using finite differencing of your provided function and coloring of the Jacobian. This results in reasonably efficient computation of Jacobians that work in most (almost all) cases. > > > > Can someone clarify what is expected/preferred? 
> > You should always use VecGetArrayRead() for vectors you are accessing but NOT changing the values in. There is no reason not and it provides the potential for higher performance. > > The algebraic solvers have additional checks to prevent peopled from inadvertently changing the entries in x (which would produce bugs). Presumably this results in generating an error when you call VecGetArray(). At least some of the TS explicit calls do not have such checks. They could be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > Thanks for pointing out the inconsistency > > Barry > > > > > -- > > gideon > > > > -- > gideon From hongzhang at anl.gov Tue Nov 12 09:58:19 2019 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 12 Nov 2019 15:58:19 +0000 Subject: [petsc-users] ts behavior question In-Reply-To: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> Message-ID: <7CE1276E-EDD7-4245-8E44-B7127326A27C@anl.gov> > On Nov 11, 2019, at 11:33 PM, Smith, Barry F. via petsc-users wrote: > > > >> On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users wrote: >> >> I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: >> 1. I have to explicitly provide the Jacobian > > Yes Alternatively, -snes_fd can be used to approximate the Jacobian with normal finite differences (no coloring). The FD approximation is not efficient, but should work for small problems, and it is also useful for testing your hand-written Jacobian (via -snes_test_jacobian) Hong (Mr.) > >> 2. When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. > > Presumably you call VecGetArray() instead? >> >> >> 3. My code works without declaring const when I'm using an explicit scheme. >> >> In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. > > The use with DMDA provides automatic routines for computing the needed Jacobians using finite differencing of your provided function and coloring of the Jacobian. This results in reasonably efficient computation of Jacobians that work in most (almost all) cases. >> >> Can someone clarify what is expected/preferred? > > You should always use VecGetArrayRead() for vectors you are accessing but NOT changing the values in. There is no reason not and it provides the potential for higher performance. > > The algebraic solvers have additional checks to prevent peopled from inadvertently changing the entries in x (which would produce bugs). Presumably this results in generating an error when you call VecGetArray(). At least some of the TS explicit calls do not have such checks. They could be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > Thanks for pointing out the inconsistency > > Barry > >> >> -- >> gideon > From gideon.simpson at gmail.com Tue Nov 12 14:09:55 2019 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Nov 2019 15:09:55 -0500 Subject: [petsc-users] ts behavior question In-Reply-To: <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> Message-ID: So this might be a resolution/another question. 
Part of the reason to use the da is that it provides you with ghost points. If you're only accessing the dependent variables entries with DMDAVecGetArrayRead, then you can't modify the ghost points. If you can't modify the ghost points here, where would you do so in the context of a problem with, for instance, time dependent boundary conditions? On Tue, Nov 12, 2019 at 10:43 AM Smith, Barry F. wrote: > > For any vector you only read you should use the read version. > > Sometimes the vector may not be locked and hence the other routine can > be used but that may change as we add more locks and improve the code. So > best to do it right > > > On Nov 12, 2019, at 9:26 AM, Gideon Simpson > wrote: > > > > So, in principle, should we actually be using DMDAVecGetArrayRead in > this context? I seem to be able to get away with DMDAVecGetArray with all > time steppers. > > I am not sure why DMDAVecGetArray would work if VecGetArray did not > work. Internally it calls VecGetArray() that will do the check. If you call > it on local ghosted vectors it doesn't check if the vector is locked since > the ghosted version is a copy of the true locked vector. > > Barry > > > > > On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. > wrote: > > > > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > I noticed that when I am solving a problem with the ts and I am *not* > using a da, if I want to use an implicit time stepping routine: > > > 1. I have to explicitly provide the Jacobian > > > > Yes > > > > > 2. When I do provide the Jacobian, if I want to access the elements of > x(t) to construct f(t,x), I need to use a const PetscScalar and a > VecGetArrayRead to get it to work. > > > > Presumably you call VecGetArray() instead? > > > > > > > > > 3. My code works without declaring const when I'm using an explicit > scheme. > > > > > > In contrast, if I solve a problem using a da, my code works, I can use > implicit schemes without having to provide the Jacobian, and I don't have > to use const anywhere. > > > > The use with DMDA provides automatic routines for computing the needed > Jacobians using finite differencing of your provided function and coloring > of the Jacobian. This results in reasonably efficient computation of > Jacobians that work in most (almost all) cases. > > > > > > Can someone clarify what is expected/preferred? > > > > You should always use VecGetArrayRead() for vectors you are accessing > but NOT changing the values in. There is no reason not and it provides the > potential for higher performance. > > > > The algebraic solvers have additional checks to prevent peopled from > inadvertently changing the entries in x (which would produce bugs). > Presumably this results in generating an error when you call VecGetArray(). > At least some of the TS explicit calls do not have such checks. They could > be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > > > Thanks for pointing out the inconsistency > > > > Barry > > > > > > > > -- > > > gideon > > > > > > > > -- > > gideon > > -- gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 12 14:41:38 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Tue, 12 Nov 2019 20:41:38 +0000 Subject: [petsc-users] ts behavior question In-Reply-To: References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> Message-ID: <28124FE8-4FD1-4BAD-8FEC-52B6177EB178@mcs.anl.gov> > On Nov 12, 2019, at 2:09 PM, Gideon Simpson wrote: > > So this might be a resolution/another question. Part of the reason to use the da is that it provides you with ghost points. If you're only accessing the dependent variables entries with DMDAVecGetArrayRead, then you can't modify the ghost points. If you can't modify the ghost points here, where would you do so in the context of a problem with, for instance, time dependent boundary conditions? In that case, as I say below, you have a ghosted local copy and you can put whatever values you wish into those ghosted locations. That is, when using ghosted local vectors you don't need to use the Read() version. Barry Note: if I were writing the code I would open the ghosted local input vector as writeable to put in the ghost values. Close it and then separately open it again as Read() to use in compute the needed TS functions. This is certainly not necessary but it helps with code maintainability and to decreases the likelihood of bugs. You have one set of access where you are legitimately changing values thus should not use Read() and another where you should not be changing values and thus should use read(). > > On Tue, Nov 12, 2019 at 10:43 AM Smith, Barry F. wrote: > > For any vector you only read you should use the read version. > > Sometimes the vector may not be locked and hence the other routine can be used but that may change as we add more locks and improve the code. So best to do it right > > > On Nov 12, 2019, at 9:26 AM, Gideon Simpson wrote: > > > > So, in principle, should we actually be using DMDAVecGetArrayRead in this context? I seem to be able to get away with DMDAVecGetArray with all time steppers. > > I am not sure why DMDAVecGetArray would work if VecGetArray did not work. Internally it calls VecGetArray() that will do the check. If you call it on local ghosted vectors it doesn't check if the vector is locked since the ghosted version is a copy of the true locked vector. > > Barry > > > > > On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. wrote: > > > > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users wrote: > > > > > > I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: > > > 1. I have to explicitly provide the Jacobian > > > > Yes > > > > > 2. When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. > > > > Presumably you call VecGetArray() instead? > > > > > > > > > 3. My code works without declaring const when I'm using an explicit scheme. > > > > > > In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. > > > > The use with DMDA provides automatic routines for computing the needed Jacobians using finite differencing of your provided function and coloring of the Jacobian. This results in reasonably efficient computation of Jacobians that work in most (almost all) cases. > > > > > > Can someone clarify what is expected/preferred? 
> > > > You should always use VecGetArrayRead() for vectors you are accessing but NOT changing the values in. There is no reason not and it provides the potential for higher performance. > > > > The algebraic solvers have additional checks to prevent peopled from inadvertently changing the entries in x (which would produce bugs). Presumably this results in generating an error when you call VecGetArray(). At least some of the TS explicit calls do not have such checks. They could be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > > > Thanks for pointing out the inconsistency > > > > Barry > > > > > > > > -- > > > gideon > > > > > > > > -- > > gideon > > > > -- > gideon From gideon.simpson at gmail.com Tue Nov 12 14:48:00 2019 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Nov 2019 15:48:00 -0500 Subject: [petsc-users] ts behavior question In-Reply-To: <28124FE8-4FD1-4BAD-8FEC-52B6177EB178@mcs.anl.gov> References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> <28124FE8-4FD1-4BAD-8FEC-52B6177EB178@mcs.anl.gov> Message-ID: I think I'm almost with you. In my code, I make a local copy of the vector (with DMGetLocalVector) and after calling GlobaltoLocal, I call DMDAVecGetArray on the local vector. I use the array I obtain of this local copy in populating by right hand side function. Is that consistent with your the approach that you guys recommend? If I were to do as you say and have a separate set of calls for populating the ghost points, where would this fit in the ts framework? Are are you saying this would be done at the beginning of the RHS function? On Tue, Nov 12, 2019 at 3:41 PM Smith, Barry F. wrote: > > > > On Nov 12, 2019, at 2:09 PM, Gideon Simpson > wrote: > > > > So this might be a resolution/another question. Part of the reason to > use the da is that it provides you with ghost points. If you're only > accessing the dependent variables entries with DMDAVecGetArrayRead, then > you can't modify the ghost points. If you can't modify the ghost points > here, where would you do so in the context of a problem with, for instance, > time dependent boundary conditions? > > In that case, as I say below, you have a ghosted local copy and you can > put whatever values you wish into those ghosted locations. That is, when > using ghosted local vectors you don't need to use the Read() version. > > Barry > > Note: if I were writing the code I would open the ghosted local input > vector as writeable to put in the ghost values. Close it and then > separately open it again as Read() to use in compute the needed TS > functions. This is certainly not necessary but it helps with code > maintainability and to decreases the likelihood of bugs. You have one set > of access where you are legitimately changing values thus should not use > Read() and another where you should not be changing values and thus should > use read(). > > > > > > > > On Tue, Nov 12, 2019 at 10:43 AM Smith, Barry F. > wrote: > > > > For any vector you only read you should use the read version. > > > > Sometimes the vector may not be locked and hence the other routine can > be used but that may change as we add more locks and improve the code. So > best to do it right > > > > > On Nov 12, 2019, at 9:26 AM, Gideon Simpson > wrote: > > > > > > So, in principle, should we actually be using DMDAVecGetArrayRead in > this context? I seem to be able to get away with DMDAVecGetArray with all > time steppers. 
> > > > I am not sure why DMDAVecGetArray would work if VecGetArray did not > work. Internally it calls VecGetArray() that will do the check. If you call > it on local ghosted vectors it doesn't check if the vector is locked since > the ghosted version is a copy of the true locked vector. > > > > Barry > > > > > > > > On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. > wrote: > > > > > > > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > > > I noticed that when I am solving a problem with the ts and I am > *not* using a da, if I want to use an implicit time stepping routine: > > > > 1. I have to explicitly provide the Jacobian > > > > > > Yes > > > > > > > 2. When I do provide the Jacobian, if I want to access the elements > of x(t) to construct f(t,x), I need to use a const PetscScalar and a > VecGetArrayRead to get it to work. > > > > > > Presumably you call VecGetArray() instead? > > > > > > > > > > > > 3. My code works without declaring const when I'm using an explicit > scheme. > > > > > > > > In contrast, if I solve a problem using a da, my code works, I can > use implicit schemes without having to provide the Jacobian, and I don't > have to use const anywhere. > > > > > > The use with DMDA provides automatic routines for computing the > needed Jacobians using finite differencing of your provided function and > coloring of the Jacobian. This results in reasonably efficient computation > of Jacobians that work in most (almost all) cases. > > > > > > > > Can someone clarify what is expected/preferred? > > > > > > You should always use VecGetArrayRead() for vectors you are > accessing but NOT changing the values in. There is no reason not and it > provides the potential for higher performance. > > > > > > The algebraic solvers have additional checks to prevent peopled from > inadvertently changing the entries in x (which would produce bugs). > Presumably this results in generating an error when you call VecGetArray(). > At least some of the TS explicit calls do not have such checks. They could > be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > > > > > Thanks for pointing out the inconsistency > > > > > > Barry > > > > > > > > > > > -- > > > > gideon > > > > > > > > > > > > -- > > > gideon > > > > > > > > -- > > gideon > > -- gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 12 17:26:53 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 12 Nov 2019 23:26:53 +0000 Subject: [petsc-users] ts behavior question In-Reply-To: References: <7EC7C513-A02E-4CFA-80E1-A8A1E783A69A@anl.gov> <8DE0C027-9119-4FC0-A638-CDA47814EBB1@mcs.anl.gov> <28124FE8-4FD1-4BAD-8FEC-52B6177EB178@mcs.anl.gov> Message-ID: > On Nov 12, 2019, at 2:48 PM, Gideon Simpson wrote: > > I think I'm almost with you. In my code, I make a local copy of the vector (with DMGetLocalVector) and after calling GlobaltoLocal, I call DMDAVecGetArray on the local vector. I use the array I obtain of this local copy in populating by right hand side function. Is that consistent with your the approach that you guys recommend? When you need ghost points yes. > If I were to do as you say and have a separate set of calls for populating the ghost points, where would this fit in the ts framework? You could do it after you after the DMDAVecGetArray() call. So it is in the same routine. 
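A rough sketch of that pattern inside a TS RHS function, assuming a 1-D DMDA created with DM_BOUNDARY_GHOSTED and stencil width 1; the sin(t) boundary value, the grid spacing, and the centered difference are placeholders, not taken from any code in this thread:

#include <petscts.h>
#include <petscdmda.h>

PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec X, Vec F, void *ctx)
{
  DM                da;
  Vec               Xloc;
  PetscScalar       *xw, *f;
  const PetscScalar *x;
  PetscInt          i, xs, xm, Mx;
  PetscReal         hx;
  PetscErrorCode    ierr;

  PetscFunctionBeginUser;
  ierr = TSGetDM(ts, &da);CHKERRQ(ierr);
  ierr = DMDAGetInfo(da, NULL, &Mx, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);CHKERRQ(ierr);
  hx   = 1.0/(PetscReal)(Mx - 1);                    /* placeholder grid spacing */

  ierr = DMGetLocalVector(da, &Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);

  /* Pass 1: open the ghosted local vector writeable and fill the boundary
     ghost slots with the time-dependent boundary value (placeholder). */
  ierr = DMDAVecGetArray(da, Xloc, &xw);CHKERRQ(ierr);
  if (xs == 0)       xw[-1] = PetscSinReal(t);       /* left ghost slot  */
  if (xs + xm == Mx) xw[Mx] = PetscSinReal(t);       /* right ghost slot */
  ierr = DMDAVecRestoreArray(da, Xloc, &xw);CHKERRQ(ierr);

  /* Pass 2: reopen read-only and compute the RHS (a centered second
     difference here, purely as an illustration). */
  ierr = DMDAVecGetArrayRead(da, Xloc, &x);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da, F, &f);CHKERRQ(ierr);
  for (i = xs; i < xs + xm; i++) f[i] = (x[i-1] - 2.0*x[i] + x[i+1])/(hx*hx);
  ierr = DMDAVecRestoreArray(da, F, &f);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArrayRead(da, Xloc, &x);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da, &Xloc);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The first DMDAVecGetArray() pass is the only place the input data is modified; everything after that point treats it as read-only.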
It would just "separate" more clearly the "setting the ghost values" from the "competing the RHS function". Barry > Are are you saying this would be done at the beginning of the RHS function? > > On Tue, Nov 12, 2019 at 3:41 PM Smith, Barry F. wrote: > > > > On Nov 12, 2019, at 2:09 PM, Gideon Simpson wrote: > > > > So this might be a resolution/another question. Part of the reason to use the da is that it provides you with ghost points. If you're only accessing the dependent variables entries with DMDAVecGetArrayRead, then you can't modify the ghost points. If you can't modify the ghost points here, where would you do so in the context of a problem with, for instance, time dependent boundary conditions? > > In that case, as I say below, you have a ghosted local copy and you can put whatever values you wish into those ghosted locations. That is, when using ghosted local vectors you don't need to use the Read() version. > > Barry > > Note: if I were writing the code I would open the ghosted local input vector as writeable to put in the ghost values. Close it and then separately open it again as Read() to use in compute the needed TS functions. This is certainly not necessary but it helps with code maintainability and to decreases the likelihood of bugs. You have one set of access where you are legitimately changing values thus should not use Read() and another where you should not be changing values and thus should use read(). > > > > > > > > On Tue, Nov 12, 2019 at 10:43 AM Smith, Barry F. wrote: > > > > For any vector you only read you should use the read version. > > > > Sometimes the vector may not be locked and hence the other routine can be used but that may change as we add more locks and improve the code. So best to do it right > > > > > On Nov 12, 2019, at 9:26 AM, Gideon Simpson wrote: > > > > > > So, in principle, should we actually be using DMDAVecGetArrayRead in this context? I seem to be able to get away with DMDAVecGetArray with all time steppers. > > > > I am not sure why DMDAVecGetArray would work if VecGetArray did not work. Internally it calls VecGetArray() that will do the check. If you call it on local ghosted vectors it doesn't check if the vector is locked since the ghosted version is a copy of the true locked vector. > > > > Barry > > > > > > > > On Tue, Nov 12, 2019 at 12:33 AM Smith, Barry F. wrote: > > > > > > > > > > On Nov 11, 2019, at 7:00 PM, Gideon Simpson via petsc-users wrote: > > > > > > > > I noticed that when I am solving a problem with the ts and I am *not* using a da, if I want to use an implicit time stepping routine: > > > > 1. I have to explicitly provide the Jacobian > > > > > > Yes > > > > > > > 2. When I do provide the Jacobian, if I want to access the elements of x(t) to construct f(t,x), I need to use a const PetscScalar and a VecGetArrayRead to get it to work. > > > > > > Presumably you call VecGetArray() instead? > > > > > > > > > > > > 3. My code works without declaring const when I'm using an explicit scheme. > > > > > > > > In contrast, if I solve a problem using a da, my code works, I can use implicit schemes without having to provide the Jacobian, and I don't have to use const anywhere. > > > > > > The use with DMDA provides automatic routines for computing the needed Jacobians using finite differencing of your provided function and coloring of the Jacobian. This results in reasonably efficient computation of Jacobians that work in most (almost all) cases. > > > > > > > > Can someone clarify what is expected/preferred? 
> > > > > > You should always use VecGetArrayRead() for vectors you are accessing but NOT changing the values in. There is no reason not and it provides the potential for higher performance. > > > > > > The algebraic solvers have additional checks to prevent peopled from inadvertently changing the entries in x (which would produce bugs). Presumably this results in generating an error when you call VecGetArray(). At least some of the TS explicit calls do not have such checks. They could be added and should be added. https://gitlab.com/petsc/petsc/issues/493 > > > > > > Thanks for pointing out the inconsistency > > > > > > Barry > > > > > > > > > > > -- > > > > gideon > > > > > > > > > > > > -- > > > gideon > > > > > > > > -- > > gideon > > > > -- > gideon From hgbk2008 at gmail.com Thu Nov 14 14:04:42 2019 From: hgbk2008 at gmail.com (hg) Date: Thu, 14 Nov 2019 21:04:42 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> Message-ID: Hello It turns out that hwloc is not installed on the cluster system that I'm using. Without hwloc, pastix will run into the branch using sched_setaffinity and caused error (see above at sopalin_thread.c). I'm not able to understand and find a solution with sched_setaffinity so I think enabling hwloc is an easier solution. Between, hwloc is recommended to compile Pastix according to those threads: https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html hwloc is supported in PETSc so I assumed a clean and easy solution to compile with --download-hwloc. I made some changes in config/BuildSystem/config/packages/PaStiX.py to tell pastix to link to hwloc: ... self.hwloc = framework.require('config.packages.hwloc',self) ... 
if self.hwloc.found: g.write('CCPASTIX := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n') g.write('EXTRALIB := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n') But it does not compile: Possible ERROR while running linker: exit code 1 stderr: /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init': /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init' /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load' /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean': /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy' /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type': /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to `hwloc_get_type_depth' /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth' /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread': /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup' /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to `hwloc_bitmap_singlify' /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind' /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free' /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf' Any idea is appreciated. I can attach configure.log as needed. Giang On Thu, Nov 7, 2019 at 12:18 AM hg wrote: > Hi Barry > > Maybe you're right, sched_setaffinity returns EINVAL in my case, Probably > the scheduler does not allow the process to bind to thread on its own. > > Giang > > > On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. wrote: > >> >> You can also just look at configure.log where it will show the calling >> sequence of how PETSc configured and built Pastix. The recipe is in >> config/BuildSystem/config/packages/PaStiX.py we don't monkey with low level >> things like the affinity of external packages. My guess is that your >> cluster system has inconsistent parts related to this, that one tool works >> and another does not indicates they are inconsistent with respect to each >> other in what they expect. >> >> Barry >> >> >> >> >> > On Nov 6, 2019, at 4:02 AM, Matthew Knepley wrote: >> > >> > On Wed, Nov 6, 2019 at 4:40 AM hg wrote: >> > Look into >> arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c >> I saw something like: >> > >> > #ifdef HAVE_OLD_SCHED_SETAFFINITY >> > if(sched_setaffinity(0,&mask) < 0) >> > #else /* HAVE_OLD_SCHED_SETAFFINITY */ >> > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) >> > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ >> > { >> > perror("sched_setaffinity"); >> > EXIT(MOD_SOPALIN, INTERNAL_ERR); >> > } >> > >> > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY >> during compilation? 
>> > >> > May I know how to trigger re-compilation of external packages with >> petsc? I may go in there and check what's going on. >> > >> > If we built it during configure, then you can just go to >> > >> > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ >> > >> > and rebuild/install it to test. If you want configure to do it, you >> have to delete >> > >> > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix >> > >> > and reconfigure. >> > >> > Thanks, >> > >> > Matt >> > >> > Giang >> > >> > >> > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: >> > sched_setaffinity: Invalid argument only happens when I launch the job >> with sbatch. Running without scheduler is fine. I think this has something >> to do with pastix. >> > >> > Giang >> > >> > >> > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. >> wrote: >> > >> > Google finds this >> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 >> > >> > >> > >> > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > > >> > > I have no idea. That is a good question for the PasTix list. >> > > >> > > Thanks, >> > > >> > > Matt >> > > >> > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: >> > > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and >> also OMP_NUM_THREADS to 1 >> > > >> > > Giang >> > > >> > > >> > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley >> wrote: >> > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > > Hello >> > > >> > > I got crashed when using Pastix as solver for KSP. The error message >> looks like: >> > > >> > > .... >> > > NUMBER of BUBBLE 1 >> > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >> > > ** End of Partition & Distribution phase ** >> > > Time to analyze 0.225 s >> > > Number of nonzeros in factorized matrix 708784076 >> > > Fill-in 12.2337 >> > > Number of operations (LU) 2.80185e+12 >> > > Prediction Time to factorize (AMD 6180 MKL) 394 s >> > > 0 : SolverMatrix size (without coefficients) 32.4 MB >> > > 0 : Number of nonzeros (local block structure) 365309391 >> > > Numerical Factorization (LU) : >> > > 0 : Internal CSC size 1.08 GB >> > > Time to fill internal csc 6.66 s >> > > --- Sopalin : Allocation de la structure globale --- >> > > --- Fin Sopalin Init --- >> > > --- Initialisation des tableaux globaux --- >> > > sched_setaffinity: Invalid argument >> > > [node083:165071] *** Process received signal *** >> > > [node083:165071] Signal: Aborted (6) >> > > [node083:165071] Signal code: (-6) >> > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >> > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >> > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >> > > [node083:165071] [ 3] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >> > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >> communication, 0 out-of-core) >> > > --- Sopalin : Local structure allocation --- >> > > >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >> > > [node083:165071] [ 5] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >> > > [node083:165071] [ 6] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >> > > [node083:165071] [ 7] >> 
/sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >> > > [node083:165071] [ 8] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >> > > [node083:165071] [ 9] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >> > > [node083:165071] [10] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >> > > [node083:165071] [11] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >> > > [node083:165071] [12] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >> > > [node083:165071] [13] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >> > > [node083:165071] [14] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >> > > [node083:165071] [15] >> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >> > > >> > > Does anyone have an idea what is the problem and how to fix it? The >> PETSc parameters I used are as below: >> > > >> > > It looks like PasTix is having trouble setting the thread affinity: >> > > >> > > sched_setaffinity: Invalid argument >> > > >> > > so it may be your build of PasTix. >> > > >> > > Thanks, >> > > >> > > Matt >> > > >> > > -pc_type lu >> > > -pc_factor_mat_solver_package pastix >> > > -mat_pastix_verbose 2 >> > > -mat_pastix_threadnbr 1 >> > > >> > > Giang >> > > >> > > >> > > >> > > -- >> > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > -- Norbert Wiener >> > > >> > > https://www.cse.buffalo.edu/~knepley/ >> > > >> > > >> > > -- >> > > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > > -- Norbert Wiener >> > > >> > > https://www.cse.buffalo.edu/~knepley/ >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.ritwik98 at gmail.com Thu Nov 14 22:15:09 2019 From: s.ritwik98 at gmail.com (Ritwik Saha) Date: Thu, 14 Nov 2019 23:15:09 -0500 Subject: [petsc-users] Including Implementations in my code Message-ID: Hi All, PETSc provides various implementations of functions like VecAXPY() in CUDA. I am talking specifically about VecAXPY_SeqCUDA() in src/vec/vec/impls/seq/seqcuda/veccuda2.cu . How to I include these functions in my C code? Thanks in advance. Regards, Ritwik Saha -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Nov 14 22:49:35 2019 From: jed at jedbrown.org (Jed Brown) Date: Thu, 14 Nov 2019 21:49:35 -0700 Subject: [petsc-users] Including Implementations in my code In-Reply-To: References: Message-ID: <87d0dtg2w0.fsf@jedbrown.org> Ritwik Saha via petsc-users writes: > Hi All, > > PETSc provides various implementations of functions like VecAXPY() in CUDA. > I am talking specifically about VecAXPY_SeqCUDA() in > src/vec/vec/impls/seq/seqcuda/veccuda2.cu . 
How to I include these > functions in my C code? I'm not sure I follow. If you want to call those functions, set the VecType to VECCUDA. For many examples, this is done via the run-time option -dm_vec_type cuda (see many examples in the PETSc source tree). If you're trying to copy the implementation into your code without using PETSc, you're on your own. From hgbk2008 at gmail.com Fri Nov 15 08:34:48 2019 From: hgbk2008 at gmail.com (hg) Date: Fri, 15 Nov 2019 15:34:48 +0100 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> Message-ID: FYI, this problem is fixed, providing that hwloc is added to dependencies of Pastix. Giang On Thu, Nov 14, 2019 at 9:04 PM hg wrote: > Hello > > It turns out that hwloc is not installed on the cluster system that I'm > using. Without hwloc, pastix will run into the branch using > sched_setaffinity and caused error (see above at sopalin_thread.c). I'm not > able to understand and find a solution with sched_setaffinity so I think > enabling hwloc is an easier solution. Between, hwloc is recommended to > compile Pastix according to those threads: > > > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html > > hwloc is supported in PETSc so I assumed a clean and easy solution to > compile with --download-hwloc. I made some changes in > config/BuildSystem/config/packages/PaStiX.py to tell pastix to link to > hwloc: > > ... > self.hwloc = framework.require('config.packages.hwloc',self) > ... > if self.hwloc.found: > g.write('CCPASTIX := $(CCPASTIX) -DWITH_HWLOC > '+self.headers.toString(self.hwloc.include)+'\n') > g.write('EXTRALIB := $(EXTRALIB) > '+self.libraries.toString(self.hwloc.dlib)+'\n') > > But it does not compile: > > Possible ERROR while running linker: exit code 1 > stderr: > /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: > undefined reference to `hwloc_topology_init' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: > undefined reference to `hwloc_topology_load' > /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: > undefined reference to `hwloc_topology_destroy' > /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function > `hwloc_get_obj_by_type': > /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to > `hwloc_get_type_depth' > /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to > `hwloc_get_obj_by_depth' > /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function > `sopalin_bindthread': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: > undefined reference to `hwloc_bitmap_dup' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: > undefined reference to `hwloc_bitmap_singlify' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: > undefined reference to `hwloc_set_cpubind' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: > undefined 
reference to `hwloc_bitmap_free' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: > undefined reference to `hwloc_bitmap_asprintf' > > Any idea is appreciated. I can attach configure.log as needed. > > Giang > > > On Thu, Nov 7, 2019 at 12:18 AM hg wrote: > >> Hi Barry >> >> Maybe you're right, sched_setaffinity returns EINVAL in my case, Probably >> the scheduler does not allow the process to bind to thread on its own. >> >> Giang >> >> >> On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. >> wrote: >> >>> >>> You can also just look at configure.log where it will show the calling >>> sequence of how PETSc configured and built Pastix. The recipe is in >>> config/BuildSystem/config/packages/PaStiX.py we don't monkey with low level >>> things like the affinity of external packages. My guess is that your >>> cluster system has inconsistent parts related to this, that one tool works >>> and another does not indicates they are inconsistent with respect to each >>> other in what they expect. >>> >>> Barry >>> >>> >>> >>> >>> > On Nov 6, 2019, at 4:02 AM, Matthew Knepley wrote: >>> > >>> > On Wed, Nov 6, 2019 at 4:40 AM hg wrote: >>> > Look into >>> arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c >>> I saw something like: >>> > >>> > #ifdef HAVE_OLD_SCHED_SETAFFINITY >>> > if(sched_setaffinity(0,&mask) < 0) >>> > #else /* HAVE_OLD_SCHED_SETAFFINITY */ >>> > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) >>> > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ >>> > { >>> > perror("sched_setaffinity"); >>> > EXIT(MOD_SOPALIN, INTERNAL_ERR); >>> > } >>> > >>> > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY >>> during compilation? >>> > >>> > May I know how to trigger re-compilation of external packages with >>> petsc? I may go in there and check what's going on. >>> > >>> > If we built it during configure, then you can just go to >>> > >>> > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ >>> > >>> > and rebuild/install it to test. If you want configure to do it, you >>> have to delete >>> > >>> > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix >>> > >>> > and reconfigure. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > Giang >>> > >>> > >>> > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: >>> > sched_setaffinity: Invalid argument only happens when I launch the job >>> with sbatch. Running without scheduler is fine. I think this has something >>> to do with pastix. >>> > >>> > Giang >>> > >>> > >>> > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. >>> wrote: >>> > >>> > Google finds this >>> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 >>> > >>> > >>> > >>> > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > > >>> > > I have no idea. That is a good question for the PasTix list. >>> > > >>> > > Thanks, >>> > > >>> > > Matt >>> > > >>> > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: >>> > > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 >>> and also OMP_NUM_THREADS to 1 >>> > > >>> > > Giang >>> > > >>> > > >>> > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley >>> wrote: >>> > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > > Hello >>> > > >>> > > I got crashed when using Pastix as solver for KSP. The error message >>> looks like: >>> > > >>> > > .... 
>>> > > NUMBER of BUBBLE 1 >>> > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 >>> > > ** End of Partition & Distribution phase ** >>> > > Time to analyze 0.225 s >>> > > Number of nonzeros in factorized matrix 708784076 >>> > > Fill-in 12.2337 >>> > > Number of operations (LU) 2.80185e+12 >>> > > Prediction Time to factorize (AMD 6180 MKL) 394 s >>> > > 0 : SolverMatrix size (without coefficients) 32.4 MB >>> > > 0 : Number of nonzeros (local block structure) 365309391 >>> > > Numerical Factorization (LU) : >>> > > 0 : Internal CSC size 1.08 GB >>> > > Time to fill internal csc 6.66 s >>> > > --- Sopalin : Allocation de la structure globale --- >>> > > --- Fin Sopalin Init --- >>> > > --- Initialisation des tableaux globaux --- >>> > > sched_setaffinity: Invalid argument >>> > > [node083:165071] *** Process received signal *** >>> > > [node083:165071] Signal: Aborted (6) >>> > > [node083:165071] Signal code: (-6) >>> > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] >>> > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] >>> > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] >>> > > [node083:165071] [ 3] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] >>> > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 >>> communication, 0 out-of-core) >>> > > --- Sopalin : Local structure allocation --- >>> > > >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] >>> > > [node083:165071] [ 5] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] >>> > > [node083:165071] [ 6] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] >>> > > [node083:165071] [ 7] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] >>> > > [node083:165071] [ 8] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] >>> > > [node083:165071] [ 9] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] >>> > > [node083:165071] [10] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] >>> > > [node083:165071] [11] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] >>> > > [node083:165071] [12] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] >>> > > [node083:165071] [13] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] >>> > > [node083:165071] [14] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] >>> > > [node083:165071] [15] >>> /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] >>> > > >>> > > Does anyone have an idea what is the problem and how to fix it? The >>> PETSc parameters I used are as below: >>> > > >>> > > It looks like PasTix is having trouble setting the thread affinity: >>> > > >>> > > sched_setaffinity: Invalid argument >>> > > >>> > > so it may be your build of PasTix. 
>>> > > >>> > > Thanks, >>> > > >>> > > Matt >>> > > >>> > > -pc_type lu >>> > > -pc_factor_mat_solver_package pastix >>> > > -mat_pastix_verbose 2 >>> > > -mat_pastix_threadnbr 1 >>> > > >>> > > Giang >>> > > >>> > > >>> > > >>> > > -- >>> > > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > > -- Norbert Wiener >>> > > >>> > > https://www.cse.buffalo.edu/~knepley/ >>> > > >>> > > >>> > > -- >>> > > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > > -- Norbert Wiener >>> > > >>> > > https://www.cse.buffalo.edu/~knepley/ >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From yjwu16 at gmail.com Sun Nov 17 09:25:40 2019 From: yjwu16 at gmail.com (Yingjie Wu) Date: Sun, 17 Nov 2019 23:25:40 +0800 Subject: [petsc-users] Question about changing time step during calculation Message-ID: Dear Petsc developers Hi, Recently I am trying to using TS to solve time-dependent nonlinear PDEs. In my program, next time step is based on the results of previous time step. I want to add this control in TSmonitor() to change time step length in calculation. I referred to user guide but I didn't find what I wanted. Please give me some advice. Thanks, Yingjie -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Sun Nov 17 17:32:26 2019 From: hongzhang at anl.gov (Zhang, Hong) Date: Sun, 17 Nov 2019 23:32:26 +0000 Subject: [petsc-users] Question about changing time step during calculation In-Reply-To: References: Message-ID: TSSetTimeStep() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetTimeStep.html#TSSetTimeStep If you want to decide the step size by yourself, make sure that the adaptivity is turned off, e.g., with -ts_adapt_type none Btw, have you tried the available TSAdapt types? Is there anything special in your problems so that none of these adaptors works? Hong (Mr.) On Nov 17, 2019, at 9:25 AM, Yingjie Wu via petsc-users > wrote: Dear Petsc developers Hi, Recently I am trying to using TS to solve time-dependent nonlinear PDEs. In my program, next time step is based on the results of previous time step. I want to add this control in TSmonitor() to change time step length in calculation. I referred to user guide but I didn't find what I wanted. Please give me some advice. Thanks, Yingjie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 17 23:24:51 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Mon, 18 Nov 2019 05:24:51 +0000 Subject: [petsc-users] Question about changing time step during calculation In-Reply-To: References: Message-ID: <3A83F643-C6CE-4010-92B7-BF8E2E945AEC@anl.gov> > On Nov 17, 2019, at 5:32 PM, Zhang, Hong via petsc-users wrote: > > TSSetTimeStep() > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetTimeStep.html#TSSetTimeStep > > If you want to decide the step size by yourself, make sure that the adaptivity is turned off, e.g., with -ts_adapt_type none > > Btw, have you tried the available TSAdapt types? Is there anything special in your problems so that none of these adaptors works? It is also possible to register your own adaptor using TSAdaptRegister(). You can start writing you own by copying the basic adapter src/ts/adapt/impls/basic/adaptbasic.c; you can also look at the other adaptors for ideas. Note this is an advanced topic that only needs to be used when none of standard adapters are useful for your situation. Barry > > Hong (Mr.) > >> On Nov 17, 2019, at 9:25 AM, Yingjie Wu via petsc-users wrote: >> >> Dear Petsc developers >> Hi, >> Recently I am trying to using TS to solve time-dependent nonlinear PDEs. In my program, next time step is based on the results of previous time step. I want to add this control in TSmonitor() to change time step length in calculation. I referred to user guide but I didn't find what I wanted. Please give me some advice. >> >> Thanks, >> Yingjie > From repepo at gmail.com Tue Nov 19 04:40:22 2019 From: repepo at gmail.com (Santiago Andres Triana) Date: Tue, 19 Nov 2019 11:40:22 +0100 Subject: [petsc-users] problem downloading "fix-syntax-for-nag.tar.gx" Message-ID: Hello petsc-users: I found this error when configure tries to download fblaslapack: ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error during download/extract/detection of FBLASLAPACK: file could not be opened successfully Downloaded package FBLASLAPACK from: https://bitbucket.org/petsc/pkg-fblaslapack/get/origin/barry/2019-08-22/fix-syntax-for-nag.tar.gz is not a tarball. [or installed python cannot process compressed files] * If you are behind a firewall - please fix your proxy and rerun ./configure For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually * or you can download the above URL manually, to /yourselectedlocation/fix-syntax-for-nag.tar.gz and use the configure option: --download-fblaslapack=/yourselectedlocation/fix-syntax-for-nag.tar.gz ******************************************************************************* Any ideas? the file in question doesn't seem to exist ... Thanks a lot in advance! Santiago -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 19 05:49:22 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 19 Nov 2019 11:49:22 +0000 Subject: [petsc-users] problem downloading "fix-syntax-for-nag.tar.gx" In-Reply-To: References: Message-ID: <6FACC969-7AE8-411D-AE15-A9933A9C8A4E@anl.gov> For a while I had put in an incorrect URL in the download location. 
Perhaps you are using PETSc 3.12.0 and need to use 3.12.1 from https://www.mcs.anl.gov/petsc/download/index.html Otherwise please send configure.log > On Nov 19, 2019, at 4:40 AM, Santiago Andres Triana via petsc-users wrote: > > Hello petsc-users: > > I found this error when configure tries to download fblaslapack: > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Error during download/extract/detection of FBLASLAPACK: > file could not be opened successfully > Downloaded package FBLASLAPACK from: https://bitbucket.org/petsc/pkg-fblaslapack/get/origin/barry/2019-08-22/fix-syntax-for-nag.tar.gz is not a tarball. > [or installed python cannot process compressed files] > * If you are behind a firewall - please fix your proxy and rerun ./configure > For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov > * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation/fix-syntax-for-nag.tar.gz > and use the configure option: > --download-fblaslapack=/yourselectedlocation/fix-syntax-for-nag.tar.gz > ******************************************************************************* > > > Any ideas? the file in question doesn't seem to exist ... Thanks a lot in advance! > > Santiago From bsmith at mcs.anl.gov Tue Nov 19 06:20:45 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 19 Nov 2019 12:20:45 +0000 Subject: [petsc-users] solve problem with pastix In-Reply-To: References: <937C0613-989A-4DD6-8A33-BB947D5E7DEB@anl.gov> <1E28761A-883F-4B66-9BA8-8367881D5BCB@mcs.anl.gov> Message-ID: <0DB2E0B0-6AB9-409A-A1F7-2A92BEF0915F@mcs.anl.gov> Thanks for the fix. https://gitlab.com/petsc/petsc/pipelines/96957999 > On Nov 14, 2019, at 2:04 PM, hg wrote: > > Hello > > It turns out that hwloc is not installed on the cluster system that I'm using. Without hwloc, pastix will run into the branch using sched_setaffinity and caused error (see above at sopalin_thread.c). I'm not able to understand and find a solution with sched_setaffinity so I think enabling hwloc is an easier solution. Between, hwloc is recommended to compile Pastix according to those threads: > > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html > > hwloc is supported in PETSc so I assumed a clean and easy solution to compile with --download-hwloc. I made some changes in config/BuildSystem/config/packages/PaStiX.py to tell pastix to link to hwloc: > > ... > self.hwloc = framework.require('config.packages.hwloc',self) > ... 
> if self.hwloc.found: > g.write('CCPASTIX := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n') > g.write('EXTRALIB := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n') > > But it does not compile: > > Possible ERROR while running linker: exit code 1 > stderr: > /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load' > /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy' > /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type': > /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to `hwloc_get_type_depth' > /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth' > /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread': > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to `hwloc_bitmap_singlify' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free' > /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf' > > Any idea is appreciated. I can attach configure.log as needed. > > Giang > > > On Thu, Nov 7, 2019 at 12:18 AM hg wrote: > Hi Barry > > Maybe you're right, sched_setaffinity returns EINVAL in my case, Probably the scheduler does not allow the process to bind to thread on its own. > > Giang > > > On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. wrote: > > You can also just look at configure.log where it will show the calling sequence of how PETSc configured and built Pastix. The recipe is in config/BuildSystem/config/packages/PaStiX.py we don't monkey with low level things like the affinity of external packages. My guess is that your cluster system has inconsistent parts related to this, that one tool works and another does not indicates they are inconsistent with respect to each other in what they expect. > > Barry > > > > > > On Nov 6, 2019, at 4:02 AM, Matthew Knepley wrote: > > > > On Wed, Nov 6, 2019 at 4:40 AM hg wrote: > > Look into arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c I saw something like: > > > > #ifdef HAVE_OLD_SCHED_SETAFFINITY > > if(sched_setaffinity(0,&mask) < 0) > > #else /* HAVE_OLD_SCHED_SETAFFINITY */ > > if(sched_setaffinity(0,sizeof(mask),&mask) < 0) > > #endif /* HAVE_OLD_SCHED_SETAFFINITY */ > > { > > perror("sched_setaffinity"); > > EXIT(MOD_SOPALIN, INTERNAL_ERR); > > } > > > > Is there possibility that Petsc turn on HAVE_OLD_SCHED_SETAFFINITY during compilation? 
> > > > May I know how to trigger re-compilation of external packages with petsc? I may go in there and check what's going on. > > > > If we built it during configure, then you can just go to > > > > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/ > > > > and rebuild/install it to test. If you want configure to do it, you have to delete > > > > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix > > > > and reconfigure. > > > > Thanks, > > > > Matt > > > > Giang > > > > > > On Wed, Nov 6, 2019 at 10:12 AM hg wrote: > > sched_setaffinity: Invalid argument only happens when I launch the job with sbatch. Running without scheduler is fine. I think this has something to do with pastix. > > > > Giang > > > > > > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. wrote: > > > > Google finds this https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186 > > > > > > > > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users wrote: > > > > > > I have no idea. That is a good question for the PasTix list. > > > > > > Thanks, > > > > > > Matt > > > > > > On Tue, Nov 5, 2019 at 5:32 PM hg wrote: > > > Should thread affinity be invoked? I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1 > > > > > > Giang > > > > > > > > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley wrote: > > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users wrote: > > > Hello > > > > > > I got crashed when using Pastix as solver for KSP. The error message looks like: > > > > > > .... > > > NUMBER of BUBBLE 1 > > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0 > > > ** End of Partition & Distribution phase ** > > > Time to analyze 0.225 s > > > Number of nonzeros in factorized matrix 708784076 > > > Fill-in 12.2337 > > > Number of operations (LU) 2.80185e+12 > > > Prediction Time to factorize (AMD 6180 MKL) 394 s > > > 0 : SolverMatrix size (without coefficients) 32.4 MB > > > 0 : Number of nonzeros (local block structure) 365309391 > > > Numerical Factorization (LU) : > > > 0 : Internal CSC size 1.08 GB > > > Time to fill internal csc 6.66 s > > > --- Sopalin : Allocation de la structure globale --- > > > --- Fin Sopalin Init --- > > > --- Initialisation des tableaux globaux --- > > > sched_setaffinity: Invalid argument > > > [node083:165071] *** Process received signal *** > > > [node083:165071] Signal: Aborted (6) > > > [node083:165071] Signal code: (-6) > > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680] > > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207] > > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8] > > > [node083:165071] [ 3] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d] > > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 communication, 0 out-of-core) > > > --- Sopalin : Local structure allocation --- > > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2] > > > [node083:165071] [ 5] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2] > > > [node083:165071] [ 6] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31] > > > [node083:165071] [ 7] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170] > > > [node083:165071] [ 8] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2] > > > 
[node083:165071] [ 9] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325] > > > [node083:165071] [10] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b] > > > [node083:165071] [11] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552] > > > [node083:165071] [12] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09] > > > [node083:165071] [13] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9] > > > [node083:165071] [14] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81] > > > [node083:165071] [15] /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e] > > > > > > Does anyone have an idea what is the problem and how to fix it? The PETSc parameters I used are as below: > > > > > > It looks like PasTix is having trouble setting the thread affinity: > > > > > > sched_setaffinity: Invalid argument > > > > > > so it may be your build of PasTix. > > > > > > Thanks, > > > > > > Matt > > > > > > -pc_type lu > > > -pc_factor_mat_solver_package pastix > > > -mat_pastix_verbose 2 > > > -mat_pastix_threadnbr 1 > > > > > > Giang > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > -- > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > From yann.jobic at univ-amu.fr Tue Nov 19 10:06:46 2019 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Tue, 19 Nov 2019 17:06:46 +0100 Subject: [petsc-users] SLEPc GEVP for huge systems Message-ID: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> Hi all, I'm trying to solve a huge generalize (unsymetric) eigen value problem with SLEPc + MUMPS. We actually failed to allocate the requested memory for MUMPS factorization (we tried BVVECS). We would like to know if there is an alternate iterative way of solving such problems. Thank you, Best regards, Yann From jroman at dsic.upv.es Tue Nov 19 10:25:16 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 19 Nov 2019 17:25:16 +0100 Subject: [petsc-users] SLEPc GEVP for huge systems In-Reply-To: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> References: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> Message-ID: <29214E41-C20C-4B97-9469-958C9D859C82@dsic.upv.es> Are you getting an error from MUMPS or from BV? What is the error message you get? What is the size of the matrix? How many eigenvalues do you need to compute? In principle you can use any KSP+PC, see section 3.4.1 of the users manual. If you have a good preconditioner, then an alternative to Krylov methods is to use Davidson-type methods https://doi.org/10.1145/2543696 - in some cases these can be competitive. 
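For reference, here is a minimal sketch (not from this thread) of what the iterative alternative looks like in code: shift-and-invert backed by an inexact KSP solve instead of a full MUMPS factorization. The matrices A and B are assumed to be already assembled, and the GMRES/block-Jacobi/target choices are illustrative placeholders only.

#include <slepceps.h>

static PetscErrorCode SolveGEVP(Mat A,Mat B)
{
  EPS            eps;
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps,A,B);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps,EPS_GNHEP);CHKERRQ(ierr);           /* generalized non-Hermitian */
  ierr = EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSSetTarget(eps,0.0);CHKERRQ(ierr);                      /* illustrative target */
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);                    /* shift-and-invert spectral transform */
  ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);                   /* iterative inner solves instead of LU */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);                    /* replace with a problem-specific PC */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The same choices can be made at run time with options along the lines of -st_type sinvert -st_ksp_type gmres -st_pc_type bjacobi, or -eps_type gd / -eps_type jd to try the Davidson solvers; as noted above, the inexact approaches only pay off when the inner preconditioner is good enough.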
Jose > El 19 nov 2019, a las 17:06, Yann Jobic via petsc-users escribi?: > > Hi all, > I'm trying to solve a huge generalize (unsymetric) eigen value problem with SLEPc + MUMPS. We actually failed to allocate the requested memory for MUMPS factorization (we tried BVVECS). > We would like to know if there is an alternate iterative way of solving such problems. > Thank you, > Best regards, > Yann From mpovolot at purdue.edu Tue Nov 19 13:42:24 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 19:42:24 +0000 Subject: [petsc-users] petsc without MPI Message-ID: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> Hello, I'm trying to build PETSC without MPI. Even if I specify --with_mpi=0, the configuration script still activates MPI. I attach the configure.log. What am I doing wrong? Thank you, Michael. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: configure.log URL: From balay at mcs.anl.gov Tue Nov 19 13:47:39 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Tue, 19 Nov 2019 19:47:39 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> Message-ID: On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > Hello, > > I'm trying to build PETSC without MPI. > > Even if I specify --with_mpi=0, the configuration script still activates > MPI. > > I attach the configure.log. > > What am I doing wrong? The option is --with-mpi=0 Satish > > Thank you, > > Michael. > > From balay at mcs.anl.gov Tue Nov 19 13:51:51 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Tue, 19 Nov 2019 19:51:51 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> Message-ID: And I see from configure.log - you are using the correct option. >>>>>>> Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 <<<<<<< And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? Satish On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > > > Hello, > > > > I'm trying to build PETSC without MPI. > > > > Even if I specify --with_mpi=0, the configuration script still activates > > MPI. > > > > I attach the configure.log. > > > > What am I doing wrong? 
> > The option is --with-mpi=0 > > Satish > > > > > > Thank you, > > > > Michael. > > > > > From mpovolot at purdue.edu Tue Nov 19 13:51:42 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 19:51:42 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> Message-ID: <40a99637-47db-bb4b-d6c7-39a831f5ac5d@purdue.edu> On 11/19/2019 2:47 PM, Balay, Satish wrote: > On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > >> Hello, >> >> I'm trying to build PETSC without MPI. >> >> Even if I specify --with_mpi=0, the configuration script still activates >> MPI. >> >> I attach the configure.log. >> >> What am I doing wrong? > The option is --with-mpi=0 > > Satish > > >> Thank you, >> >> Michael. >> >> Dear Satish, I actually used a correct one --with-mpi=0 (you can see the attached configuration log output, the e-mail had a mistake), but it did not work. Michael. From mpovolot at purdue.edu Tue Nov 19 13:53:38 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 19:53:38 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> Message-ID: <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Why it did not work then? On 11/19/2019 2:51 PM, Balay, Satish wrote: > And I see from configure.log - you are using the correct option. > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 > <<<<<<< > > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? > > Satish > > > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >> >>> Hello, >>> >>> I'm trying to build PETSC without MPI. >>> >>> Even if I specify --with_mpi=0, the configuration script still activates >>> MPI. >>> >>> I attach the configure.log. >>> >>> What am I doing wrong? >> The option is --with-mpi=0 >> >> Satish >> >> >>> Thank you, >>> >>> Michael. 
>>> >>> From knepley at gmail.com Tue Nov 19 13:55:37 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 19 Nov 2019 14:55:37 -0500 Subject: [petsc-users] petsc without MPI In-Reply-To: <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: The log you sent has configure completely successfully. Please retry and send the log for a failed run. Thanks, Matt On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users < petsc-users at mcs.anl.gov> wrote: > Why it did not work then? > > On 11/19/2019 2:51 PM, Balay, Satish wrote: > > And I see from configure.log - you are using the correct option. > > > > Configure Options: --configModules=PETSc.Configure > --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 > --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 > --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 > --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 > --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 > --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= > FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 > --download-parmetis=0 > --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 > --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 > --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 > -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 > -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " > --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so > --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include > --with-scalapack=0 > > <<<<<<< > > > > And configure completed successfully. What issue are you encountering? > Why do you think its activating MPI? > > > > Satish > > > > > > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > > > >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > >> > >>> Hello, > >>> > >>> I'm trying to build PETSC without MPI. > >>> > >>> Even if I specify --with_mpi=0, the configuration script still > activates > >>> MPI. > >>> > >>> I attach the configure.log. > >>> > >>> What am I doing wrong? > >> The option is --with-mpi=0 > >> > >> Satish > >> > >> > >>> Thank you, > >>> > >>> Michael. > >>> > >>> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Tue Nov 19 13:58:41 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 19:58:41 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: Let me explain the problem. This log file has #ifndef PETSC_HAVE_MPI #define PETSC_HAVE_MPI 1 #endif while I need to have PETSC without MPI. On 11/19/2019 2:55 PM, Matthew Knepley wrote: The log you sent has configure completely successfully. Please retry and send the log for a failed run. 
Thanks, Matt On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users > wrote: Why it did not work then? On 11/19/2019 2:51 PM, Balay, Satish wrote: > And I see from configure.log - you are using the correct option. > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 > <<<<<<< > > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? > > Satish > > > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >> >>> Hello, >>> >>> I'm trying to build PETSC without MPI. >>> >>> Even if I specify --with_mpi=0, the configuration script still activates >>> MPI. >>> >>> I attach the configure.log. >>> >>> What am I doing wrong? >> The option is --with-mpi=0 >> >> Satish >> >> >>> Thank you, >>> >>> Michael. >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 19 14:00:32 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 19 Nov 2019 15:00:32 -0500 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: On Tue, Nov 19, 2019 at 2:58 PM Povolotskyi, Mykhailo wrote: > Let me explain the problem. > > This log file has > > #ifndef PETSC_HAVE_MPI > #define PETSC_HAVE_MPI 1 > #endif > > while I need to have PETSC without MPI. > If you do not provide MPI, we provide MPIUNI. Do you see it linking to an MPI implementation, or using mpi.h? Matt > On 11/19/2019 2:55 PM, Matthew Knepley wrote: > > The log you sent has configure completely successfully. Please retry and > send the log for a failed run. > > Thanks, > > Matt > > On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Why it did not work then? >> >> On 11/19/2019 2:51 PM, Balay, Satish wrote: >> > And I see from configure.log - you are using the correct option. 
>> > >> > Configure Options: --configModules=PETSc.Configure >> --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 >> --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 >> --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 >> --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 >> --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 >> --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= >> FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 >> --download-parmetis=0 >> --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 >> --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 >> --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 >> -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 >> -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " >> --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so >> --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include >> --with-scalapack=0 >> > <<<<<<< >> > >> > And configure completed successfully. What issue are you encountering? >> Why do you think its activating MPI? >> > >> > Satish >> > >> > >> > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: >> > >> >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >> >> >> >>> Hello, >> >>> >> >>> I'm trying to build PETSC without MPI. >> >>> >> >>> Even if I specify --with_mpi=0, the configuration script still >> activates >> >>> MPI. >> >>> >> >>> I attach the configure.log. >> >>> >> >>> What am I doing wrong? >> >> The option is --with-mpi=0 >> >> >> >> Satish >> >> >> >> >> >>> Thank you, >> >>> >> >>> Michael. >> >>> >> >>> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Nov 19 14:07:16 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Tue, 19 Nov 2019 20:07:16 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: Not sure why you are looking at this flag and interpreting it - PETSc code uses the flag PETSC_HAVE_MPIUNI to check for a sequential build. [this one states the module MPI similar to BLASLAPACK etc in configure is enabled] Satish On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > Let me explain the problem. > > This log file has > > #ifndef PETSC_HAVE_MPI > #define PETSC_HAVE_MPI 1 > #endif > > while I need to have PETSC without MPI. > > On 11/19/2019 2:55 PM, Matthew Knepley wrote: > The log you sent has configure completely successfully. Please retry and send the log for a failed run. > > Thanks, > > Matt > > On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users > wrote: > Why it did not work then? 
> > On 11/19/2019 2:51 PM, Balay, Satish wrote: > > And I see from configure.log - you are using the correct option. > > > > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 > > <<<<<<< > > > > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? > > > > Satish > > > > > > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > > > >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > >> > >>> Hello, > >>> > >>> I'm trying to build PETSC without MPI. > >>> > >>> Even if I specify --with_mpi=0, the configuration script still activates > >>> MPI. > >>> > >>> I attach the configure.log. > >>> > >>> What am I doing wrong? > >> The option is --with-mpi=0 > >> > >> Satish > >> > >> > >>> Thank you, > >>> > >>> Michael. > >>> > >>> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > From mpovolot at purdue.edu Tue Nov 19 14:07:45 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 20:07:45 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: <7b2c1352-df81-16e8-080a-83da950e0ede@purdue.edu> I see. Actually, my goal is to compile petsc without real MPI to use it with libmesh. You are saying that PETSC_HAVE_MPI is not a sign that Petsc is built with MPI. It means you have MPIUNI which is a serial code, but has an interface of MPI. Correct? On 11/19/2019 3:00 PM, Matthew Knepley wrote: On Tue, Nov 19, 2019 at 2:58 PM Povolotskyi, Mykhailo > wrote: Let me explain the problem. This log file has #ifndef PETSC_HAVE_MPI #define PETSC_HAVE_MPI 1 #endif while I need to have PETSC without MPI. If you do not provide MPI, we provide MPIUNI. Do you see it linking to an MPI implementation, or using mpi.h? Matt On 11/19/2019 2:55 PM, Matthew Knepley wrote: The log you sent has configure completely successfully. Please retry and send the log for a failed run. Thanks, Matt On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users > wrote: Why it did not work then? On 11/19/2019 2:51 PM, Balay, Satish wrote: > And I see from configure.log - you are using the correct option. 
> > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 > <<<<<<< > > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? > > Satish > > > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >> >>> Hello, >>> >>> I'm trying to build PETSC without MPI. >>> >>> Even if I specify --with_mpi=0, the configuration script still activates >>> MPI. >>> >>> I attach the configure.log. >>> >>> What am I doing wrong? >> The option is --with-mpi=0 >> >> Satish >> >> >>> Thank you, >>> >>> Michael. >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Tue Nov 19 14:09:26 2019 From: mpovolot at purdue.edu (Povolotskyi, Mykhailo) Date: Tue, 19 Nov 2019 20:09:26 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> Message-ID: <574993b1-eb17-d56d-5a19-92289131a896@purdue.edu> Thank you. It is clear now. On 11/19/2019 3:07 PM, Balay, Satish wrote: > Not sure why you are looking at this flag and interpreting it - PETSc code uses the flag PETSC_HAVE_MPIUNI to check for a sequential build. > > [this one states the module MPI similar to BLASLAPACK etc in configure is enabled] > > Satish > > On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > >> Let me explain the problem. >> >> This log file has >> >> #ifndef PETSC_HAVE_MPI >> #define PETSC_HAVE_MPI 1 >> #endif >> >> while I need to have PETSC without MPI. >> >> On 11/19/2019 2:55 PM, Matthew Knepley wrote: >> The log you sent has configure completely successfully. Please retry and send the log for a failed run. >> >> Thanks, >> >> Matt >> >> On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users > wrote: >> Why it did not work then? 
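As an illustrative sketch of the PETSC_HAVE_MPIUNI flag discussed above (not code from this thread; it assumes a petscconf.h coming from a --with-mpi=0 build), user code can test for a sequential MPIUNI build like this:

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    #if defined(PETSC_HAVE_MPIUNI)
      /* sequential build: MPI calls are served by PETSc's MPIUNI wrapper */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Sequential PETSc build (MPIUNI)\n"); CHKERRQ(ierr);
    #else
      /* PETSc was configured against a real MPI implementation */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "PETSc built with a real MPI\n"); CHKERRQ(ierr);
    #endif
      ierr = PetscFinalize();
      return ierr;
    }

PETSC_HAVE_MPI is defined in both cases, which is why it is PETSC_HAVE_MPIUNI, not PETSC_HAVE_MPI, that distinguishes a serial build.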
>> >> On 11/19/2019 2:51 PM, Balay, Satish wrote: >>> And I see from configure.log - you are using the correct option. >>> >>> Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 >>> <<<<<<< >>> >>> And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? >>> >>> Satish >>> >>> >>> On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: >>> >>>> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >>>> >>>>> Hello, >>>>> >>>>> I'm trying to build PETSC without MPI. >>>>> >>>>> Even if I specify --with_mpi=0, the configuration script still activates >>>>> MPI. >>>>> >>>>> I attach the configure.log. >>>>> >>>>> What am I doing wrong? >>>> The option is --with-mpi=0 >>>> >>>> Satish >>>> >>>> >>>>> Thank you, >>>>> >>>>> Michael. >>>>> >>>>> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> From yann.jobic at univ-amu.fr Tue Nov 19 15:05:25 2019 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Tue, 19 Nov 2019 22:05:25 +0100 Subject: [petsc-users] SLEPc GEVP for huge systems In-Reply-To: <29214E41-C20C-4B97-9469-958C9D859C82@dsic.upv.es> References: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> <29214E41-C20C-4B97-9469-958C9D859C82@dsic.upv.es> Message-ID: <365c0629-caae-0f1f-0fc0-4dc61335dc4d@univ-amu.fr> Thanks for the fast answer ! The error coming from MUMPS is : On return from DMUMPS, INFOG(1)= -9 On return from DMUMPS, INFOG(2)= 29088157 The matrix size : 4972410*4972410 I need only 1 eigen value, the one near zero. In order to have more precision, i put ncv at 500. I'm using : -eps_gen_non_hermitian -st_type sinvert -eps_target 0.1 -eps_ncv 500 -eps_tol 1e-9 -bv_type vecs I'm doing linear stability analysis. I'm looking at eigen values near zero, and if the first one is positive or negative. The mass matrix is ill conditioned. On a smaller matrix, it seems that using KSP without a preconditioner gives satisfactory results. With a PC, it diverges. 
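For reference, a sketch (not taken from the thread) of how the option set quoted above -- -eps_gen_non_hermitian -st_type sinvert -eps_target 0.1 -eps_ncv 500 -eps_tol 1e-9 -- could be set up through the SLEPc C API; the matrices A and B are assumed to be assembled elsewhere and error checking is omitted:

    EPS eps;
    ST  st;

    /* Mat A, B assembled or loaded elsewhere */
    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, B);
    EPSSetProblemType(eps, EPS_GNHEP);                /* -eps_gen_non_hermitian */
    EPSGetST(eps, &st);
    STSetType(st, STSINVERT);                         /* -st_type sinvert */
    EPSSetTarget(eps, 0.1);                           /* -eps_target 0.1 */
    EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE); /* eigenvalues nearest the target */
    EPSSetDimensions(eps, 1, 500, PETSC_DEFAULT);     /* nev=1, -eps_ncv 500 */
    EPSSetTolerances(eps, 1e-9, PETSC_DEFAULT);       /* -eps_tol 1e-9 */
    EPSSetFromOptions(eps);                           /* run-time options still apply */
    EPSSolve(eps);

With shift-and-invert each outer iteration solves a linear system with A - sigma*B, which is where the MUMPS factorization (and its INFOG(1)=-9 workspace failure reported above) enters.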
Number of iterations of the method: 1
Number of linear iterations of the method: 1000
Solution method: krylovschur

Number of requested eigenvalues: 1
Stopping condition: tol=1e-08, maxit=711
Linear eigensolve converged (14 eigenpairs) due to CONVERGED_TOL; iterations 1
---------------------- --------------------
          k             ||Ax-kBx||/||kx||
---------------------- --------------------
  0.000005+0.016787i       7.87928e-07
  0.000005-0.016787i       7.87928e-07
  -0.001781                1.11832e-05
  -0.001802                0.00274427
  [...]

I'm trying that on the big one.

Thanks for your help,

Yann

Le 11/19/2019 à 5:25 PM, Jose E. Roman a écrit :
> Are you getting an error from MUMPS or from BV? What is the error message you get? What is the size of the matrix? How many eigenvalues do you need to compute?
>
> In principle you can use any KSP+PC, see section 3.4.1 of the users manual. If you have a good preconditioner, then an alternative to Krylov methods is to use Davidson-type methods https://doi.org/10.1145/2543696 - in some cases these can be competitive.
>
> Jose
>
>
>> El 19 nov 2019, a las 17:06, Yann Jobic via petsc-users escribió:
>>
>> Hi all,
>> I'm trying to solve a huge generalize (unsymetric) eigen value problem with SLEPc + MUMPS. We actually failed to allocate the requested memory for MUMPS factorization (we tried BVVECS).
>> We would like to know if there is an alternate iterative way of solving such problems.
>> Thank you,
>> Best regards,
>> Yann

From jroman at dsic.upv.es Wed Nov 20 08:22:46 2019
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Wed, 20 Nov 2019 15:22:46 +0100
Subject: [petsc-users] SLEPc GEVP for huge systems
In-Reply-To: <365c0629-caae-0f1f-0fc0-4dc61335dc4d@univ-amu.fr>
References: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> <29214E41-C20C-4B97-9469-958C9D859C82@dsic.upv.es> <365c0629-caae-0f1f-0fc0-4dc61335dc4d@univ-amu.fr>
Message-ID: <33A81620-3CF9-4E16-A51A-8AA048948859@dsic.upv.es>

> El 19 nov 2019, a las 22:05, Yann Jobic escribió:
>
> Thanks for the fast answer !
> The error coming from MUMPS is :
> On return from DMUMPS, INFOG(1)= -9
> On return from DMUMPS, INFOG(2)= 29088157
> The matrix size : 4972410*4972410

You may want to try running with -mat_mumps_icntl_14 200

> I need only 1 eigen value, the one near zero.
> In order to have more precision, i put ncv at 500.
> I'm using : -eps_gen_non_hermitian -st_type sinvert -eps_target 0.1 -eps_ncv 500 -eps_tol 1e-9 -bv_type vecs

BVVECS is going to be slower, if the default BV gives a memory error I would suggest using BVMAT.

>
> I'm doing linear stability analysis. I'm looking at eigen values near zero, and if the first one is positive or negative.
> The mass matrix is ill conditioned. On a smaller matrix, it seems that using KSP without a preconditioner gives satisfactory results. With a PC, it diverges.
>
> Number of iterations of the method: 1
> Number of linear iterations of the method: 1000
> Solution method: krylovschur
>
> Number of requested eigenvalues: 1
> Stopping condition: tol=1e-08, maxit=711
> Linear eigensolve converged (14 eigenpairs) due to CONVERGED_TOL; iterations 1
> ---------------------- --------------------
>           k             ||Ax-kBx||/||kx||
> ---------------------- --------------------
>   0.000005+0.016787i       7.87928e-07
>   0.000005-0.016787i       7.87928e-07
>   -0.001781                1.11832e-05
>   -0.001802                0.00274427
> [...]

By default the KSP tolerance is equal to the EPS tolerance. You may need to reduce the KSP tolerance, e.g. -st_ksp_rtol 1e-9

Jose

>
> I'm trying that on the big one.
>
> Thanks for your help,
>
> Yann
>
>
> Le 11/19/2019 ?
5:25 PM, Jose E. Roman a ?crit : >> Are you getting an error from MUMPS or from BV? What is the error message you get? What is the size of the matrix? How many eigenvalues do you need to compute? >> In principle you can use any KSP+PC, see section 3.4.1 of the users manual. If you have a good preconditioner, then an alternative to Krylov methods is to use Davidson-type methods https://doi.org/10.1145/2543696 - in some cases these can be competitive. >> Jose >>> El 19 nov 2019, a las 17:06, Yann Jobic via petsc-users escribi?: >>> >>> Hi all, >>> I'm trying to solve a huge generalize (unsymetric) eigen value problem with SLEPc + MUMPS. We actually failed to allocate the requested memory for MUMPS factorization (we tried BVVECS). >>> We would like to know if there is an alternate iterative way of solving such problems. >>> Thank you, >>> Best regards, >>> Yann From yann.jobic at univ-amu.fr Wed Nov 20 11:35:26 2019 From: yann.jobic at univ-amu.fr (Yann Jobic) Date: Wed, 20 Nov 2019 18:35:26 +0100 Subject: [petsc-users] SLEPc GEVP for huge systems In-Reply-To: <33A81620-3CF9-4E16-A51A-8AA048948859@dsic.upv.es> References: <4330c5e5-b40c-0e21-3be5-54a94a8e6a50@univ-amu.fr> <29214E41-C20C-4B97-9469-958C9D859C82@dsic.upv.es> <365c0629-caae-0f1f-0fc0-4dc61335dc4d@univ-amu.fr> <33A81620-3CF9-4E16-A51A-8AA048948859@dsic.upv.es> Message-ID: <344ebf39-11f3-5d52-e177-57812a4cba2c@univ-amu.fr> Hi Jose, My matrices were not correct... It's now running fine, with mumps. Thanks for the help, Best regards, Yann On 20/11/2019 15:22, Jose E. Roman wrote: > > >> El 19 nov 2019, a las 22:05, Yann Jobic escribi?: >> >> Thanks for the fast answer ! >> The error coming from MUMPS is : >> On return from DMUMPS, INFOG(1)= -9 >> On return from DMUMPS, INFOG(2)= 29088157 >> The matrix size : 4972410*4972410 > > You may want to try running with -mat_mumps_icntl_14 200 > >> I need only 1 eigen value, the one near zero. >> In order to have more precision, i put ncv at 500. >> I'm using : -eps_gen_non_hermitian -st_type sinvert -eps_target 0.1 -eps_ncv 500 -eps_tol 1e-9 -bv_type vecs > > BVVECS is going to be slower, if the default BV gives a memory error I would suggest using BVMAT. > >> >> I'm doing linear stability analysis. I'm looking at eigen values near zero, and if the first one is positive or negative. >> The mass matrix is ill conditioned. On a smaller matrix, it seems that using KSP without a preconditioner gives satisfactory results. With a PC, it diverges. >> >> Number of iterations of the method: 1 >> Number of linear iterations of the method: 1000 >> Solution method: krylovschur >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-08, maxit=711 >> Linear eigensolve converged (14 eigenpairs) due to CONVERGED_TOL; iterations 1 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> 0.000005+0.016787i 7.87928e-07 >> 0.000005-0.016787i 7.87928e-07 >> -0.001781 1.11832e-05 >> -0.001802 0.00274427 >> [...] > > By default the KSP tolerance is equal to the EPS tolerance. You may need to reduce the KSP tolerance, e.g. -st_ksp_rtol 1e-9 > > Jose > >> >> I'm trying that on the big one. >> >> Thanks for your help, >> >> Yann >> >> >> Le 11/19/2019 ? 5:25 PM, Jose E. Roman a ?crit : >>> Are you getting an error from MUMPS or from BV? What is the error message you get? What is the size of the matrix? How many eigenvalues do you need to compute? >>> In principle you can use any KSP+PC, see section 3.4.1 of the users manual. 
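As a sketch of the "any KSP+PC" route mentioned in the quoted advice (illustrative only; eps denotes the already-created EPS object from the earlier sketch), the linear solver used inside the spectral transformation can be selected before EPSSolve():

    ST  st;
    KSP ksp;
    PC  pc;

    EPSGetST(eps, &st);
    STGetKSP(st, &ksp);
    KSPSetType(ksp, KSPGMRES);     /* iterative inner solves instead of a full factorization */
    KSPSetTolerances(ksp, 1e-9, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); /* cf. -st_ksp_rtol 1e-9 */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCNONE);         /* "KSP without a preconditioner", as in the message above */

The same choices are available from the command line as -st_ksp_type gmres -st_pc_type none, and if the direct solver is kept, the MUMPS workspace increase suggested above stays a run-time flag, -mat_mumps_icntl_14 200.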
If you have a good preconditioner, then an alternative to Krylov methods is to use Davidson-type methods https://doi.org/10.1145/2543696 - in some cases these can be competitive. >>> Jose >>>> El 19 nov 2019, a las 17:06, Yann Jobic via petsc-users escribi?: >>>> >>>> Hi all, >>>> I'm trying to solve a huge generalize (unsymetric) eigen value problem with SLEPc + MUMPS. We actually failed to allocate the requested memory for MUMPS factorization (we tried BVVECS). >>>> We would like to know if there is an alternate iterative way of solving such problems. >>>> Thank you, >>>> Best regards, >>>> Yann > From perceval.desforges at polytechnique.edu Thu Nov 21 11:13:52 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Thu, 21 Nov 2019 18:13:52 +0100 Subject: [petsc-users] Memory optimization Message-ID: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> Hello all, I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. The options I use are : -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 However the program quickly crashes with this error: slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: [1]PETSC ERROR: Error in external library [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 which is an error due to setting the mumps icntl option so low from what I've gathered. Is there any other way I can reduce memory usage? Thanks, Regards, Perceval, P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Nov 21 11:39:43 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 21 Nov 2019 18:39:43 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> Message-ID: Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html You can comment out the call to EPSSolve() and run with the option -show_inertias For example, the output Shift 0.1 Inertia 3 Shift 0.35 Inertia 11 means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). Jose > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. 
The calculations are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what I've gathered. > > Is there any other way I can reduce memory usage? > > > > Thanks, > > Regards, > > Perceval, > > > > P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. > From perceval.desforges at polytechnique.edu Fri Nov 22 12:56:31 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Fri, 22 Nov 2019 19:56:31 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> Message-ID: <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> Hi, Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed I get this error even when there are no eigenvalues in the interval. I've started using BVMAT instead of BVVECS by the way. Thanks, Perceval, > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). > > By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). > > Jose > >> El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: >> >> Hello all, >> >> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. 
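A sketch of the inertia check described above, following the quoted advice and the ex25.c example it links to (not code from this thread; A, B, and the interval endpoints inta, intb are placeholders, and error checking is omitted):

    EPS       eps;
    PetscInt  ns, *inertias, i;
    PetscReal *shifts, inta = 0.0, intb = 1.0;   /* placeholder interval */

    /* Mat A, B loaded elsewhere; spectrum slicing assumes a symmetric(-definite) pencil */
    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, B);
    EPSSetProblemType(eps, EPS_GHEP);
    EPSSetWhichEigenpairs(eps, EPS_ALL);   /* spectrum slicing: all eigenvalues in the interval */
    EPSSetInterval(eps, inta, intb);
    EPSSetFromOptions(eps);                /* e.g. -st_type sinvert -st_pc_type cholesky ... */
    EPSSetUp(eps);                         /* factors at the endpoints; no EPSSolve() needed */
    EPSKrylovSchurGetInertias(eps, &ns, &shifts, &inertias);
    for (i = 0; i < ns; i++)
      PetscPrintf(PETSC_COMM_WORLD, "Shift %g  Inertia %D\n", (double)shifts[i], inertias[i]);
    PetscFree(shifts);
    PetscFree(inertias);

The difference between consecutive inertias is the number of eigenvalues between those shifts, so an interval containing far too many eigenvalues shows up here before any eigenvector storage is allocated.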
>> >> The options I use are : >> >> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 >> >> However the program quickly crashes with this error: >> >> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed >> >> I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: >> >> [1]PETSC ERROR: Error in external library >> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 >> >> which is an error due to setting the mumps icntl option so low from what I've gathered. >> >> Is there any other way I can reduce memory usage? >> >> Thanks, >> >> Regards, >> >> Perceval, >> >> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Nov 22 15:51:37 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Fri, 22 Nov 2019 21:51:37 +0000 Subject: [petsc-users] petsc-3.12.2.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.12.2 is now available for download, with change list at 'PETSc-3.12 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From ztdepyahoo at gmail.com Sun Nov 24 17:22:08 2019 From: ztdepyahoo at gmail.com (Peng Ding) Date: Mon, 25 Nov 2019 07:22:08 +0800 Subject: [petsc-users] Fwd: how to set the matrix with the new cell ordering with metis In-Reply-To: <301A0E4B-E95D-4746-9CFF-A64084D27CA5@gmail.com> References: <301A0E4B-E95D-4746-9CFF-A64084D27CA5@gmail.com> Message-ID: Dear sir: I generate the 3D mesh with Tetgen for FVM computation. I reordered the cells and partitioned them with metis. Then i got an array which records the distribution of each cells. For example, on CPU -0., I have: 1, 3, 6 , 7 ....11, 22...... But all these cells have non-continuos index, how to set the matrix in petsc. Regards ztdepyahoo ztdepyahoo at gmail.com ??? ?????? ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Nov 24 17:29:49 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 24 Nov 2019 17:29:49 -0600 Subject: [petsc-users] Fwd: how to set the matrix with the new cell ordering with metis In-Reply-To: References: <301A0E4B-E95D-4746-9CFF-A64084D27CA5@gmail.com> Message-ID: On Sun, Nov 24, 2019 at 5:23 PM Peng Ding wrote: > > Dear sir: > I generate the 3D mesh with Tetgen for FVM computation. I reordered > the cells and partitioned them with metis. Then i got an array which > records the distribution of each cells. For example, on CPU -0., I have: > 1, 3, 6 , 7 ....11, 22...... > But all these cells have non-continuos index, how to set the matrix in > petsc. > You will have to renumber them. Thanks, Matt > Regards > > > ztdepyahoo > ztdepyahoo at gmail.com > > > ??? ?????? ?? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Nov 24 21:40:04 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Mon, 25 Nov 2019 03:40:04 +0000 Subject: [petsc-users] how to set the matrix with the new cell ordering with metis In-Reply-To: References: <301A0E4B-E95D-4746-9CFF-A64084D27CA5@gmail.com> Message-ID: <8F373FC9-58BC-4AA1-86FA-BA440469C537@anl.gov> You can possibly use the PETSc object AO (see AOCreate()) to manage the reordering. The non-contiguous order you start with is the application ordering and the new contiguous ordering is the petsc ordering. Note you will likely need to reorder the cell vertex or edge numbers as well. Barry > On Nov 24, 2019, at 5:29 PM, Matthew Knepley wrote: > > On Sun, Nov 24, 2019 at 5:23 PM Peng Ding wrote: > > Dear sir: > I generate the 3D mesh with Tetgen for FVM computation. I reordered the cells and partitioned them with metis. Then i got an array which records the distribution of each cells. For example, on CPU -0., I have: > 1, 3, 6 , 7 ....11, 22...... > But all these cells have non-continuos index, how to set the matrix in petsc. > > You will have to renumber them. > > Thanks, > > Matt > > Regards > > > > ztdepyahoo > ztdepyahoo at gmail.com > ??? ?????? ?? > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jroman at dsic.upv.es Mon Nov 25 02:44:12 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 25 Nov 2019 09:44:12 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> Message-ID: <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with -mat_view ::ascii_info Jose > El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: > > Hi, > > Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > > > > >> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. >> >> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: >> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html >> You can comment out the call to EPSSolve() and run with the option -show_inertias >> For example, the output >> Shift 0.1 Inertia 3 >> Shift 0.35 Inertia 11 >> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). >> >> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). >> >> Jose >> >> >>> El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: >>> >>> Hello all, >>> >>> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. 
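Returning to the cell-renumbering question above, a minimal sketch of the AO approach Barry suggests (illustrative only; the index values are placeholders echoing the "1, 3, 6, ... 22" example in the question):

    AO       ao;
    PetscInt nlocal = 3;
    PetscInt app[3] = {1, 3, 6};    /* non-contiguous application (METIS) cell numbers owned by this rank */
    PetscInt idx[2] = {3, 22};      /* e.g. a cell and a neighbour, in application numbering */

    /* NULL: the PETSc ordering is the natural contiguous one, 0,1,2,... across ranks */
    AOCreateBasic(PETSC_COMM_WORLD, nlocal, app, NULL, &ao);
    AOApplicationToPetsc(ao, 2, idx);   /* idx[] now holds contiguous PETSc indices for MatSetValues() */
    AODestroy(&ao);

The AO holds the complete mapping, so indices owned by other ranks (such as 22 here) are translated as well.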
The calculations are run on a processor with 20 cores and 96 Go of RAM. >>> >>> The options I use are : >>> >>> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 >>> >>> >>> >>> However the program quickly crashes with this error: >>> >>> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed >>> >>> I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: >>> >>> [1]PETSC ERROR: Error in external library >>> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 >>> >>> which is an error due to setting the mumps icntl option so low from what I've gathered. >>> >>> Is there any other way I can reduce memory usage? >>> >>> >>> >>> Thanks, >>> >>> Regards, >>> >>> Perceval, >>> >>> >>> >>> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. > > From perceval.desforges at polytechnique.edu Mon Nov 25 11:20:24 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Mon, 25 Nov 2019 18:20:24 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> Message-ID: <792624a5b70858444b7e529ce3624395@polytechnique.edu> Hi, So I'm loading two matrices from files, both 1000000 by 10000000. I ran the program with -mat_view::ascii_info and I got: Mat Object: 1 MPI processes type: seqaij rows=1000000, cols=1000000 total: nonzeros=7000000, allocated nonzeros=7000000 total number of mallocs used during MatSetValues calls =0 not using I-node routines 20 times, and then Mat Object: 1 MPI processes type: seqaij rows=1000000, cols=1000000 total: nonzeros=1000000, allocated nonzeros=1000000 total number of mallocs used during MatSetValues calls =0 not using I-node routines 20 times as well, and then Mat Object: 1 MPI processes type: seqaij rows=1000000, cols=1000000 total: nonzeros=7000000, allocated nonzeros=7000000 total number of mallocs used during MatSetValues calls =0 not using I-node routines 20 times as well before crashing. I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? Thanks again, Best regards, Perceval, > Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with > -mat_view ::ascii_info > > Jose > > El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: > > Hi, > > Thanks for your answer. 
I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). > > By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). > > Jose > > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what I've gathered. > > Is there any other way I can reduce memory usage? > > Thanks, > > Regards, > > Perceval, > > P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 25 11:25:07 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 Nov 2019 11:25:07 -0600 Subject: [petsc-users] Memory optimization In-Reply-To: <792624a5b70858444b7e529ce3624395@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> Message-ID: On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges < perceval.desforges at polytechnique.edu> wrote: > Hi, > > So I'm loading two matrices from files, both 1000000 by 10000000. 
I ran > the program with -mat_view::ascii_info and I got: > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=1000000, allocated nonzeros=1000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well before crashing. > > I realized it might be because I am setting up 20 krylov schur partitions > which may be too much. I tried running the code again with only 2 > partitions and now the code runs but I have speed issues. > > I have one version of the code where my first matrix has 5 non-zero > diagonals (so 5000000 non-zero entries), and the set up time is quite fast > (8 seconds) and solving is also quite fast. The second version is the same > but I have two extra non-zero diagonals (7000000 non-zero entries) and the > set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also > a lot slower. Is it normal that adding two extra diagonals increases solve > and set up time so much? > > I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the preallocation correctly. Thanks, Matt > Thanks again, > > Best regards, > > Perceval, > > > > Then I guess it is the factorization that is failing. How many nonzero > entries do you have? Run with > -mat_view ::ascii_info > > Jose > > > El 22 nov 2019, a las 19:56, Perceval Desforges < > perceval.desforges at polytechnique.edu> escribi?: > > Hi, > > Thanks for your answer. I tried looking at the inertias before solving, > but the problem is that the program crashes when I call EPSSetUp with this > error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > > > > > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and > contains too many eigenvalues (SLEPc needs to allocate at least one vector > per each eigenvalue). You can count the eigenvalues in the interval with > the inertias, which are available at EPSSetUp (no need to call EPSSolve). > See this example: > > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option > -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). > > By the way, I would suggest using BVMAT instead of BVVECS (the latter is > slower). 
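A sketch of the preallocation point made above (hypothetical, since the assembly code is not shown in the thread): for the 7-diagonal operator every row needs room for 7 entries, e.g.

    Mat      A;
    PetscInt N = 1000000;

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
    MatSetFromOptions(A);
    MatSeqAIJSetPreallocation(A, 7, NULL);          /* 7 nonzeros per row, not 5 */
    MatMPIAIJSetPreallocation(A, 7, NULL, 3, NULL); /* rough off-diagonal guess for a parallel run */
    /* ... MatSetValues() loop ... */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

Underestimating the per-row count forces a malloc for every offending row during MatSetValues(); the "mallocs used during MatSetValues calls" line of -mat_view ::ascii_info (or running with -info) shows whether that is happening.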
> > Jose > > > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a > fairly large matrix (1000000 * 1000000). I therefore use the spectrum > slicing method detailed in section 3.4.5 of the manual. The calculations > are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 > -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the > -mat_mumps_icntl_14 option by setting it at -70 for example but then I get > this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what > I've gathered. > > Is there any other way I can reduce memory usage? > > > > Thanks, > > Regards, > > Perceval, > > > > P.S. I sent the same email a few minutes ago but I think I made a mistake > in the address, I'm sorry if I've sent it twice. > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Nov 25 11:31:19 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 25 Nov 2019 18:31:19 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> Message-ID: Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0". Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones. The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only. Jose > El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: > > On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges wrote: > Hi, > > So I'm loading two matrices from files, both 1000000 by 10000000. I ran the program with -mat_view::ascii_info and I got: > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=1000000, allocated nonzeros=1000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well before crashing. 
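A short sketch of the partition setting mentioned above (eps as in the earlier sketches; the command-line equivalent is -eps_krylovschur_partitions 1):

    /* spectrum slicing on a single node: keep a single partition of the communicator */
    EPSKrylovSchurSetPartitions(eps, 1);

Roughly speaking, each partition works on its own sub-interval with its own factorizations, which is consistent with the memory growth seen when going from 2 to 20 partitions in this thread.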
> > I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. > > I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? > > > I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create > your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the > preallocation correctly. > > Thanks, > > Matt > Thanks again, > > Best regards, > > Perceval, > > > > > >> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with >> -mat_view ::ascii_info >> >> Jose >> >> >>> El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: >>> >>> Hi, >>> >>> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: >>> >>> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed >>> >>> I get this error even when there are no eigenvalues in the interval. >>> >>> I've started using BVMAT instead of BVVECS by the way. >>> >>> Thanks, >>> >>> Perceval, >>> >>> >>> >>> >>> >>>> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. >>>> >>>> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: >>>> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html >>>> You can comment out the call to EPSSolve() and run with the option -show_inertias >>>> For example, the output >>>> Shift 0.1 Inertia 3 >>>> Shift 0.35 Inertia 11 >>>> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). >>>> >>>> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). >>>> >>>> Jose >>>> >>>> >>>>> El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: >>>>> >>>>> Hello all, >>>>> >>>>> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. 
>>>>> >>>>> The options I use are : >>>>> >>>>> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 >>>>> >>>>> >>>>> >>>>> However the program quickly crashes with this error: >>>>> >>>>> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed >>>>> >>>>> I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: >>>>> >>>>> [1]PETSC ERROR: Error in external library >>>>> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 >>>>> >>>>> which is an error due to setting the mumps icntl option so low from what I've gathered. >>>>> >>>>> Is there any other way I can reduce memory usage? >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Regards, >>>>> >>>>> Perceval, >>>>> >>>>> >>>>> >>>>> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. >>> > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From perceval.desforges at polytechnique.edu Mon Nov 25 11:44:50 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Mon, 25 Nov 2019 18:44:50 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> Message-ID: <0007da7378494d3bdb15c219872d3359@polytechnique.edu> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite farm apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done? I will try setting up only one partition. Thanks, Perceval, > Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0". > > Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones. > > The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only. > > Jose > > El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: > > On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges wrote: > Hi, > > So I'm loading two matrices from files, both 1000000 by 10000000. 
I ran the program with -mat_view::ascii_info and I got: > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=1000000, allocated nonzeros=1000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well before crashing. > > I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. > > I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? > > I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create > your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the > preallocation correctly. > > Thanks, > > Matt > Thanks again, > > Best regards, > > Perceval, > > Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with > -mat_view ::ascii_info > > Jose > > El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: > > Hi, > > Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). > > By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). > > Jose > > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. 
The calculations are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what I've gathered. > > Is there any other way I can reduce memory usage? > > Thanks, > > Regards, > > Perceval, > > P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Nov 25 11:49:16 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 25 Nov 2019 18:49:16 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: <0007da7378494d3bdb15c219872d3359@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> <0007da7378494d3bdb15c219872d3359@polytechnique.edu> Message-ID: In 3D problems it is recommended to use preconditioned iterative solvers. Unfortunately the spectrum slicing technique requires the full factorization (because it uses matrix inertia). > El 25 nov 2019, a las 18:44, Perceval Desforges escribi?: > > I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite farm apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done? > > I will try setting up only one partition. > > Thanks, > > Perceval, > >> Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0". >> >> Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones. >> >> The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only. >> >> Jose >> >> >>> El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: >>> >>> On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges wrote: >>> Hi, >>> >>> So I'm loading two matrices from files, both 1000000 by 10000000. 
I ran the program with -mat_view::ascii_info and I got: >>> >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=1000000, cols=1000000 >>> total: nonzeros=7000000, allocated nonzeros=7000000 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> >>> 20 times, and then >>> >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=1000000, cols=1000000 >>> total: nonzeros=1000000, allocated nonzeros=1000000 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> >>> 20 times as well, and then >>> >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=1000000, cols=1000000 >>> total: nonzeros=7000000, allocated nonzeros=7000000 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node routines >>> >>> 20 times as well before crashing. >>> >>> I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. >>> >>> I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? >>> >>> >>> I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create >>> your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the >>> preallocation correctly. >>> >>> Thanks, >>> >>> Matt >>> Thanks again, >>> >>> Best regards, >>> >>> Perceval, >>> >>> >>> >>> >>> >>>> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with >>>> -mat_view ::ascii_info >>>> >>>> Jose >>>> >>>> >>>>> El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: >>>>> >>>>> Hi, >>>>> >>>>> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: >>>>> >>>>> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed >>>>> >>>>> I get this error even when there are no eigenvalues in the interval. >>>>> >>>>> I've started using BVMAT instead of BVVECS by the way. >>>>> >>>>> Thanks, >>>>> >>>>> Perceval, >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. >>>>>> >>>>>> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: >>>>>> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html >>>>>> You can comment out the call to EPSSolve() and run with the option -show_inertias >>>>>> For example, the output >>>>>> Shift 0.1 Inertia 3 >>>>>> Shift 0.35 Inertia 11 >>>>>> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). >>>>>> >>>>>> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). 
>>>>>> >>>>>> Jose >>>>>> >>>>>> >>>>>>> El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: >>>>>>> >>>>>>> Hello all, >>>>>>> >>>>>>> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. >>>>>>> >>>>>>> The options I use are : >>>>>>> >>>>>>> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 >>>>>>> >>>>>>> >>>>>>> >>>>>>> However the program quickly crashes with this error: >>>>>>> >>>>>>> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed >>>>>>> >>>>>>> I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: >>>>>>> >>>>>>> [1]PETSC ERROR: Error in external library >>>>>>> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 >>>>>>> >>>>>>> which is an error due to setting the mumps icntl option so low from what I've gathered. >>>>>>> >>>>>>> Is there any other way I can reduce memory usage? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Perceval, >>>>>>> >>>>>>> >>>>>>> >>>>>>> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > From knepley at gmail.com Mon Nov 25 11:48:52 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 Nov 2019 11:48:52 -0600 Subject: [petsc-users] Memory optimization In-Reply-To: <0007da7378494d3bdb15c219872d3359@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> <0007da7378494d3bdb15c219872d3359@polytechnique.edu> Message-ID: On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges < perceval.desforges at polytechnique.edu> wrote: > I am basically trying to solve a finite element problem, which is why in > 3D I have 7 non-zero diagonals that are quite farm apart from one another. > In 2D I only have 5 non-zero diagonals that are less far apart. So is it > normal that the set up time is around 400 times greater in the 3D case? Is > there nothing to be done? > > No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros. In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually track down bad preallocation using -info. Thanks, Matt > I will try setting up only one partition. > > Thanks, > > Perceval, > > Probably it is not a preallocation issue, as it shows "total number of > mallocs used during MatSetValues calls =0". > > Adding new diagonals may increase fill-in a lot, if the new diagonals are > displaced with respect to the other ones. > > The partitions option is intended for running several nodes. If you are > using just one node probably it is better to set one partition only. 
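As a concrete, hedged illustration of the preallocation point Matt makes above, here is a sketch of assembling a 7-diagonal AIJ matrix of roughly the size discussed in the thread, with the per-row preallocation set to 7 rather than 5. The stencil offsets (+/-1, +/-m, +/-m*m), the values, and the neglect of grid-boundary wrap-around are assumptions for illustration only, not the poster's actual discretization.

#include <petscmat.h>

int main(int argc,char **argv)
{
  Mat            A;
  PetscInt       m=100,N,row,Istart,Iend,ncols,cols[7];
  PetscScalar    vals[7];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  N = m*m*m;                                     /* ~10^6 unknowns, as in the thread */

  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  /* 7-point stencil: preallocate 7 nonzeros per row, not 5 */
  ierr = MatSeqAIJSetPreallocation(A,7,NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,7,NULL,7,NULL);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
  for (row=Istart;row<Iend;row++) {
    ncols = 0;
    cols[ncols] = row;                 vals[ncols++] =  6.0;
    if (row-1   >= 0) { cols[ncols] = row-1;   vals[ncols++] = -1.0; }
    if (row+1   <  N) { cols[ncols] = row+1;   vals[ncols++] = -1.0; }
    if (row-m   >= 0) { cols[ncols] = row-m;   vals[ncols++] = -1.0; }
    if (row+m   <  N) { cols[ncols] = row+m;   vals[ncols++] = -1.0; }
    if (row-m*m >= 0) { cols[ncols] = row-m*m; vals[ncols++] = -1.0; }
    if (row+m*m <  N) { cols[ncols] = row+m*m; vals[ncols++] = -1.0; }
    ierr = MatSetValues(A,1,&row,ncols,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* run with -info and grep for "malloc": it should report 0 mallocs during MatSetValues() */

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}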
> > Jose > > > El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: > > On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges < > perceval.desforges at polytechnique.edu> wrote: > Hi, > > So I'm loading two matrices from files, both 1000000 by 10000000. I ran > the program with -mat_view::ascii_info and I got: > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=1000000, allocated nonzeros=1000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well before crashing. > > I realized it might be because I am setting up 20 krylov schur partitions > which may be too much. I tried running the code again with only 2 > partitions and now the code runs but I have speed issues. > > I have one version of the code where my first matrix has 5 non-zero > diagonals (so 5000000 non-zero entries), and the set up time is quite fast > (8 seconds) and solving is also quite fast. The second version is the same > but I have two extra non-zero diagonals (7000000 non-zero entries) and the > set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also > a lot slower. Is it normal that adding two extra diagonals increases solve > and set up time so much? > > > I can't see the rest of your code, but I am guessing your preallocation > statement has "5", so it does no mallocs when you create > your first matrix, but mallocs for every row when you create your second > matrix. When you load them from disk, we do all the > preallocation correctly. > > Thanks, > > Matt > Thanks again, > > Best regards, > > Perceval, > > > > > > Then I guess it is the factorization that is failing. How many nonzero > entries do you have? Run with > -mat_view ::ascii_info > > Jose > > > El 22 nov 2019, a las 19:56, Perceval Desforges < > perceval.desforges at polytechnique.edu> escribi?: > > Hi, > > Thanks for your answer. I tried looking at the inertias before solving, > but the problem is that the program crashes when I call EPSSetUp with this > error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > > > > > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and > contains too many eigenvalues (SLEPc needs to allocate at least one vector > per each eigenvalue). You can count the eigenvalues in the interval with > the inertias, which are available at EPSSetUp (no need to call EPSSolve). > See this example: > > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option > -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). 
> > By the way, I would suggest using BVMAT instead of BVVECS (the latter is > slower). > > Jose > > > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a > fairly large matrix (1000000 * 1000000). I therefore use the spectrum > slicing method detailed in section 3.4.5 of the manual. The calculations > are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 > -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the > -mat_mumps_icntl_14 option by setting it at -70 for example but then I get > this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what > I've gathered. > > Is there any other way I can reduce memory usage? > > > > Thanks, > > Regards, > > Perceval, > > > > P.S. I sent the same email a few minutes ago but I think I made a mistake > in the address, I'm sorry if I've sent it twice. > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Mon Nov 25 18:23:59 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 25 Nov 2019 16:23:59 -0800 Subject: [petsc-users] Domain decomposition using DMPLEX Message-ID: Dear PETSc users and developers, I am working with dmplex to distribute a 3D unstructured mesh made of tetrahedrons in a cuboidal domain. I had a few queries: 1) Is there any way of ensuring load balancing based on the number of vertices per MPI process. 2) As the global domain is cuboidal, is the resulting domain decomposition also cuboidal on every MPI process? If not, is there a way to ensure this? For example in DMDA, the default domain decomposition for a cuboidal domain is cuboidal. Sincerely, SG -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Nov 25 21:54:44 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 Nov 2019 21:54:44 -0600 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh wrote: > Dear PETSc users and developers, > > I am working with dmplex to distribute a 3D unstructured mesh made of > tetrahedrons in a cuboidal domain. I had a few queries: > 1) Is there any way of ensuring load balancing based on the number of > vertices per MPI process. > You can now call DMPlexRebalanceSharedPoints() to try and get better balance of vertices. > 2) As the global domain is cuboidal, is the resulting domain decomposition > also cuboidal on every MPI process? 
If not, is there a way to ensure this? > For example in DMDA, the default domain decomposition for a cuboidal domain > is cuboidal. > It sounds like you do not want something that is actually unstructured. Rather, it seems like you want to take a DMDA type thing and split it into tets. You can get a cuboidal decomposition of a hex mesh easily. Call DMPlexCreateBoxMesh() with one cell for every process, distribute, and then uniformly refine. This will not quite work for tets since the mesh partitioner will tend to violate that constraint. You could: a) Prescribe the distribution yourself using the Shell partitioner type or b) Write a refiner that turns hexes into tets We already have a refiner that turns tets into hexes, but we never wrote the other direction because it was not clear that it was useful. Thanks, Matt > Sincerely, > SG > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Mon Nov 25 22:45:18 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 25 Nov 2019 20:45:18 -0800 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: Hi Matt, https://arxiv.org/pdf/1907.02604.pdf On Mon, Nov 25, 2019 at 7:54 PM Matthew Knepley wrote: > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh > wrote: > >> Dear PETSc users and developers, >> >> I am working with dmplex to distribute a 3D unstructured mesh made of >> tetrahedrons in a cuboidal domain. I had a few queries: >> 1) Is there any way of ensuring load balancing based on the number of >> vertices per MPI process. >> > > You can now call DMPlexRebalanceSharedPoints() to try and get better > balance of vertices. > > Thank you for pointing out this function! > 2) As the global domain is cuboidal, is the resulting domain decomposition >> also cuboidal on every MPI process? If not, is there a way to ensure this? >> For example in DMDA, the default domain decomposition for a cuboidal domain >> is cuboidal. >> > > It sounds like you do not want something that is actually unstructured. > Rather, it seems like you want to > take a DMDA type thing and split it into tets. You can get a cuboidal > decomposition of a hex mesh easily. > Call DMPlexCreateBoxMesh() with one cell for every process, distribute, > and then uniformly refine. This > will not quite work for tets since the mesh partitioner will tend to > violate that constraint. You could: > > No, I have an unstructured mesh that increases in resolution away from the center of the cuboid. See Figure: 5 in the ArXiv paper https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane of the cuboid. Given this type of mesh, will dmplex do a cuboidal domain decomposition? Sincerely, SG > a) Prescribe the distribution yourself using the Shell partitioner type > > or > > b) Write a refiner that turns hexes into tets > > We already have a refiner that turns tets into hexes, but we never wrote > the other direction because it was not clear > that it was useful. > > Thanks, > > Matt > > >> Sincerely, >> SG >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Nov 25 23:02:50 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 26 Nov 2019 05:02:50 +0000 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: "No, I have an unstructured mesh that increases in resolution away from the center of the cuboid. See Figure: 5 in the ArXiv paper https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane of the cuboid. Given this type of mesh, will dmplex do a cuboidal domain decomposition?" No definitely not. Why do you need a cuboidal domain decomposition? Barry > On Nov 25, 2019, at 10:45 PM, Swarnava Ghosh wrote: > > Hi Matt, > > > https://arxiv.org/pdf/1907.02604.pdf > > On Mon, Nov 25, 2019 at 7:54 PM Matthew Knepley wrote: > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh wrote: > Dear PETSc users and developers, > > I am working with dmplex to distribute a 3D unstructured mesh made of tetrahedrons in a cuboidal domain. I had a few queries: > 1) Is there any way of ensuring load balancing based on the number of vertices per MPI process. > > You can now call DMPlexRebalanceSharedPoints() to try and get better balance of vertices. > > Thank you for pointing out this function! > > 2) As the global domain is cuboidal, is the resulting domain decomposition also cuboidal on every MPI process? If not, is there a way to ensure this? For example in DMDA, the default domain decomposition for a cuboidal domain is cuboidal. > > It sounds like you do not want something that is actually unstructured. Rather, it seems like you want to > take a DMDA type thing and split it into tets. You can get a cuboidal decomposition of a hex mesh easily. > Call DMPlexCreateBoxMesh() with one cell for every process, distribute, and then uniformly refine. This > will not quite work for tets since the mesh partitioner will tend to violate that constraint. You could: > > No, I have an unstructured mesh that increases in resolution away from the center of the cuboid. See Figure: 5 in the ArXiv paper https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane of the cuboid. Given this type of mesh, will dmplex do a cuboidal domain decomposition? > > Sincerely, > SG > > a) Prescribe the distribution yourself using the Shell partitioner type > > or > > b) Write a refiner that turns hexes into tets > > We already have a refiner that turns tets into hexes, but we never wrote the other direction because it was not clear > that it was useful. > > Thanks, > > Matt > > Sincerely, > SG > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From bsmith at mcs.anl.gov Tue Nov 26 00:27:45 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 26 Nov 2019 06:27:45 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: <7b2c1352-df81-16e8-080a-83da950e0ede@purdue.edu> References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> <7b2c1352-df81-16e8-080a-83da950e0ede@purdue.edu> Message-ID: <528CB72A-B209-42A5-99EE-9ED9D3EADA19@anl.gov> I agree this is confusing. https://gitlab.com/petsc/petsc/merge_requests/2331 the flag PETSC_HAVE_MPI will no longer be set when MPI is not used (only MPIUNI is used). 
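A small sketch of what this looks like from user code: in a --with-mpi=0 build the MPI calls below hit the sequential MPIUNI stubs and the program runs on one process. PETSC_HAVE_MPIUNI is, as far as I know, the macro petscconf.h defines for such builds; treat it as an assumption and confirm against your own petscconf.h.

#include <petscsys.h>

int main(int argc,char **argv)
{
  PetscMPIInt    size,rank;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* In a --with-mpi=0 build these are the sequential MPIUNI stubs */
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
#if defined(PETSC_HAVE_MPIUNI)
  ierr = PetscPrintf(PETSC_COMM_WORLD,"MPIUNI build: size=%d rank=%d (always 1 and 0)\n",size,rank);CHKERRQ(ierr);
#else
  ierr = PetscPrintf(PETSC_COMM_WORLD,"Real MPI build: size=%d rank=%d\n",size,rank);CHKERRQ(ierr);
#endif
  ierr = PetscFinalize();
  return ierr;
}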
Barry The code API still has MPI* in it with MPI but they are stubs that just handle the sequential code and do not require an installation of MPI. > On Nov 19, 2019, at 2:07 PM, Povolotskyi, Mykhailo via petsc-users wrote: > > I see. > > Actually, my goal is to compile petsc without real MPI to use it with libmesh. > > You are saying that PETSC_HAVE_MPI is not a sign that Petsc is built with MPI. It means you have MPIUNI which is a serial code, but has an interface of MPI. > > Correct? > > On 11/19/2019 3:00 PM, Matthew Knepley wrote: >> On Tue, Nov 19, 2019 at 2:58 PM Povolotskyi, Mykhailo wrote: >> Let me explain the problem. >> >> This log file has >> >> #ifndef PETSC_HAVE_MPI >> #define PETSC_HAVE_MPI 1 >> #endif >> >> while I need to have PETSC without MPI. >> >> If you do not provide MPI, we provide MPIUNI. Do you see it linking to an MPI implementation, or using mpi.h? >> >> Matt >> >> On 11/19/2019 2:55 PM, Matthew Knepley wrote: >>> The log you sent has configure completely successfully. Please retry and send the log for a failed run. >>> >>> Thanks, >>> >>> Matt >>> >>> On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users wrote: >>> Why it did not work then? >>> >>> On 11/19/2019 2:51 PM, Balay, Satish wrote: >>> > And I see from configure.log - you are using the correct option. >>> > >>> > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 >>> > <<<<<<< >>> > >>> > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? >>> > >>> > Satish >>> > >>> > >>> > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: >>> > >>> >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: >>> >> >>> >>> Hello, >>> >>> >>> >>> I'm trying to build PETSC without MPI. >>> >>> >>> >>> Even if I specify --with_mpi=0, the configuration script still activates >>> >>> MPI. >>> >>> >>> >>> I attach the configure.log. >>> >>> >>> >>> What am I doing wrong? >>> >> The option is --with-mpi=0 >>> >> >>> >> Satish >>> >> >>> >> >>> >>> Thank you, >>> >>> >>> >>> Michael. >>> >>> >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ From balay at mcs.anl.gov Tue Nov 26 07:52:02 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Tue, 26 Nov 2019 13:52:02 +0000 Subject: [petsc-users] petsc without MPI In-Reply-To: <528CB72A-B209-42A5-99EE-9ED9D3EADA19@anl.gov> References: <01a221b4-795e-6624-71e4-75853a1bbf6e@purdue.edu> <6925fced-aaf2-1ef9-866c-a0c415d24df6@purdue.edu> <7b2c1352-df81-16e8-080a-83da950e0ede@purdue.edu> <528CB72A-B209-42A5-99EE-9ED9D3EADA19@anl.gov> Message-ID: Generally - even when one wants sequential build - its best use MPICH [or openMPI] when using multiple MPI based packages. [this is to avoid conflicts - if any - in the seqential MPI stubs of these packages] And run the code sequentially.. Satish On Tue, 26 Nov 2019, Smith, Barry F. wrote: > > I agree this is confusing. https://gitlab.com/petsc/petsc/merge_requests/2331 the flag PETSC_HAVE_MPI will no longer be set when MPI is not used (only MPIUNI is used). > > Barry > > The code API still has MPI* in it with MPI but they are stubs that just handle the sequential code and do not require an installation of MPI. > > > > On Nov 19, 2019, at 2:07 PM, Povolotskyi, Mykhailo via petsc-users wrote: > > > > I see. > > > > Actually, my goal is to compile petsc without real MPI to use it with libmesh. > > > > You are saying that PETSC_HAVE_MPI is not a sign that Petsc is built with MPI. It means you have MPIUNI which is a serial code, but has an interface of MPI. > > > > Correct? > > > > On 11/19/2019 3:00 PM, Matthew Knepley wrote: > >> On Tue, Nov 19, 2019 at 2:58 PM Povolotskyi, Mykhailo wrote: > >> Let me explain the problem. > >> > >> This log file has > >> > >> #ifndef PETSC_HAVE_MPI > >> #define PETSC_HAVE_MPI 1 > >> #endif > >> > >> while I need to have PETSC without MPI. > >> > >> If you do not provide MPI, we provide MPIUNI. Do you see it linking to an MPI implementation, or using mpi.h? > >> > >> Matt > >> > >> On 11/19/2019 2:55 PM, Matthew Knepley wrote: > >>> The log you sent has configure completely successfully. Please retry and send the log for a failed run. > >>> > >>> Thanks, > >>> > >>> Matt > >>> > >>> On Tue, Nov 19, 2019 at 2:53 PM Povolotskyi, Mykhailo via petsc-users wrote: > >>> Why it did not work then? > >>> > >>> On 11/19/2019 2:51 PM, Balay, Satish wrote: > >>> > And I see from configure.log - you are using the correct option. 
> >>> > > >>> > Configure Options: --configModules=PETSc.Configure --optionsModule=config.compilerOptions --with-scalar-type=real --with-x=0 --with-hdf5=0 --with-single-library=1 --with-shared-libraries=0 --with-log=0 --with-mpi=0 --with-clanguage=C++ --with-cxx-dialect=C++11 --CXXFLAGS="-fopenmp -fPIC" --CFLAGS="-fopenmp -fPIC" --with-fortran=0 --FFLAGS="-fopenmp -fPIC" --with-64-bit-indices=0 --with-debugging=0 --with-cc=gcc --with-fc=gfortran --with-cxx=g++ COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= --download-metis=0 --download-superlu_dist=0 --download-parmetis=0 --with-valgrind-dir=/apps/brown/valgrind/3.13.0_gcc-4.8.5 --download-mumps=1 --with-mumps-serial=1 --with-fortran-kernels=0 --with-blaslapack-lib="-Wl,-rpath,/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -L/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core " --with-blacs-lib=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so --with-blacs-include=/apps/cent7/intel/compilers_and_libraries_2017.1.132/linux/mkl/include --with-scalapack=0 > >>> > <<<<<<< > >>> > > >>> > And configure completed successfully. What issue are you encountering? Why do you think its activating MPI? > >>> > > >>> > Satish > >>> > > >>> > > >>> > On Tue, 19 Nov 2019, Balay, Satish via petsc-users wrote: > >>> > > >>> >> On Tue, 19 Nov 2019, Povolotskyi, Mykhailo via petsc-users wrote: > >>> >> > >>> >>> Hello, > >>> >>> > >>> >>> I'm trying to build PETSC without MPI. > >>> >>> > >>> >>> Even if I specify --with_mpi=0, the configuration script still activates > >>> >>> MPI. > >>> >>> > >>> >>> I attach the configure.log. > >>> >>> > >>> >>> What am I doing wrong? > >>> >> The option is --with-mpi=0 > >>> >> > >>> >> Satish > >>> >> > >>> >> > >>> >>> Thank you, > >>> >>> > >>> >>> Michael. > >>> >>> > >>> >>> > >>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >>> -- Norbert Wiener > >>> > >>> https://www.cse.buffalo.edu/~knepley/ > >> > >> > >> -- > >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > From bldenton at buffalo.edu Tue Nov 26 07:22:47 2019 From: bldenton at buffalo.edu (Brandon Denton) Date: Tue, 26 Nov 2019 08:22:47 -0500 Subject: [petsc-users] Petsc Matrix modifications Message-ID: Good Morning, Is it possible to expand a matrix in petsc? I current created and loaded a matrix (6 x 5) which holds information required later in my program. I would like to store additional information in the matrix by expanding its size, let's say make it at 10 x 5 matrix. How is this accomplished in petsc. When I try to use MatSetSize() or MatSetValue() my code throws errors. What is the process for accomplishing this? Thank you in advance for your time. Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 26 09:00:27 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 26 Nov 2019 09:00:27 -0600 Subject: [petsc-users] Petsc Matrix modifications In-Reply-To: References: Message-ID: On Tue, Nov 26, 2019 at 8:04 AM Brandon Denton wrote: > Good Morning, > > Is it possible to expand a matrix in petsc? 
I current created and loaded a > matrix (6 x 5) which holds information required later in my program. I > would like to store additional information in the matrix by expanding its > size, let's say make it at 10 x 5 matrix. How is this accomplished in > petsc. When I try to use MatSetSize() or MatSetValue() my code throws > errors. What is the process for accomplishing this? > Normally if you change the size, you just make a new object. If you really want to retain the same pointer because it is being held by other objects, you can call MatReset() and rebuild the matrix completely, but I normally would not do this. Thanks, Matt > Thank you in advance for your time. > Brandon > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From perceval.desforges at polytechnique.edu Tue Nov 26 09:23:22 2019 From: perceval.desforges at polytechnique.edu (Perceval Desforges) Date: Tue, 26 Nov 2019 16:23:22 +0100 Subject: [petsc-users] Memory optimization In-Reply-To: References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> <0007da7378494d3bdb15c219872d3359@polytechnique.edu> Message-ID: <7ac1554555a98057cf38e8d69fb2f9c8@polytechnique.edu> Hello, This is the output of -log_view. I selected what I thought were the important parts. I don't know if this is the best format to send the logs. If a text file is better let me know. Thanks again, ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./dos.exe on a named compute-0-11.local with 20 processors, by pcd Tue Nov 26 15:50:50 2019 Using Petsc Release Version 3.10.5, Mar, 28, 2019 Max Max/Min Avg Total Time (sec): 2.214e+03 1.000 2.214e+03 Objects: 1.370e+02 1.030 1.332e+02 Flop: 1.967e+14 1.412 1.539e+14 3.077e+15 Flop/sec: 8.886e+10 1.412 6.950e+10 1.390e+12 MPI Messages: 1.716e+03 1.350 1.516e+03 3.032e+04 MPI Message Lengths: 2.559e+08 5.796 4.179e+04 1.267e+09 MPI Reductions: 3.840e+02 1.000 Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.0000e+02 4.5% 3.0771e+15 100.0% 3.016e+04 99.5% 4.190e+04 99.7% 3.310e+02 86.2% 1: Setting Up EPS: 2.1137e+03 95.5% 7.4307e+09 0.0% 1.600e+02 0.5% 2.000e+04 0.3% 4.600e+01 12.0% ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage PetscBarrier 2 1.0 2.6554e+004632.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0 BuildTwoSidedF 3 1.0 1.2021e-01672.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecDot 8 1.0 1.1364e-02 2.3 8.00e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 2 0 0 0 0 2 1408 VecMDot 11 1.0 4.8588e-02 2.2 6.60e+06 1.0 0.0e+00 0.0e+00 1.1e+01 0 0 0 0 3 0 0 0 0 3 2717 VecNorm 12 1.0 5.2616e-02 4.3 1.20e+06 1.0 
0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 456 VecScale 12 1.0 9.8681e-04 2.2 6.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12160 VecCopy 3 1.0 4.1175e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 108 1.0 9.3610e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 1 1.0 1.6284e-04 3.2 1.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12282 VecMAXPY 12 1.0 7.6976e-03 1.9 7.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 20006 VecScatterBegin 419 1.0 4.5905e-01 3.7 0.00e+00 0.0 2.9e+04 3.7e+04 9.0e+01 0 0 96 85 23 0 0 97 85 27 0 VecScatterEnd 329 1.0 9.3328e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecSetRandom 1 1.0 4.3299e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 12 1.0 5.3697e-02 4.2 1.80e+06 1.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 670 MatMult 240 1.0 1.2112e-01 1.5 1.86e+07 1.0 4.4e+02 8.0e+04 0.0e+00 0 0 1 3 0 0 0 1 3 0 3071 MatSolve 101 1.0 9.3087e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33055277 MatCholFctrNum 1 1.0 1.2752e-02 2.8 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78 MatICCFactorSym 1 1.0 4.0321e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 5 1.7 1.2031e-01501.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 5 1.7 6.6613e-02 2.4 0.00e+00 0.0 1.6e+02 2.0e+04 2.4e+01 0 0 1 0 6 0 0 1 0 7 0 MatGetRowIJ 1 1.0 7.1526e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2271e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLoad 3 1.0 2.8543e-01 1.0 0.00e+00 0.0 3.3e+02 5.6e+05 5.4e+01 0 0 1 15 14 0 0 1 15 16 0 MatView 2 0.0 7.4778e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 2 1.0 1.3866e-0236.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 90 1.0 9.3211e+01 1.0 1.97e+14 1.4 3.0e+04 3.6e+04 1.1e+02 4100 98 85 30 93100 99 85 34 33011509 KSPGMRESOrthog 11 1.0 5.3543e-02 2.0 1.32e+07 1.0 0.0e+00 0.0e+00 1.1e+01 0 0 0 0 3 0 0 0 0 3 4931 PCSetUp 2 1.0 1.8253e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55 PCSetUpOnBlocks 1 1.0 1.8055e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55 PCApply 101 1.0 9.3089e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33054820 EPSSolve 1 1.0 9.5183e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 2.4e+02 4100 97 82 63 95100 97 82 73 32327750 STApply 89 1.0 9.3107e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33048198 STMatSolve 89 1.0 9.3084e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33056525 BVCreate 2 1.0 5.0357e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 BVCopy 1 1.0 9.2030e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 BVMultVec 132 1.0 7.2259e-01 1.3 5.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 14567 BVMultInPlace 1 1.0 2.2316e-01 1.1 6.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 57357 BVDotVec 132 1.0 1.3370e+00 1.1 5.46e+08 1.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 35 1 0 0 0 40 8169 BVOrthogonalizeV 81 1.0 1.9413e+00 1.1 1.07e+09 1.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 35 2 0 0 0 40 11048 BVScale 89 1.0 3.0558e-03 1.4 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 29125 BVNormVec 8 1.0 1.5073e-02 1.9 1.20e+06 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 3 0 0 0 0 3 1592 BVSetRandom 1 1.0 4.3440e-03 2.2 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSSolve 1 1.0 2.5339e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSVectors 80 1.0 3.5286e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 DSOther 1 1.0 6.0797e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 --- Event Stage 1: Setting Up EPS BuildTwoSidedF 3 1.0 2.8591e-0211.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 6.1312e-03122.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCholFctrSym 1 1.0 1.1540e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 1 0 0 0 1 1 0 0 0 11 0 MatCholFctrNum 2 1.0 2.1019e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 0.0e+00 95 0 0 0 0 99100 0 0 0 4 MatCopy 1 1.0 3.3707e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 0 0 0 0 4 0 MatConvert 1 1.0 6.1760e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 3 1.0 2.8630e-0211.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 3 1.0 3.2575e-02 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 1.8e+01 0 0 1 0 5 0 0100100 39 0 MatGetRowIJ 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 2.6703e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 1.0121e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 2 1.0 1.1354e-01 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 2.0e+01 0 0 1 0 5 0 0100100 43 0 KSPSetUp 2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCSetUp 2 1.0 2.1135e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 1.2e+01 95 0 0 0 3 100100 0 0 26 4 EPSSetUp 1 1.0 2.1137e+03 1.0 1.00e+09 4.3 1.6e+02 2.0e+04 4.6e+01 95 0 1 0 12 100100100100100 4 STSetUp 2 1.0 1.0712e+03 1.0 4.95e+08 4.3 8.0e+01 2.0e+04 2.6e+01 48 0 0 0 7 51 50 50 50 57 3 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 37 50 126614208 0. Matrix 13 17 159831092 0. Viewer 6 5 4200 0. Index Set 12 13 2507240 0. Vec Scatter 5 7 128984 0. Krylov Solver 3 4 22776 0. Preconditioner 3 4 3848 0. EPS Solver 1 2 8632 0. Spectral Transform 1 2 1664 0. Basis Vectors 3 4 45600 0. PetscRandom 2 2 1292 0. Region 1 2 1344 0. Direct Solver 1 2 163856 0. --- Event Stage 1: Setting Up EPS Vector 19 6 729576 0. Matrix 10 6 12178892 0. Index Set 9 8 766336 0. Vec Scatter 4 2 2640 0. Krylov Solver 1 0 0 0. Preconditioner 1 0 0 0. EPS Solver 1 0 0 0. Spectral Transform 1 0 0 0. Basis Vectors 1 0 0 0. Region 1 0 0 0. Direct Solver 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 Average time for MPI_Barrier(): 0.000263596 Average time for zero size MPI_Send(): 5.78523e-05 #PETSc Option Table entries: -log_view -mat_mumps_cntl_3 1e-12 -mat_mumps_icntl_13 1 -mat_mumps_icntl_14 60 -mat_mumps_icntl_24 1 -matload_block_size 1 #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --prefix=/share/apps/petsc/3.10.5 --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --download-mpich --download-fblaslapack --download-scalapack --download-mumps Best regards, Perceval, > On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges wrote: > >> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite farm apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done? > > No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros. > > In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually > track down bad preallocation using -info. > > Thanks, > > Matt > > I will try setting up only one partition. > > Thanks, > > Perceval, > Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0". > > Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones. > > The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only. > > Jose > > El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: > > On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges wrote: > Hi, > > So I'm loading two matrices from files, both 1000000 by 10000000. I ran the program with -mat_view::ascii_info and I got: > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=1000000, allocated nonzeros=1000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well, and then > > Mat Object: 1 MPI processes > type: seqaij > rows=1000000, cols=1000000 > total: nonzeros=7000000, allocated nonzeros=7000000 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > > 20 times as well before crashing. > > I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. > > I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. 
The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? > > I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create > your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the > preallocation correctly. > > Thanks, > > Matt > Thanks again, > > Best regards, > > Perceval, > > Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with > -mat_view ::ascii_info > > Jose > > El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: > > Hi, > > Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: > > slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed > > I get this error even when there are no eigenvalues in the interval. > > I've started using BVMAT instead of BVVECS by the way. > > Thanks, > > Perceval, > > Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. > > Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: > http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html > You can comment out the call to EPSSolve() and run with the option -show_inertias > For example, the output > Shift 0.1 Inertia 3 > Shift 0.35 Inertia 11 > means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). > > By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). > > Jose > > El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: > > Hello all, > > I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. > > The options I use are : > > -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 > > However the program quickly crashes with this error: > > slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed > > I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: > > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 > > which is an error due to setting the mumps icntl option so low from what I've gathered. > > Is there any other way I can reduce memory usage? > > Thanks, > > Regards, > > Perceval, > > P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ [1] Links: ------ [1] http://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Nov 26 10:11:15 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 26 Nov 2019 16:11:15 +0000 Subject: [petsc-users] Memory optimization In-Reply-To: <7ac1554555a98057cf38e8d69fb2f9c8@polytechnique.edu> References: <7e97478cd26eb88d653bfd7617e92f18@polytechnique.edu> <852862500ebf52db0edde47a63ce8ae7@polytechnique.edu> <9177FF06-92F4-40E4-8CF6-4A2995C17072@dsic.upv.es> <792624a5b70858444b7e529ce3624395@polytechnique.edu> <0007da7378494d3bdb15c219872d3359@polytechnique.edu> <7ac1554555a98057cf38e8d69fb2f9c8@polytechnique.edu> Message-ID: > I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite farm apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done? Yes, sparse direct solver behavior between 2d and 3d problems can be dramatically different in both space and time. There is a well developed understanding of this from the 1970s. For 2d the results are given in https://epubs.siam.org/doi/abs/10.1137/0710032?journalCode=sjnaam work is n^3 space is n^2 log (n) using nested dissection ordering In 3d work is n^6 see http://amath.colorado.edu/faculty/martinss/2014_CBMS/Lectures/lecture06.pdf So 3d is very limited for direct solvers; and one has to try something else. Barry > On Nov 26, 2019, at 9:23 AM, Perceval Desforges wrote: > > Hello, > > This is the output of -log_view. I selected what I thought were the important parts. I don't know if this is the best format to send the logs. If a text file is better let me know. 
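Rough numbers behind Barry's estimates above (standard nested-dissection results; the grid sizes are illustrative and assume about 10^6 unknowns in both cases):

2-D, N = n^2 unknowns:  factorization work O(n^3) = O(N^(3/2)),  fill O(n^2 log n) = O(N log N)
3-D, N = n^3 unknowns:  factorization work O(n^6) = O(N^2),      fill O(n^4) = O(N^(4/3))

With N ~ 10^6 (so n ~ 1000 in 2-D but n ~ 100 in 3-D), the factorization work grows from roughly 10^9 to about 10^12 operations, a factor on the order of 1000, so a setup time several hundred times larger in 3-D is what the factorization alone would suggest, independent of preallocation.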
Thanks again, > > > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > ./dos.exe on a named compute-0-11.local with 20 processors, by pcd Tue Nov 26 15:50:50 2019 > Using Petsc Release Version 3.10.5, Mar, 28, 2019 > > Max Max/Min Avg Total > Time (sec): 2.214e+03 1.000 2.214e+03 > Objects: 1.370e+02 1.030 1.332e+02 > Flop: 1.967e+14 1.412 1.539e+14 3.077e+15 > Flop/sec: 8.886e+10 1.412 6.950e+10 1.390e+12 > MPI Messages: 1.716e+03 1.350 1.516e+03 3.032e+04 > MPI Message Lengths: 2.559e+08 5.796 4.179e+04 1.267e+09 > MPI Reductions: 3.840e+02 1.000 > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count %Total Avg %Total Count %Total > 0: Main Stage: 1.0000e+02 4.5% 3.0771e+15 100.0% 3.016e+04 99.5% 4.190e+04 99.7% 3.310e+02 86.2% > 1: Setting Up EPS: 2.1137e+03 95.5% 7.4307e+09 0.0% 1.600e+02 0.5% 2.000e+04 0.3% 4.600e+01 12.0% > > > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > PetscBarrier 2 1.0 2.6554e+004632.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0 > BuildTwoSidedF 3 1.0 1.2021e-01672.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecDot 8 1.0 1.1364e-02 2.3 8.00e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 2 0 0 0 0 2 1408 > VecMDot 11 1.0 4.8588e-02 2.2 6.60e+06 1.0 0.0e+00 0.0e+00 1.1e+01 0 0 0 0 3 0 0 0 0 3 2717 > VecNorm 12 1.0 5.2616e-02 4.3 1.20e+06 1.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 456 > VecScale 12 1.0 9.8681e-04 2.2 6.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12160 > VecCopy 3 1.0 4.1175e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 108 1.0 9.3610e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 1 1.0 1.6284e-04 3.2 1.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12282 > VecMAXPY 12 1.0 7.6976e-03 1.9 7.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 20006 > VecScatterBegin 419 1.0 4.5905e-01 3.7 0.00e+00 0.0 2.9e+04 3.7e+04 9.0e+01 0 0 96 85 23 0 0 97 85 27 0 > VecScatterEnd 329 1.0 9.3328e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 > VecSetRandom 1 1.0 4.3299e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 12 1.0 5.3697e-02 4.2 1.80e+06 1.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 3 0 0 0 0 4 670 > MatMult 240 1.0 1.2112e-01 1.5 1.86e+07 1.0 4.4e+02 8.0e+04 0.0e+00 0 0 1 3 0 0 0 1 3 0 3071 > MatSolve 101 1.0 9.3087e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33055277 > MatCholFctrNum 1 1.0 1.2752e-02 2.8 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78 > MatICCFactorSym 1 1.0 4.0321e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 5 1.7 1.2031e-01501.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 5 1.7 6.6613e-02 2.4 0.00e+00 0.0 1.6e+02 2.0e+04 2.4e+01 0 0 1 0 6 0 0 1 0 7 0 > MatGetRowIJ 1 1.0 7.1526e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2271e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 
0 0 0 0 0 0 0 0 0 0 > MatLoad 3 1.0 2.8543e-01 1.0 0.00e+00 0.0 3.3e+02 5.6e+05 5.4e+01 0 0 1 15 14 0 0 1 15 16 0 > MatView 2 0.0 7.4778e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 2 1.0 1.3866e-0236.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 90 1.0 9.3211e+01 1.0 1.97e+14 1.4 3.0e+04 3.6e+04 1.1e+02 4100 98 85 30 93100 99 85 34 33011509 > KSPGMRESOrthog 11 1.0 5.3543e-02 2.0 1.32e+07 1.0 0.0e+00 0.0e+00 1.1e+01 0 0 0 0 3 0 0 0 0 3 4931 > PCSetUp 2 1.0 1.8253e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55 > PCSetUpOnBlocks 1 1.0 1.8055e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55 > PCApply 101 1.0 9.3089e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33054820 > EPSSolve 1 1.0 9.5183e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 2.4e+02 4100 97 82 63 95100 97 82 73 32327750 > STApply 89 1.0 9.3107e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33048198 > STMatSolve 89 1.0 9.3084e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01 4100 97 82 24 93100 97 82 27 33056525 > BVCreate 2 1.0 5.0357e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 2 0 0 0 0 2 0 > BVCopy 1 1.0 9.2030e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > BVMultVec 132 1.0 7.2259e-01 1.3 5.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 14567 > BVMultInPlace 1 1.0 2.2316e-01 1.1 6.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 57357 > BVDotVec 132 1.0 1.3370e+00 1.1 5.46e+08 1.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 35 1 0 0 0 40 8169 > BVOrthogonalizeV 81 1.0 1.9413e+00 1.1 1.07e+09 1.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 35 2 0 0 0 40 11048 > BVScale 89 1.0 3.0558e-03 1.4 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 29125 > BVNormVec 8 1.0 1.5073e-02 1.9 1.20e+06 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 3 0 0 0 0 3 1592 > BVSetRandom 1 1.0 4.3440e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > DSSolve 1 1.0 2.5339e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > DSVectors 80 1.0 3.5286e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > DSOther 1 1.0 6.0797e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > --- Event Stage 1: Setting Up EPS > > BuildTwoSidedF 3 1.0 2.8591e-0211.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 6.1312e-03122.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatCholFctrSym 1 1.0 1.1540e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 1 0 0 0 1 1 0 0 0 11 0 > MatCholFctrNum 2 1.0 2.1019e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 0.0e+00 95 0 0 0 0 99100 0 0 0 4 > MatCopy 1 1.0 3.3707e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 1 0 0 0 0 4 0 > MatConvert 1 1.0 6.1760e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 3 1.0 2.8630e-0211.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 3 1.0 3.2575e-02 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 1.8e+01 0 0 1 0 5 0 0100100 39 0 > MatGetRowIJ 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 2.6703e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 1 1.0 1.0121e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAXPY 2 1.0 1.1354e-01 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 2.0e+01 0 0 1 0 5 0 0100100 43 0 > KSPSetUp 2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCSetUp 
2 1.0 2.1135e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 1.2e+01 95 0 0 0 3 100100 0 0 26 4 > EPSSetUp 1 1.0 2.1137e+03 1.0 1.00e+09 4.3 1.6e+02 2.0e+04 4.6e+01 95 0 1 0 12 100100100100100 4 > STSetUp 2 1.0 1.0712e+03 1.0 4.95e+08 4.3 8.0e+01 2.0e+04 2.6e+01 48 0 0 0 7 51 50 50 50 57 3 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 37 50 126614208 0. > Matrix 13 17 159831092 0. > Viewer 6 5 4200 0. > Index Set 12 13 2507240 0. > Vec Scatter 5 7 128984 0. > Krylov Solver 3 4 22776 0. > Preconditioner 3 4 3848 0. > EPS Solver 1 2 8632 0. > Spectral Transform 1 2 1664 0. > Basis Vectors 3 4 45600 0. > PetscRandom 2 2 1292 0. > Region 1 2 1344 0. > Direct Solver 1 2 163856 0. > > --- Event Stage 1: Setting Up EPS > > Vector 19 6 729576 0. > Matrix 10 6 12178892 0. > Index Set 9 8 766336 0. > Vec Scatter 4 2 2640 0. > Krylov Solver 1 0 0 0. > Preconditioner 1 0 0 0. > EPS Solver 1 0 0 0. > Spectral Transform 1 0 0 0. > Basis Vectors 1 0 0 0. > Region 1 0 0 0. > Direct Solver 1 0 0 0. > ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-08 > Average time for MPI_Barrier(): 0.000263596 > Average time for zero size MPI_Send(): 5.78523e-05 > #PETSc Option Table entries: > -log_view > -mat_mumps_cntl_3 1e-12 > -mat_mumps_icntl_13 1 > -mat_mumps_icntl_14 60 > -mat_mumps_icntl_24 1 > -matload_block_size 1 > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --prefix=/share/apps/petsc/3.10.5 --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --download-mpich --download-fblaslapack --download-scalapack --download-mumps > > > > Best regards, > > Perceval, > > > > > >> On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges wrote: >> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite farm apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the set up time is around 400 times greater in the 3D case? Is there nothing to be done? >> >> >> >> No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros. >> >> In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually >> track down bad preallocation using -info. >> >> Thanks, >> >> Matt >> I will try setting up only one partition. >> >> Thanks, >> >> Perceval, >> >> Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0". >> >> Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones. >> >> The partitions option is intended for running several nodes. If you are using just one node probably it is better to set one partition only. 
>> >> Jose >> >> >> El 25 nov 2019, a las 18:25, Matthew Knepley escribi?: >> >> On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges wrote: >> Hi, >> >> So I'm loading two matrices from files, both 1000000 by 10000000. I ran the program with -mat_view::ascii_info and I got: >> >> Mat Object: 1 MPI processes >> type: seqaij >> rows=1000000, cols=1000000 >> total: nonzeros=7000000, allocated nonzeros=7000000 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> 20 times, and then >> >> Mat Object: 1 MPI processes >> type: seqaij >> rows=1000000, cols=1000000 >> total: nonzeros=1000000, allocated nonzeros=1000000 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> 20 times as well, and then >> >> Mat Object: 1 MPI processes >> type: seqaij >> rows=1000000, cols=1000000 >> total: nonzeros=7000000, allocated nonzeros=7000000 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> 20 times as well before crashing. >> >> I realized it might be because I am setting up 20 krylov schur partitions which may be too much. I tried running the code again with only 2 partitions and now the code runs but I have speed issues. >> >> I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the set up time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries) and the set up time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases solve and set up time so much? >> >> >> I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create >> your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the >> preallocation correctly. >> >> Thanks, >> >> Matt >> Thanks again, >> >> Best regards, >> >> Perceval, >> >> >> >> >> >> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with >> -mat_view ::ascii_info >> >> Jose >> >> >> El 22 nov 2019, a las 19:56, Perceval Desforges escribi?: >> >> Hi, >> >> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error: >> >> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed >> >> I get this error even when there are no eigenvalues in the interval. >> >> I've started using BVMAT instead of BVVECS by the way. >> >> Thanks, >> >> Perceval, >> >> >> >> >> >> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS. >> >> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per each eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example: >> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html >> You can comment out the call to EPSSolve() and run with the option -show_inertias >> For example, the output >> Shift 0.1 Inertia 3 >> Shift 0.35 Inertia 11 >> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3). 
>> >> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower). >> >> Jose >> >> >> El 21 nov 2019, a las 18:13, Perceval Desforges via petsc-users escribi?: >> >> Hello all, >> >> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 Go of RAM. >> >> The options I use are : >> >> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12 >> >> >> >> However the program quickly crashes with this error: >> >> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed >> >> I've tried reducing the amount of memory used by slepc with the -mat_mumps_icntl_14 option by setting it at -70 for example but then I get this error: >> >> [1]PETSC ERROR: Error in external library >> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614 >> >> which is an error due to setting the mumps icntl option so low from what I've gathered. >> >> Is there any other way I can reduce memory usage? >> >> >> >> Thanks, >> >> Regards, >> >> Perceval, >> >> >> >> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address, I'm sorry if I've sent it twice. >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > From danyang.su at gmail.com Tue Nov 26 11:42:34 2019 From: danyang.su at gmail.com (Danyang Su) Date: Tue, 26 Nov 2019 09:42:34 -0800 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: <04c21713-b569-0b30-fd48-d1cbf4bb53c9@gmail.com> On 2019-11-25 7:54 p.m., Matthew Knepley wrote: > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh > wrote: > > Dear PETSc users and developers, > > I am working with dmplex to distribute a 3D unstructured mesh made > of tetrahedrons in a cuboidal domain. I had a few queries: > 1) Is there any way of ensuring load balancing based on the number > of vertices per MPI process. > > > You can now call?DMPlexRebalanceSharedPoints() to try and get better > balance of vertices. Hi Matt, I just want to follow up if this new function can help to solve the "Strange Partition in PETSc 3.11" problem I mentioned before. Would you please let me know when shall I call this function? Right before DMPlexDistribute? call DMPlexCreateFromCellList call DMPlexGetPartitioner call PetscPartitionerSetFromOptions call DMPlexDistribute Thanks, Danyang > 2) As the global domain is cuboidal, is the resulting domain > decomposition also cuboidal on every MPI process? If not, is there > a way to ensure this? For example in DMDA, the default domain > decomposition for a cuboidal domain is cuboidal. > > > It sounds like you do not want something that is actually > unstructured. Rather, it seems like you want to > take a DMDA type thing and split it into tets. You can get a cuboidal > decomposition of a hex mesh easily. 
> Call DMPlexCreateBoxMesh() with one cell for every process, > distribute, and then uniformly refine. This > will not quite work for tets since the mesh partitioner will tend to > violate that constraint. You could: > > ? a) Prescribe the distribution yourself using the Shell partitioner type > > or > > ? b) Write a refiner that turns hexes into tets > > We already have a refiner that turns tets into hexes, but we never > wrote the other direction because it was not clear > that it was useful. > > ? Thanks, > > ? ? ?Matt > > Sincerely, > SG > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 26 12:18:50 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 26 Nov 2019 12:18:50 -0600 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: <04c21713-b569-0b30-fd48-d1cbf4bb53c9@gmail.com> References: <04c21713-b569-0b30-fd48-d1cbf4bb53c9@gmail.com> Message-ID: On Tue, Nov 26, 2019 at 11:43 AM Danyang Su wrote: > On 2019-11-25 7:54 p.m., Matthew Knepley wrote: > > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh > wrote: > >> Dear PETSc users and developers, >> >> I am working with dmplex to distribute a 3D unstructured mesh made of >> tetrahedrons in a cuboidal domain. I had a few queries: >> 1) Is there any way of ensuring load balancing based on the number of >> vertices per MPI process. >> > > You can now call DMPlexRebalanceSharedPoints() to try and get better > balance of vertices. > > Hi Matt, > > I just want to follow up if this new function can help to solve the > "Strange Partition in PETSc 3.11" problem I mentioned before. Would you > please let me know when shall I call this function? Right before > DMPlexDistribute? > This is not the problem. I believe the problem is that you are partitioning hybrid cells, and the way we handle them internally changed, which I think screwed up the dual mesh for partitioning in your example. I have been sick, so I have not gotten to your example yet, but I will. Sorry about that, Matt > call DMPlexCreateFromCellList > > call DMPlexGetPartitioner > > call PetscPartitionerSetFromOptions > > call DMPlexDistribute > > Thanks, > > Danyang > > > >> 2) As the global domain is cuboidal, is the resulting domain >> decomposition also cuboidal on every MPI process? If not, is there a way to >> ensure this? For example in DMDA, the default domain decomposition for a >> cuboidal domain is cuboidal. >> > > It sounds like you do not want something that is actually unstructured. > Rather, it seems like you want to > take a DMDA type thing and split it into tets. You can get a cuboidal > decomposition of a hex mesh easily. > Call DMPlexCreateBoxMesh() with one cell for every process, distribute, > and then uniformly refine. This > will not quite work for tets since the mesh partitioner will tend to > violate that constraint. You could: > > a) Prescribe the distribution yourself using the Shell partitioner type > > or > > b) Write a refiner that turns hexes into tets > > We already have a refiner that turns tets into hexes, but we never wrote > the other direction because it was not clear > that it was useful. 
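[A rough C sketch of the recipe described above: a coarse hex box mesh with roughly one cell per rank, distributed and then uniformly refined. The helper name is made up, the one-dimensional split over ranks is only illustrative, and the DMPlexCreateBoxMesh() argument list follows the PETSc 3.11-era interface, which may differ in other releases:]

#include <petscdmplex.h>

/* Sketch: cuboidal decomposition by starting from a coarse hex box mesh,
   distributing it, and refining uniformly (regular refinement is the
   DMPlex default). */
PetscErrorCode BuildDistributedBox(MPI_Comm comm, PetscInt nrefine, DM *dmout)
{
  DM             dm, dmDist, dmRef;
  PetscInt       r, faces[3];
  PetscMPIInt    size;
  PetscErrorCode ierr;

  ierr = MPI_Comm_size(comm, &size);CHKERRQ(ierr);
  /* crude 1D split into 'size' cells; a real code would factor 'size' into a 3D grid */
  faces[0] = size; faces[1] = 1; faces[2] = 1;
  ierr = DMPlexCreateBoxMesh(comm, 3, PETSC_FALSE /* hexes */, faces,
                             NULL, NULL, NULL, PETSC_TRUE, &dm);CHKERRQ(ierr);
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
  for (r = 0; r < nrefine; r++) {
    ierr = DMRefine(dm, comm, &dmRef);CHKERRQ(ierr);
    if (dmRef) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmRef; }
  }
  *dmout = dm;
  return 0;
}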
> > Thanks, > > Matt > > >> Sincerely, >> SG >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Tue Nov 26 12:24:22 2019 From: danyang.su at gmail.com (Danyang Su) Date: Tue, 26 Nov 2019 10:24:22 -0800 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: <04c21713-b569-0b30-fd48-d1cbf4bb53c9@gmail.com> Message-ID: <0049d72d-cefd-20f5-1fb7-a81295fbaf63@gmail.com> On 2019-11-26 10:18 a.m., Matthew Knepley wrote: > On Tue, Nov 26, 2019 at 11:43 AM Danyang Su > wrote: > > On 2019-11-25 7:54 p.m., Matthew Knepley wrote: >> On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh >> > wrote: >> >> Dear PETSc users and developers, >> >> I am working with dmplex to distribute a 3D unstructured mesh >> made of tetrahedrons in a cuboidal domain. I had a few queries: >> 1) Is there any way of ensuring load balancing based on the >> number of vertices per MPI process. >> >> >> You can now call?DMPlexRebalanceSharedPoints() to try and get >> better balance of vertices. > > Hi Matt, > > I just want to follow up if this new function can help to solve > the "Strange Partition in PETSc 3.11" problem I mentioned before. > Would you please let me know when shall I call this function? > Right before DMPlexDistribute? > > This is not the problem. I believe the problem is that you are > partitioning hybrid cells, and the way we handle > them internally changed, which I think screwed up the dual mesh for > partitioning in your example. I have been > sick, so I have not gotten to your example yet, but I will. Hope you are getting well soon. The mesh is not hybrid, only prism cells layer by layer. But the height of the prism varies significantly. Thanks, Danyang > > ? Sorry about that, > > ? ? Matt > > call DMPlexCreateFromCellList > > call DMPlexGetPartitioner > > call PetscPartitionerSetFromOptions > > call DMPlexDistribute > > Thanks, > > Danyang > >> 2) As the global domain is cuboidal, is the resulting domain >> decomposition also cuboidal on every MPI process? If not, is >> there a way to ensure this? For example in DMDA, the default >> domain decomposition for a cuboidal domain is cuboidal. >> >> >> It sounds like you do not want something that is actually >> unstructured. Rather, it seems like you want to >> take a DMDA type thing and split it into tets. You can get a >> cuboidal decomposition of a hex mesh easily. >> Call DMPlexCreateBoxMesh() with one cell for every process, >> distribute, and then uniformly refine. This >> will not quite work for tets since the mesh partitioner will tend >> to violate that constraint. You could: >> >> ? a) Prescribe the distribution yourself using the Shell >> partitioner type >> >> or >> >> ? b) Write a refiner that turns hexes into tets >> >> We already have a refiner that turns tets into hexes, but we >> never wrote the other direction because it was not clear >> that it was useful. >> >> ? Thanks, >> >> ? ? 
?Matt >> >> Sincerely, >> SG >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Nov 26 12:34:54 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 26 Nov 2019 12:34:54 -0600 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: <0049d72d-cefd-20f5-1fb7-a81295fbaf63@gmail.com> References: <04c21713-b569-0b30-fd48-d1cbf4bb53c9@gmail.com> <0049d72d-cefd-20f5-1fb7-a81295fbaf63@gmail.com> Message-ID: On Tue, Nov 26, 2019 at 12:24 PM Danyang Su wrote: > On 2019-11-26 10:18 a.m., Matthew Knepley wrote: > > On Tue, Nov 26, 2019 at 11:43 AM Danyang Su wrote: > >> On 2019-11-25 7:54 p.m., Matthew Knepley wrote: >> >> On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh >> wrote: >> >>> Dear PETSc users and developers, >>> >>> I am working with dmplex to distribute a 3D unstructured mesh made of >>> tetrahedrons in a cuboidal domain. I had a few queries: >>> 1) Is there any way of ensuring load balancing based on the number of >>> vertices per MPI process. >>> >> >> You can now call DMPlexRebalanceSharedPoints() to try and get better >> balance of vertices. >> >> Hi Matt, >> >> I just want to follow up if this new function can help to solve the >> "Strange Partition in PETSc 3.11" problem I mentioned before. Would you >> please let me know when shall I call this function? Right before >> DMPlexDistribute? >> > This is not the problem. I believe the problem is that you are > partitioning hybrid cells, and the way we handle > them internally changed, which I think screwed up the dual mesh for > partitioning in your example. I have been > sick, so I have not gotten to your example yet, but I will. > > Hope you are getting well soon. The mesh is not hybrid, only prism cells > layer by layer. > Prism cells are called "hybrid" right now, which is indeed a bad term and I will change. Thanks, Matt > But the height of the prism varies significantly. > > Thanks, > > Danyang > > > Sorry about that, > > Matt > >> call DMPlexCreateFromCellList >> >> call DMPlexGetPartitioner >> >> call PetscPartitionerSetFromOptions >> >> call DMPlexDistribute >> >> Thanks, >> >> Danyang >> >> >> >>> 2) As the global domain is cuboidal, is the resulting domain >>> decomposition also cuboidal on every MPI process? If not, is there a way to >>> ensure this? For example in DMDA, the default domain decomposition for a >>> cuboidal domain is cuboidal. >>> >> >> It sounds like you do not want something that is actually unstructured. >> Rather, it seems like you want to >> take a DMDA type thing and split it into tets. You can get a cuboidal >> decomposition of a hex mesh easily. >> Call DMPlexCreateBoxMesh() with one cell for every process, distribute, >> and then uniformly refine. This >> will not quite work for tets since the mesh partitioner will tend to >> violate that constraint. 
You could: >> >> a) Prescribe the distribution yourself using the Shell partitioner type >> >> or >> >> b) Write a refiner that turns hexes into tets >> >> We already have a refiner that turns tets into hexes, but we never wrote >> the other direction because it was not clear >> that it was useful. >> >> Thanks, >> >> Matt >> >> >>> Sincerely, >>> SG >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Nov 28 21:07:57 2019 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 28 Nov 2019 20:07:57 -0700 Subject: [petsc-users] Outputting matrix for viewing in matlab Message-ID: Hello PETSc users, I have a sparse matrix built and I want to output the matrix for viewing in matlab. However i'm having difficulty outputting the matrix. I am writing my program in Fortran90 and I've included the following lines to output the matrix. *call PetscViewerBinaryOpen(PETSC_COMM_SELF,'matrix',FILE_MODE_WRITE,view,ierr) call PetscViewerBinaryGetDescriptor(view,fd,ierr) call PetscBinaryWrite(fd,ham,1,PETSC_SCALAR,PETSC_FALSE,ierr)* These lines do create a matrix but matlab says its not a binary file. Could you please provide me some inputs on where I'm going wrong and how to proceed with this problem. I can provide any further information that you might need to help me solve this problem. Thank you. Sincerely, Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Thu Nov 28 21:44:57 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Thu, 28 Nov 2019 19:44:57 -0800 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: Hi Barry, "Why do you need a cuboidal domain decomposition?" I gave it some thought. I don't always need a cuboidal decomposition. But I would need something that essentially minimized the surface area of the faces of each decomposition. Is there a way to get this? Could you please direct me to a reference a reference where I can read about the domain decomposition strategies used in petsc dmplex. Sincerely, Swarnava On Mon, Nov 25, 2019 at 9:02 PM Smith, Barry F. wrote: > > "No, I have an unstructured mesh that increases in resolution away from > the center of the cuboid. See Figure: 5 in the ArXiv paper > https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane of > the cuboid. Given this type of mesh, will dmplex do a cuboidal domain > decomposition?" > > No definitely not. Why do you need a cuboidal domain decomposition? 
> > Barry > > > > On Nov 25, 2019, at 10:45 PM, Swarnava Ghosh > wrote: > > > > Hi Matt, > > > > > > https://arxiv.org/pdf/1907.02604.pdf > > > > On Mon, Nov 25, 2019 at 7:54 PM Matthew Knepley > wrote: > > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh > wrote: > > Dear PETSc users and developers, > > > > I am working with dmplex to distribute a 3D unstructured mesh made of > tetrahedrons in a cuboidal domain. I had a few queries: > > 1) Is there any way of ensuring load balancing based on the number of > vertices per MPI process. > > > > You can now call DMPlexRebalanceSharedPoints() to try and get better > balance of vertices. > > > > Thank you for pointing out this function! > > > > 2) As the global domain is cuboidal, is the resulting domain > decomposition also cuboidal on every MPI process? If not, is there a way to > ensure this? For example in DMDA, the default domain decomposition for a > cuboidal domain is cuboidal. > > > > It sounds like you do not want something that is actually unstructured. > Rather, it seems like you want to > > take a DMDA type thing and split it into tets. You can get a cuboidal > decomposition of a hex mesh easily. > > Call DMPlexCreateBoxMesh() with one cell for every process, distribute, > and then uniformly refine. This > > will not quite work for tets since the mesh partitioner will tend to > violate that constraint. You could: > > > > No, I have an unstructured mesh that increases in resolution away from > the center of the cuboid. See Figure: 5 in the ArXiv paper > https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane of > the cuboid. Given this type of mesh, will dmplex do a cuboidal domain > decomposition? > > > > Sincerely, > > SG > > > > a) Prescribe the distribution yourself using the Shell partitioner type > > > > or > > > > b) Write a refiner that turns hexes into tets > > > > We already have a refiner that turns tets into hexes, but we never wrote > the other direction because it was not clear > > that it was useful. > > > > Thanks, > > > > Matt > > > > Sincerely, > > SG > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Nov 29 02:14:51 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 29 Nov 2019 09:14:51 +0100 Subject: [petsc-users] Outputting matrix for viewing in matlab In-Reply-To: References: Message-ID: PETSc has its own binary format, which is not the same as MATLAB's. However, PETSc includes some MATLAB/Octave scripts which will load these binary files. See $PETSC_DIR/share/matlab/PetscBinaryRead.m - there are some examples in the comments at the top of that file. Note that you will probably want to add $PETSC_DIR/share/matlab to your MATLAB path so that you can run the script. This is what I have for Octave, but I'm not sure if it this, precisely, works in MATLAB: $ cat ~/.octaverc PETSC_DIR=getenv('PETSC_DIR'); if length(PETSC_DIR)==0 PETSC_DIR='~/code/petsc' end addpath([PETSC_DIR,'/share/petsc/matlab']) (As an aside, note that there are also scripts included to load PETSc binary files to use with numpy/scipy in Python, e.g. 
$PETSC_DIR/lib/petsc/bin/PetscBinaryIO.py) > Am 29.11.2019 um 04:07 schrieb baikadi pranay : > > Hello PETSc users, > > I have a sparse matrix built and I want to output the matrix for viewing in matlab. However i'm having difficulty outputting the matrix. I am writing my program in Fortran90 and I've included the following lines to output the matrix. > > call PetscViewerBinaryOpen(PETSC_COMM_SELF,'matrix',FILE_MODE_WRITE,view,ierr) > call PetscViewerBinaryGetDescriptor(view,fd,ierr) > call PetscBinaryWrite(fd,ham,1,PETSC_SCALAR,PETSC_FALSE,ierr) > > These lines do create a matrix but matlab says its not a binary file. Could you please provide me some inputs on where I'm going wrong and how to proceed with this problem. I can provide any further information that you might need to help me solve this problem. > > > Thank you. > > Sincerely, > Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Nov 29 02:16:19 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 29 Nov 2019 09:16:19 +0100 Subject: [petsc-users] Outputting matrix for viewing in matlab In-Reply-To: References: Message-ID: > Am 29.11.2019 um 09:14 schrieb Patrick Sanan : > > PETSc has its own binary format, which is not the same as MATLAB's. > > However, PETSc includes some MATLAB/Octave scripts which will load these binary files. > > See $PETSC_DIR/share/matlab/PetscBinaryRead.m - there are some examples in the comments at the top of that file. Correction: $PETSC_DIR/share/petsc/matlab/PetscBinaryRead.m > > > Note that you will probably want to add $PETSC_DIR/share/matlab to your MATLAB path so that you can run the script. This is what I have for Octave, but I'm not sure if it this, precisely, works in MATLAB: > > $ cat ~/.octaverc > PETSC_DIR=getenv('PETSC_DIR'); > if length(PETSC_DIR)==0 > PETSC_DIR='~/code/petsc' > end > addpath([PETSC_DIR,'/share/petsc/matlab']) > > (As an aside, note that there are also scripts included to load PETSc binary files to use with numpy/scipy in Python, e.g. $PETSC_DIR/lib/petsc/bin/PetscBinaryIO.py) > >> Am 29.11.2019 um 04:07 schrieb baikadi pranay >: >> >> Hello PETSc users, >> >> I have a sparse matrix built and I want to output the matrix for viewing in matlab. However i'm having difficulty outputting the matrix. I am writing my program in Fortran90 and I've included the following lines to output the matrix. >> >> call PetscViewerBinaryOpen(PETSC_COMM_SELF,'matrix',FILE_MODE_WRITE,view,ierr) >> call PetscViewerBinaryGetDescriptor(view,fd,ierr) >> call PetscBinaryWrite(fd,ham,1,PETSC_SCALAR,PETSC_FALSE,ierr) >> >> These lines do create a matrix but matlab says its not a binary file. Could you please provide me some inputs on where I'm going wrong and how to proceed with this problem. I can provide any further information that you might need to help me solve this problem. >> >> >> Thank you. >> >> Sincerely, >> Pranay. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Nov 29 08:44:14 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 29 Nov 2019 08:44:14 -0600 Subject: [petsc-users] Domain decomposition using DMPLEX In-Reply-To: References: Message-ID: On Thu, Nov 28, 2019 at 9:45 PM Swarnava Ghosh wrote: > Hi Barry, > > "Why do you need a cuboidal domain decomposition?" > > I gave it some thought. I don't always need a cuboidal decomposition. 
But > I would need something that essentially minimized the surface area of the > faces of each decomposition. Is there a way to get this? Could you please > direct me to a reference a reference where I can read about the domain > decomposition strategies used in petsc dmplex. > This is the point of graph partitioning, which minimizes the "cut" which the the number of links between one partition and another. The ParMetis manual has this kind of information, and citations. Thanks, Matt > Sincerely, > Swarnava > > On Mon, Nov 25, 2019 at 9:02 PM Smith, Barry F. > wrote: > >> >> "No, I have an unstructured mesh that increases in resolution away from >> the center of the cuboid. See Figure: 5 in the ArXiv paper >> https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane >> of the cuboid. Given this type of mesh, will dmplex do a cuboidal domain >> decomposition?" >> >> No definitely not. Why do you need a cuboidal domain decomposition? >> >> Barry >> >> >> > On Nov 25, 2019, at 10:45 PM, Swarnava Ghosh >> wrote: >> > >> > Hi Matt, >> > >> > >> > https://arxiv.org/pdf/1907.02604.pdf >> > >> > On Mon, Nov 25, 2019 at 7:54 PM Matthew Knepley >> wrote: >> > On Mon, Nov 25, 2019 at 6:25 PM Swarnava Ghosh >> wrote: >> > Dear PETSc users and developers, >> > >> > I am working with dmplex to distribute a 3D unstructured mesh made of >> tetrahedrons in a cuboidal domain. I had a few queries: >> > 1) Is there any way of ensuring load balancing based on the number of >> vertices per MPI process. >> > >> > You can now call DMPlexRebalanceSharedPoints() to try and get better >> balance of vertices. >> > >> > Thank you for pointing out this function! >> > >> > 2) As the global domain is cuboidal, is the resulting domain >> decomposition also cuboidal on every MPI process? If not, is there a way to >> ensure this? For example in DMDA, the default domain decomposition for a >> cuboidal domain is cuboidal. >> > >> > It sounds like you do not want something that is actually unstructured. >> Rather, it seems like you want to >> > take a DMDA type thing and split it into tets. You can get a cuboidal >> decomposition of a hex mesh easily. >> > Call DMPlexCreateBoxMesh() with one cell for every process, distribute, >> and then uniformly refine. This >> > will not quite work for tets since the mesh partitioner will tend to >> violate that constraint. You could: >> > >> > No, I have an unstructured mesh that increases in resolution away from >> the center of the cuboid. See Figure: 5 in the ArXiv paper >> https://arxiv.org/pdf/1907.02604.pdf for a slice through the midplane >> of the cuboid. Given this type of mesh, will dmplex do a cuboidal domain >> decomposition? >> > >> > Sincerely, >> > SG >> > >> > a) Prescribe the distribution yourself using the Shell partitioner >> type >> > >> > or >> > >> > b) Write a refiner that turns hexes into tets >> > >> > We already have a refiner that turns tets into hexes, but we never >> wrote the other direction because it was not clear >> > that it was useful. >> > >> > Thanks, >> > >> > Matt >> > >> > Sincerely, >> > SG >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Nov 29 11:39:24 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 29 Nov 2019 17:39:24 +0000 Subject: [petsc-users] Outputting matrix for viewing in matlab In-Reply-To: References: Message-ID: <40760746-8753-4B63-9F16-31B55C5A9005@anl.gov> > On Nov 28, 2019, at 7:07 PM, baikadi pranay wrote: > > Hello PETSc users, > > I have a sparse matrix built and I want to output the matrix for viewing in matlab. However i'm having difficulty outputting the matrix. I am writing my program in Fortran90 and I've included the following lines to output the matrix. > > call PetscViewerBinaryOpen(PETSC_COMM_SELF,'matrix',FILE_MODE_WRITE,view,ierr) Normally on would use to save the matrix and then use the scripts Patrick mentioned to read the matrix into Matlab or Python. call MatView(matrix, view,ierr) call PetscViewerDestroy(view,ierr) > call PetscViewerBinaryGetDescriptor(view,fd,ierr) > call PetscBinaryWrite(fd,ham,1,PETSC_SCALAR,PETSC_FALSE,ierr) > > These lines do create a matrix but matlab says its not a binary file. Could you please provide me some inputs on where I'm going wrong and how to proceed with this problem. I can provide any further information that you might need to help me solve this problem. > > > Thank you. > > Sincerely, > Pranay. From fe.wallner at gmail.com Fri Nov 29 18:14:36 2019 From: fe.wallner at gmail.com (Felipe Giacomelli) Date: Fri, 29 Nov 2019 22:14:36 -0200 Subject: [petsc-users] Weird behaviour of PCGAMG in coupled poroelasticity Message-ID: Hello, I'm trying to solve Biot's poroelasticity (Cryer's sphere problem) through a fully coupled scheme. Thus, the solution of a single linear system yields both displacement and pressure fields, |K L | | u | = |b_u|. |Q (A + H) | | p | = |b_p| The linear system is asymmetric, given that the discrete equations were obtained through the Element based Finite Volume Method (EbFVM). An unstructured tetrahedral grid is utilised, it has about 10000 nodal points (not coarse, nor too refined). Therefore, GMRES and GAMG are employed to solve it. Furthermore, the program was parallelised through a Domain Decomposition Method. Thus, each processor works in its subdomain only. So far, so good. For a given set of poroelastic properties (which are constant throughout time and space), the speedup increases as more processors are utilised: coupling intensity: 7.51e-01 proc solve time [s] 1 314.23 2 171.65 3 143.21 4 149.26 (> 143.21, but ok) However, after making the problem MORE coupled (different poroelastic properties), a strange behavior is observed: coupling intensity: 2.29e+01 proc solve time [s] 1 28909.35 2 192.39 3 181.29 4 14463.63 Recalling that GMRES and GAMG are used, KSP takes about 4300 iterations to converge when 1 processor is employed. On the other hand, for 2 processors, KSP takes around 30 iterations to reach convergence. Hence, explaining the difference between the solution times. Increasing the coupling even MORE, everything goes as expected: coupling intensity: 4.63e+01 proc solve time [s] 1 229.26 2 146.04 3 121.49 4 107.80 Because of this, I ask: * What may be the source of this behavior? Can it be predicted? * How can I remedy this situation? At last, are there better solver-pc choices for coupled poroelasticity? Thank you, Felipe -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Sat Nov 30 00:48:59 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 30 Nov 2019 06:48:59 +0000 Subject: [petsc-users] Weird behaviour of PCGAMG in coupled poroelasticity In-Reply-To: References: Message-ID: I would first run with -ksp_monitor_true_residual -ksp_converged_reason to make sure that those "very fast" cases are actually converging in those runs also use -ksp_view to see what the GMAG parameters are. Also use the -info option to have it print details on the solution process. Barry > On Nov 29, 2019, at 4:14 PM, Felipe Giacomelli wrote: > > Hello, > > I'm trying to solve Biot's poroelasticity (Cryer's sphere problem) through a fully coupled scheme. Thus, the solution of a single linear system yields both displacement and pressure fields, > > |K L | | u | = |b_u|. > |Q (A + H) | | p | = |b_p| > > The linear system is asymmetric, given that the discrete equations were obtained through the Element based Finite Volume Method (EbFVM). An unstructured tetrahedral grid is utilised, it has about 10000 nodal points (not coarse, nor too refined). Therefore, GMRES and GAMG are employed to solve it. > > Furthermore, the program was parallelised through a Domain Decomposition Method. Thus, each processor works in its subdomain only. > > So far, so good. For a given set of poroelastic properties (which are constant throughout time and space), the speedup increases as more processors are utilised: > > coupling intensity: 7.51e-01 > > proc solve time [s] > 1 314.23 > 2 171.65 > 3 143.21 > 4 149.26 (> 143.21, but ok) > > However, after making the problem MORE coupled (different poroelastic properties), a strange behavior is observed: > > coupling intensity: 2.29e+01 > > proc solve time [s] > 1 28909.35 > 2 192.39 > 3 181.29 > 4 14463.63 > > Recalling that GMRES and GAMG are used, KSP takes about 4300 iterations to converge when 1 processor is employed. On the other hand, for 2 processors, KSP takes around 30 iterations to reach convergence. Hence, explaining the difference between the solution times. > > Increasing the coupling even MORE, everything goes as expected: > > coupling intensity: 4.63e+01 > > proc solve time [s] > 1 229.26 > 2 146.04 > 3 121.49 > 4 107.80 > > Because of this, I ask: > > * What may be the source of this behavior? Can it be predicted? > * How can I remedy this situation? > > At last, are there better solver-pc choices for coupled poroelasticity? > > Thank you, > Felipe From mfadams at lbl.gov Sat Nov 30 04:01:05 2019 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 30 Nov 2019 05:01:05 -0500 Subject: [petsc-users] Weird behaviour of PCGAMG in coupled poroelasticity In-Reply-To: References: Message-ID: Let me add that generic AMG is not great for systems like this (indefinite, asymmetric) so yes, check that your good cases are really good. GAMG uses eigenvalues, which are problematic for indefinite and asymmetric matrices. I don't know why this is ever working well, but try '-pc_type hypre' (and configure with --download-hypre'). Hypre is better with asymmetric matrices. This would provide useful information to diagnose what is going on here if not solve your problem. Note, the algorithms and implementations of hypre and GAMG are not very domain decomposition dependant so it is surprising to see these huge differences from the number of processors used. On Sat, Nov 30, 2019 at 1:49 AM Smith, Barry F. 
wrote: > > I would first run with -ksp_monitor_true_residual -ksp_converged_reason > to make sure that those "very fast" cases are actually converging in those > runs also use -ksp_view to see what the GMAG parameters are. Also use the > -info option to have it print details on the solution process. > > Barry > > > > > On Nov 29, 2019, at 4:14 PM, Felipe Giacomelli > wrote: > > > > Hello, > > > > I'm trying to solve Biot's poroelasticity (Cryer's sphere problem) > through a fully coupled scheme. Thus, the solution of a single linear > system yields both displacement and pressure fields, > > > > |K L | | u | = |b_u|. > > |Q (A + H) | | p | = |b_p| > > > > The linear system is asymmetric, given that the discrete equations were > obtained through the Element based Finite Volume Method (EbFVM). An > unstructured tetrahedral grid is utilised, it has about 10000 nodal points > (not coarse, nor too refined). Therefore, GMRES and GAMG are employed to > solve it. > > > > Furthermore, the program was parallelised through a Domain Decomposition > Method. Thus, each processor works in its subdomain only. > > > > So far, so good. For a given set of poroelastic properties (which are > constant throughout time and space), the speedup increases as more > processors are utilised: > > > > coupling intensity: 7.51e-01 > > > > proc solve time [s] > > 1 314.23 > > 2 171.65 > > 3 143.21 > > 4 149.26 (> 143.21, but ok) > > > > However, after making the problem MORE coupled (different poroelastic > properties), a strange behavior is observed: > > > > coupling intensity: 2.29e+01 > > > > proc solve time [s] > > 1 28909.35 > > 2 192.39 > > 3 181.29 > > 4 14463.63 > > > > Recalling that GMRES and GAMG are used, KSP takes about 4300 iterations > to converge when 1 processor is employed. On the other hand, for 2 > processors, KSP takes around 30 iterations to reach convergence. Hence, > explaining the difference between the solution times. > > > > Increasing the coupling even MORE, everything goes as expected: > > > > coupling intensity: 4.63e+01 > > > > proc solve time [s] > > 1 229.26 > > 2 146.04 > > 3 121.49 > > 4 107.80 > > > > Because of this, I ask: > > > > * What may be the source of this behavior? Can it be predicted? > > * How can I remedy this situation? > > > > At last, are there better solver-pc choices for coupled poroelasticity? > > > > Thank you, > > Felipe > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Nov 30 14:30:51 2019 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 30 Nov 2019 13:30:51 -0700 Subject: [petsc-users] Outputting matrix for viewing in matlab In-Reply-To: <40760746-8753-4B63-9F16-31B55C5A9005@anl.gov> References: <40760746-8753-4B63-9F16-31B55C5A9005@anl.gov> Message-ID: Thank you all for the suggestions. I have it working now. Regards, Pranay. On Fri, Nov 29, 2019 at 10:39 AM Smith, Barry F. wrote: > > > > On Nov 28, 2019, at 7:07 PM, baikadi pranay > wrote: > > > > Hello PETSc users, > > > > I have a sparse matrix built and I want to output the matrix for viewing > in matlab. However i'm having difficulty outputting the matrix. I am > writing my program in Fortran90 and I've included the following lines to > output the matrix. > > > > call > PetscViewerBinaryOpen(PETSC_COMM_SELF,'matrix',FILE_MODE_WRITE,view,ierr) > > Normally on would use to save the matrix and then use the scripts > Patrick mentioned to read the matrix into Matlab or Python. 
> > call MatView(matrix, view,ierr) > call PetscViewerDestroy(view,ierr) > > > > > call PetscViewerBinaryGetDescriptor(view,fd,ierr) > > call PetscBinaryWrite(fd,ham,1,PETSC_SCALAR,PETSC_FALSE,ierr) > > > > These lines do create a matrix but matlab says its not a binary file. > Could you please provide me some inputs on where I'm going wrong and how to > proceed with this problem. I can provide any further information that you > might need to help me solve this problem. > > > > > > Thank you. > > > > Sincerely, > > Pranay. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Nov 30 15:18:12 2019 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 30 Nov 2019 14:18:12 -0700 Subject: [petsc-users] Floating point exception Message-ID: Hello PETSc users, I am currently trying to build a 1-D Schrodinger solver. I have built my hamiltonian matrix (of size 121 x 121) and i'm trying to find the eigenvalues. I have the following lines of code for the solver: *call EPSCreate(PETSC_COMM_WORLD,eps,ierr)* *call EPSSetOperators(eps,ham,S,ierr)call EPSSetProblemType(eps,EPS_GHEP,ierr)* *call EPSSetFromOptions(eps,ierr)call EPSSetDimensions(eps,10,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)call EPSSolve(eps,ierr)call EPSDestroy(eps,ierr)* At the EPSSolve line, i get the following error: *[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------[0]PETSC ERROR: Floating point exception[0]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019* I am using the options *-st_pc_factor_shift_type NONZERO -st_pc_factor_shift_amount 1* ( else I end up getting the "zero pivot in LU factorization" error ). I outputted my matrix to matlab and confirmed that the null space is empty and the matrix is not singular. I am not sure why I'm getting this error. Could you provide me a hint as to how to solve this problem. Sincerely, Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Nov 30 15:45:55 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 30 Nov 2019 15:45:55 -0600 Subject: [petsc-users] Floating point exception In-Reply-To: References: Message-ID: On Sat, Nov 30, 2019 at 3:19 PM baikadi pranay wrote: > Hello PETSc users, > > I am currently trying to build a 1-D Schrodinger solver. I have built my > hamiltonian matrix (of size 121 x 121) and i'm trying to find the > eigenvalues. 
I have the following lines of code for the solver: > > *call EPSCreate(PETSC_COMM_WORLD,eps,ierr)* > > *call EPSSetOperators(eps,ham,S,ierr)call > EPSSetProblemType(eps,EPS_GHEP,ierr)* > > > > *call EPSSetFromOptions(eps,ierr)call > EPSSetDimensions(eps,10,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)call > EPSSolve(eps,ierr)call EPSDestroy(eps,ierr)* > > At the EPSSolve line, i get the following error: > > > > > > *[0]PETSC ERROR: --------------------- Error Message > --------------------------------------------------------------[0]PETSC > ERROR: Floating point exception[0]PETSC ERROR: Vec entry at local location > 0 is not-a-number or infinite at end of function: Parameter number > 3[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting.[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019* > You need to show the entire stack trace that is output here. Thanks, Matt > I am using the options *-st_pc_factor_shift_type NONZERO > -st_pc_factor_shift_amount 1* ( else I end up getting the "zero pivot > in LU factorization" error ). > > I outputted my matrix to matlab and confirmed that the null space is empty > and the matrix is not singular. I am not sure why I'm getting this error. > Could you provide me a hint as to how to solve this problem. > > Sincerely, > Pranay. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Nov 30 15:56:15 2019 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 30 Nov 2019 14:56:15 -0700 Subject: [petsc-users] Floating point exception In-Reply-To: References: Message-ID: Hello, The entire output is the following: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Floating point exception [0]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named agave1.agave.rc.asu.edu by pbaikadi Sat Nov 30 14:54:31 2019 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 VecValidValues() line 28 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/vec/vec/interface/rvector.c [0]PETSC ERROR: #2 PCApply() line 464 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #3 KSP_PCApply() line 281 in /packages/7x/petsc/3.11.1/petsc-3.11.1/include/petsc/private/kspimpl.h [0]PETSC ERROR: #4 KSPSolve_PREONLY() line 22 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/preonly/preonly.c [0]PETSC ERROR: #5 KSPSolve() line 782 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 STMatSolve() line 193 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/sys/classes/st/interface/stsles.c [0]PETSC ERROR: #7 STApply_Shift() line 25 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/sys/classes/st/impls/shift/shift.c [0]PETSC ERROR: #8 STApply() line 57 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/sys/classes/st/interface/stsolve.c [0]PETSC ERROR: #9 EPSGetStartVector() line 797 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/eps/interface/epssolve.c [0]PETSC ERROR: #10 EPSSolve_KrylovSchur_Symm() line 32 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/eps/impls/krylov/krylovschur/ks-symm.c [0]PETSC ERROR: #11 EPSSolve() line 149 in /packages/7x/slepc/3.11.1/slepc-3.11.1/src/eps/interface/epssolve.c Regards, Pranay. On Sat, Nov 30, 2019 at 2:46 PM Matthew Knepley wrote: > On Sat, Nov 30, 2019 at 3:19 PM baikadi pranay > wrote: > >> Hello PETSc users, >> >> I am currently trying to build a 1-D Schrodinger solver. I have built my >> hamiltonian matrix (of size 121 x 121) and i'm trying to find the >> eigenvalues. I have the following lines of code for the solver: >> >> *call EPSCreate(PETSC_COMM_WORLD,eps,ierr)* >> >> *call EPSSetOperators(eps,ham,S,ierr)call >> EPSSetProblemType(eps,EPS_GHEP,ierr)* >> >> >> >> *call EPSSetFromOptions(eps,ierr)call >> EPSSetDimensions(eps,10,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)call >> EPSSolve(eps,ierr)call EPSDestroy(eps,ierr)* >> >> At the EPSSolve line, i get the following error: >> >> >> >> >> >> *[0]PETSC ERROR: --------------------- Error Message >> --------------------------------------------------------------[0]PETSC >> ERROR: Floating point exception[0]PETSC ERROR: Vec entry at local location >> 0 is not-a-number or infinite at end of function: Parameter number >> 3[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble >> shooting.[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019* >> > > You need to show the entire stack trace that is output here. > > Thanks, > > Matt > > >> I am using the options *-st_pc_factor_shift_type NONZERO >> -st_pc_factor_shift_amount 1* ( else I end up getting the "zero >> pivot in LU factorization" error ). >> >> I outputted my matrix to matlab and confirmed that the null space is >> empty and the matrix is not singular. I am not sure why I'm getting this >> error. Could you provide me a hint as to how to solve this problem. >> >> Sincerely, >> Pranay. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pranayreddy865 at gmail.com Sat Nov 30 19:28:05 2019 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 30 Nov 2019 18:28:05 -0700 Subject: [petsc-users] petsc-users Digest, Vol 131, Issue 49 In-Reply-To: References: Message-ID: Hello all, I was able to figure out why i was getting a floating point error. Although i knew that petsc uses 0-based indexing for fortran, i forgot to modify my code accordingly. So it was accessing elements of the array which are essentially zero. Thank you for your time. Regards, Pranay. -------------- next part -------------- An HTML attachment was scrubbed... URL: