From hongzhang at anl.gov Sat May 1 09:02:29 2021
From: hongzhang at anl.gov (Zhang, Hong)
Date: Sat, 1 May 2021 14:02:29 +0000
Subject: [petsc-users] Detecting steady-state with TS
In-Reply-To:
References: <87A011C7-E521-41AF-AC77-F0042E1FCF79@llnl.gov>
Message-ID: <58A407FD-2E44-499E-AF0E-6D73A5ECED4A@anl.gov>

Although TSMonitor may also work, I would suggest using TSPostStep for the customized convergence check. TSPostStep allows you to modify the system or change the solver settings. It skips time steps that are rolled back (e.g. rejected time steps), whereas TSMonitor is applied to every time step, including the rejected ones.

For the time derivative, I would just do an RHS function evaluation if the extra cost is acceptable. If not, I would consider caching the current solution in TSPostStep.

Hong (Mr.)

On Apr 30, 2021, at 6:36 PM, Salazar De Troya, Miguel via petsc-users wrote:

Thanks, can you elaborate on computing the time derivative? TSMonitor only gives me the information at the current time step. I guess I could store a copy of the solution in the context so I can use it in the next call to compute the difference. On the other hand, I could also store the norm of the RHS function (since this is equal to the time derivative \frac{\partial \phi}{\partial t}).

Miguel

From: Mark Adams
Date: Friday, April 30, 2021 at 3:56 PM
To: "Salazar De Troya, Miguel"
Cc: "Zhang, Hong via petsc-users"
Subject: Re: [petsc-users] Detecting steady-state with TS

You could add a https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSMonitorSet.html method, compute the time derivative, and decide how to declare convergence. Then set the converged reason (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetConvergedReason.html) to TS_CONVERGED_USER. That should cause TS to wrap up the solve and exit cleanly.

Mark

On Thu, Apr 29, 2021 at 3:27 PM Salazar De Troya, Miguel via petsc-users wrote:

I am solving the signed distance equation \frac{\partial \phi}{\partial t} + \mathrm{sign}(\phi_{0})(|\nabla \phi| - 1) = 0 using a Local Discontinuous Galerkin (LDG) method as described in https://www.sciencedirect.com/science/article/pii/S0021999110005255

I am interested in solving it close to steady state. I was hoping I could measure how close to steady state the solution is by using the TSSetEventHandler infrastructure, but the handler does not have information on the time derivative. I looked at TSPSEUDO, but it forces me to use an implicit method, which I cannot provide because of how the LDG method works (it calculates the fluxes by solving additional equations). This makes me wonder if the LDG method is the best choice, so I am open to suggestions. Given my current progress with the LDG approach, I am wondering if there is a way to solve to steady state using explicit algorithms such as Runge-Kutta.

Thanks
Miguel

Miguel A. Salazar de Troya
Postdoctoral Researcher, Lawrence Livermore National Laboratory
B141
Rm: 1085-5
Ph: 1(925) 422-6411
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
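A minimal sketch of the post-step convergence check suggested above (the function name and the tolerance are illustrative, not PETSc API; it assumes the ODE is posed with an RHS function, so that ||F(t,u)|| approximates ||du/dt||):

static PetscErrorCode PostStepSteadyCheck(TS ts)
{
  Vec            U, F;
  PetscReal      t, fnorm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetTime(ts,&t);CHKERRQ(ierr);
  ierr = TSGetSolution(ts,&U);CHKERRQ(ierr);
  ierr = VecDuplicate(U,&F);CHKERRQ(ierr);
  /* one extra RHS evaluation per accepted step, as Hong notes */
  ierr = TSComputeRHSFunction(ts,t,U,F);CHKERRQ(ierr);
  ierr = VecNorm(F,NORM_2,&fnorm);CHKERRQ(ierr);
  ierr = VecDestroy(&F);CHKERRQ(ierr);
  if (fnorm < 1e-8) { /* steady-state tolerance, chosen here only for illustration */
    ierr = TSSetConvergedReason(ts,TS_CONVERGED_USER);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

The callback is registered once with TSSetPostStep(ts,PostStepSteadyCheck) before TSSolve(). Caching the previous solution in the same callback, as suggested above, would avoid the extra RHS evaluation.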
From bsmith at petsc.dev Sun May 2 11:30:08 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 2 May 2021 11:30:08 -0500
Subject: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST
In-Reply-To:
References: <23eee78b7748418d948e850c50fefb5f@MAR190n2.marin.local> <6562389d4d694d92a74b3add0cbcc823@MAR190n2.marin.local>
Message-ID:

> ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297)
> ==1026905== Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client"

Is it possible that this __fsi_MOD_fem_constructmatricespetscexit is being called AFTER PetscFinalize()? Perhaps it is defined with a "final" and the compiler/linker schedules it to be called after the program has "completed". This would explain the crash, the valgrind stack frames, and even why it does not crash with MPICH. This can happen with C++ destructors in code such as

int main(int argc,char **argv)
{
  MyCppClass my;   /* has a destructor that destroys PETSc objects */
  PetscInitialize(&argc,&argv,NULL,NULL);
  ....
  PetscFinalize(); /* the destructor gets called after this and messes with MPI data that no longer exists */
  return 0;
}

The fix is to force the destructor to be called before PETSc is finalized, and this can be done with an inner scope:

int main(int argc,char **argv)
{
  PetscInitialize(&argc,&argv,NULL,NULL);
  {
    MyCppClass my; /* has a destructor that destroys PETSc objects */
    ....
  }                /* the destructor gets called here and everything is fine */
  PetscFinalize();
  return 0;
}

I don't know the details of how Fortran's "final" is implemented, but this is my current guess as to what is happening in your code; you need to somehow arrange for the module final to be called before PetscFinalize().

Barry

> On Apr 28, 2021, at 7:22 AM, Deij-van Rijswijk, Menno wrote:
>
> The modules have automatic freeing inasmuch as a variable that is local to a subroutine and ALLOCATE'd is automatically freed when the subroutine returns. I don't think that is problematic, as MatDestroy is used a lot in the code and normally executes just fine.
>
> As far as I can see, no specific new communicators are created; MatCreateAIJ or MatCreateSeqAIJ are called with PETSC_COMM_WORLD, resp. PETSC_COMM_SELF, as the first argument.
>
> We also run this with the Intel MPI library, which is based on MPICH. There this problem does not occur.
>
> The Valgrind run did not produce any new insights (at least not for me); I have pasted the relevant bits at the end of this message. I did a run on debug versions of PETSc (v3.14.5) and OpenMPI (v3.1.2) and I find the following stack trace with line numbers for each frame. Maybe that helps in further pinpointing the problem.
>
> 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
> 1470 if ( !
OMPI_COMM_IS_INTRINSIC((*comm)->c_local_comm)) { > Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.el8.0.2.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64 libibumad-47mlnx1-1.47329.x86_64 libibverbs-47mlnx1-1.47329.x86_64 libnl3-3.5.0-1.el8.x86_64 libquadmath-8.3.1-5.el8.0.2.x86_64 librdmacm-47mlnx1-1.47329.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64 libxml2-2.9.7-7.el8.x86_64 numactl-libs-2.0.12-9.el8.x86_64 opensm-libs-5.5.1.MLNX20191120.0c8dde0-0.1.47329.x86_64 openssl-libs-1.1.1c-15.el8.x86_64 python3-libs-3.6.8-23.el8.x86_64 sssd-client-2.2.3-20.el8.x86_64 ucx-cma-1.7.0-1.47329.x86_64 ucx-ib-1.7.0-1.47329.x86_64 xz-libs-5.2.4-3.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64 > (gdb) bt > #0 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470 > #1 0x0000155540d4f1af in PMPI_Comm_free (comm=0x483f4e0) at pcomm_free.c:62 > #2 0x000015555346329a in superlu_gridexit (grid=0x483f4e0) at /home/mdeij/install-gnu/extLibs/Linux-x86_64-Intel/superlu_dist-6.3.0/SRC/superlu_grid.c:174 > #3 0x0000155553ca2ff1 in Petsc_Superlu_dist_keyval_Delete_Fn (comm=0x3921b10, keyval=16, attr_val=0x483f4d0, extra_state=0x0) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:97 > #4 0x0000155540d0baa1 in ompi_attr_delete_impl (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0, key=16, predefined=true) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1062 > #5 0x0000155540d0c039 in ompi_attr_delete_all (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1166 > #6 0x0000155540d11676 in ompi_comm_free (comm=0x7fffffffc5c0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1462 > #7 0x0000155540d4f1af in PMPI_Comm_free (comm=0x7fffffffc5c0) at pcomm_free.c:62 > #8 0x000015555393fb68 in PetscCommDestroy (comm=0x3943a60) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/tagm.c:217 > #9 0x0000155553941e07 in PetscHeaderDestroy_Private (h=0x3943a20) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/inherit.c:121 > #10 0x000015555408edfe in MatDestroy (A=0x3558c18) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/matrix.c:1306 > #11 0x00001555540cb5fa in matdestroy_ (A=0x3558c18, __ierr=0x7fffffffc73c) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/ftn-auto/matrixf.c:770 > > Valgrind output: > > ==1026905== Invalid read of size 1 > ==1026905== at 0x19184538: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) > ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in 
/home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297) > ==1026905== Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client" > ==1026905== > ==1026905== Invalid read of size 8 > ==1026905== at 0x1912AC9A: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) > ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== Address 0x2ce673c0 is 11,152 bytes inside an unallocated block of size 11,232 in arena "client" > ==1026905== > ==1026905== Invalid read of size 8 > ==1026905== at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) > ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) > ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) > ==1026905== Address 0x91 is not stack'd, malloc'd or (recently) free'd > ==1026905== > ==1026905== > ==1026905== Process terminating with default action of signal 11 (SIGSEGV) > ==1026905== Access not within mapped region at address 0x91 > ==1026905== at 0x19126E5B: ompi_attr_delete_all (in 
/home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905== If you believe this happened as a result of a stack
> ==1026905== overflow in your program's main thread (unlikely but
> ==1026905== possible), you can try to increase the size of the
> ==1026905== main thread stack using the --main-stacksize= flag.
> ==1026905== The main thread stack size used in this run was 16777216.
>
> dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
> MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl
>
> MARIN news: WASP webinar & WiSP workshop
>
> From: Barry Smith
> Sent: Friday, April 23, 2021 7:09 PM
> To: Deij-van Rijswijk, Menno
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST
>
> Thanks for looking. Do these modules have any "automatic freeing" when variables go out of scope (like C++ classes do)?
>
> Do you make specific new MPI communicators to create the matrices?
>
> Have you tried MPICH or a different version of OpenMPI?
>
> Maybe run the program with valgrind. The stack frames you sent look "funny"; that is, I would not normally expect them to be in such an order.
>
> Barry
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at petsc.dev Sun May 2 11:54:59 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 2 May 2021 11:54:59 -0500
Subject: [petsc-users] Parallel TS for ODE
In-Reply-To:
References: <278D238E-4519-4943-BC8D-607CF62F0025@gmail.com> <8D806473-44A4-4FA6-A119-6716F6C1C929@anl.gov> <485566D7-83DB-4EA9-87D4-2195F50E2D82@gmail.com> <9C3E8A29-C627-4A2C-B56D-2DFE9C1516D5@gmail.com> <00DC179D-1E7B-4ED7-B64B-A074882A5D34@gmail.com>
Message-ID: <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev>

[0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation da it is shell

Are you calling TSSetDM() to supply your created DMDA to the TS? Based on the error message you are not; it is using a default shell DM, which is what TS does if you do not provide it with one. You need to call TSSetDM() after you create the TS and the DMDA.
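For reference, a minimal sketch of that setup order (variable names are illustrative; the 3,1,2 layout is the one suggested in the reply below, and, as a later message in this thread shows, DMSetUp rejects that particular layout on more than one rank because the local width 1 is smaller than the stencil width 2):

TS ts;
DM da;
ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,1,2,NULL,&da);CHKERRQ(ierr);
ierr = DMSetFromOptions(da);CHKERRQ(ierr);
ierr = DMSetUp(da);CHKERRQ(ierr);
ierr = TSCreate(PETSC_COMM_WORLD,&ts);CHKERRQ(ierr);
ierr = TSSetDM(ts,da);CHKERRQ(ierr); /* without this, TSGetDM() in the callbacks returns a default shell DM, giving the error above */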
> On Apr 29, 2021, at 2:57 AM, Francesco Brarda wrote:
>
> I defined the DM as follows
> ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,3,3,NULL,&da);CHKERRQ(ierr);

If you truly want one "spatial" point then you would want to use 1,3,0. This says 1 location in space, three degrees of freedom at that point, and 0 ghost values (since there is only one spatial point there can be no ghost spatial values).

BUT DMDA ALWAYS puts all degrees of freedom at a point on the same process, so this will not give you parallelism. All 3 dof will be on the same MPI rank.

For what you want to do you can use 3,1,2. This says three "spatial" points, 1 dof at each "spatial" point, and 2 ghost points. In your case "spatial" does not mean spatial in space; it is just three abstract points.

The global vectors (from DMCreate or Get GlobalVector) will have 1 value on each rank. The local vectors (from DMCreate or Get LocalVector) will have three values on each rank. Your initial conditions code would be something like

if (rank == 0) {
  x[0] = appctx->N - appctx->p[2];
} else if (rank == 1) {
  x[1] = appctx->p[2];
} else {
  x[2] = 0.0;
}

Your TSSetRHSFunction() would make a call to DMGetLocalVector(...&localX), do a DMGlobalToLocalBegin/End() from the input global X to localX, then call DMDAVecGetArray(...,&xarray) on the localX and access all three values in xarray. The resulting computation of f, the output vector, would be something like

if (rank == 0) {
  farray[0] = your code that can use xarray[0], xarray[1], xarray[2]
} else if (rank == 1) {
  farray[1] = your code that can use xarray[0], xarray[1], xarray[2]
} else {
  farray[2] = your code that can use xarray[0], xarray[1], xarray[2]
}

There are many examples of this pattern in the example tutorials.

When you implement a code with a spatial distribution you would use a dof of 3 at each point and not parallelize over the dof at each point. Likely you want to use DMNETWORK to manage the spatial distribution, since it has a simple API and allows any number of neighbors for each point. DMDA would not make sense for true spatial distribution except in some truly trivial neighbor configurations.

Barry

> I am not sure whether I understood this command properly. The vector should have 3 components (S, I, R) and 3 DOF as it is defined only when the three coordinates have been set.
> Then I create a global vector X. When I set the initial conditions as below
>
> static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx)
> {
>   PetscErrorCode ierr;
>   AppCtx *appctx = (AppCtx*) ctx;
>   PetscScalar *x;
>   DM da;
>
>   PetscFunctionBeginUser;
>   ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
>
>   /* Get pointers to vector data */
>   ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr);
>
>   x[0] = appctx->N - appctx->p[2];
>   x[1] = appctx->p[2];
>   x[2] = 0.0;
>
>   ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>
> I have the error:
>
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation da it is shell
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown > [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda Thu Apr 29 09:36:17 2021 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug > [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c > [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c > [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c > [0]PETSC ERROR: No PETSc Option Table entries > > I would be very happy to receive any advices to fix the code. > Best, > Francesco > >> Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley > ha scritto: >> >> On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda > wrote: >> Thank you for the advices, I would just like to convert the code I already have to see what might happen once parallelized. >> Do you think it is better to put the 3 equations into a 1d Distributed Array with 3 dofs and run the job with multiple procs regardless of how many equations I have? Is it possible? >> >> If you plan in the end to use a structured grid, this is a great plan. If not, this is not a good plan. >> >> Thanks, >> >> Matt >> >> Thank you, >> Francesco >> >>> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini ha scritto: >>> >>> It does not make sense to parallelize to 1 equation per process, unless that single equation per process is super super super costly. >>> Is this work you are doing used to understand PETSc parallelization strategy? if so, there are multiple examples in the sourcetree that you can look at to populate matrices and vectors in parallel >>> >>> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda ha scritto: >>> In principle the entire code was for 1 proc only. The functions were built with VecGetArray(). While adapting the code for multiple procs I thought using VecGetOwnershipRange was a possible way to allocate the equations in the vector using multiple procs. What do you think, please? >>> >>> Thank you, >>> Francesco >>> >>>> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley ha scritto: >>>> >>>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda wrote: >>>> I was trying to follow Barry's advice some time ago, but I guess that's not the way he meant it. How should I refer to the values contained in x? With Distributed Arrays? >>>> >>>> That is how you get values from x. However, I cannot understand at all what you are doing with "mybase". >>>> >>>> Matt >>>> >>>> Thanks >>>> Francesco >>>> >>>>>> Even though it will not scale and will deliver slower performance it is completely possible for you to solve the 3 variable problem using 3 MPI ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 degree of freedom for the first three ranks and no degrees of freedom for the later ranks. During your function evaluation (and Jacobian evaluation) for TS you will need to set up the appropriate communication to get the values you need on each rank to evaluate the parts of the function evaluation needed by that rank. This is true for parallelizing any computation. >>>>>> >>>>>> Barry >>>> >>>> >>>> >>>>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley ha scritto: >>>>> >>>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda wrote: >>>>> Hi! >>>>> I tried to implement the SIR model taking into account the fact that I will only use 3 MPI ranks at this moment. 
>>>>> I built vectors and matrices following the examples already available. In particular, I defined the functions required similarly (RHSFunction, IFunction, IJacobian), as follows: >>>>> >>>>> I don't think this makes sense. You use "mybase" to distinguish between 3 procs, which would indicate that each procs has only >>>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it has at least 2 dofs. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx) >>>>> { >>>>> PetscErrorCode ierr; >>>>> AppCtx *appctx = (AppCtx*) ctx; >>>>> PetscScalar f;//, *x_localptr; >>>>> const PetscScalar *x; >>>>> PetscInt mybase; >>>>> >>>>> PetscFunctionBeginUser; >>>>> ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr); >>>>> ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); >>>>> if (mybase == 0) { >>>>> f = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N); >>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>> } >>>>> if (mybase == 1) { >>>>> f = (PetscScalar) (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]); >>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>> } >>>>> if (mybase == 2) { >>>>> f = (PetscScalar) (appctx->p2*x[1]); >>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>> } >>>>> ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); >>>>> ierr = VecAssemblyBegin(F);CHKERRQ(ierr); >>>>> ierr = VecAssemblyEnd(F);CHKERRQ(ierr); >>>>> PetscFunctionReturn(0); >>>>> } >>>>> >>>>> >>>>> Whilst for the Jacobian I did: >>>>> >>>>> >>>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec Xdot,PetscReal a,Mat A,Mat B,void *ctx) >>>>> { >>>>> PetscErrorCode ierr; >>>>> AppCtx *appctx = (AppCtx*) ctx; >>>>> PetscInt mybase, rowcol[] = {0,1,2}; >>>>> const PetscScalar *x; >>>>> >>>>> PetscFunctionBeginUser; >>>>> ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr); >>>>> ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); >>>>> if (mybase == 0) { >>>>> const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, appctx->p1*x[0]/appctx->N, 0}; >>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>> } >>>>> if (mybase == 1) { >>>>> const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - appctx->p1*x[0]/appctx->N + appctx->p2, 0}; >>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>> } >>>>> if (mybase == 2) { >>>>> const PetscScalar J[] = {0, - appctx->p2, a}; >>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>> } >>>>> ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); >>>>> >>>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>> if (A != B) { >>>>> ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>> ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>> } >>>>> PetscFunctionReturn(0); >>>>> } >>>>> >>>>> This code does not provide the correct result, that is, the solution is the initial condition, either using implicit or explicit methods. Is the way I defined these objects wrong? How can I fix it? >>>>> I also tried to print the Jacobian with the following commands but it does not work (blank rows and error message). How should I print the Jacobian? 
>>>>> ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr);
>>>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>> ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
>>>>>
>>>>> I would very much appreciate any kind of help or advice.
>>>>> Best,
>>>>> Francesco
>>>>>
>>>>>> Il giorno 2 apr 2021, alle ore 04:45, Barry Smith ha scritto:
>>>>>>
>>>>>>> On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users wrote:
>>>>>>>
>>>>>>>> On Mar 31, 2021, at 2:53 AM, Francesco Brarda wrote:
>>>>>>>>
>>>>>>>> Hi everyone!
>>>>>>>>
>>>>>>>> I am trying to solve a system of 3 ODEs (a basic SIR model) with TS. Sequentially it works pretty well, but I need to switch it into a parallel version.
>>>>>>>> I started working with TS not very long ago; there are a few questions I'd like to share with you, and if you have any advice I'd be happy to hear it.
>>>>>>>> First of all, do I need to use a DM object even if the model is only time dependent? All the examples I found were using that object for the other variable when solving PDEs.
>>>>>>>
>>>>>>> Are you considering SIR on a spatial domain? If so, you can parallelize your model in the spatial domain using DM. Splitting the three variables in the ODE among processors would not scale.
>>>>>>
>>>>>> Even though it will not scale and will deliver slower performance, it is completely possible for you to solve the 3 variable problem using 3 MPI ranks. Or 10 MPI ranks. You would just create vectors/matrices with 1 degree of freedom for the first three ranks and no degrees of freedom for the later ranks. During your function evaluation (and Jacobian evaluation) for TS you will need to set up the appropriate communication to get the values you need on each rank to evaluate the parts of the function evaluation needed by that rank. This is true for parallelizing any computation.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>> Hong (Mr.)
>>>>>>>
>>>>>>>> When I preallocate the space for the Jacobian matrix, is it better to decide the local or global space?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Francesco
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>> --
>>> Stefano
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
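Putting Barry's description together, the RHS callback would look roughly like the sketch below. It is assembled from his message and the SIR rates quoted earlier in this thread; it assumes the 3-point/1-dof/2-ghost DMDA, TSSetDM having been called, and an AppCtx with the p1, p2, N fields of the quoted code (and it assumes the DM setup succeeds, which, as the next message shows, this particular layout does not on more than one rank):

static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx)
{
  AppCtx            *appctx = (AppCtx*)ctx;
  DM                da;
  Vec               localX;
  const PetscScalar *x;
  PetscScalar       *f;
  PetscMPIInt       rank;
  PetscErrorCode    ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
  ierr = DMGetLocalVector(da,&localX);CHKERRQ(ierr);
  /* scatter the ghost values so every rank sees x[0], x[1], x[2] */
  ierr = DMGlobalToLocalBegin(da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayRead(da,localX,(void*)&x);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da,F,(void*)&f);CHKERRQ(ierr);
  /* each rank owns exactly one entry of the global F */
  if (rank == 0)      f[0] = -appctx->p1*x[0]*x[1]/appctx->N;
  else if (rank == 1) f[1] =  appctx->p1*x[0]*x[1]/appctx->N - appctx->p2*x[1];
  else                f[2] =  appctx->p2*x[1];
  ierr = DMDAVecRestoreArray(da,F,(void*)&f);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArrayRead(da,localX,(void*)&x);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da,&localX);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}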
From brardafrancesco at gmail.com Tue May 4 02:53:54 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Tue, 4 May 2021 09:53:54 +0200
Subject: [petsc-users] Parallel TS for ODE
In-Reply-To: <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev>
References: <278D238E-4519-4943-BC8D-607CF62F0025@gmail.com> <8D806473-44A4-4FA6-A119-6716F6C1C929@anl.gov> <485566D7-83DB-4EA9-87D4-2195F50E2D82@gmail.com> <9C3E8A29-C627-4A2C-B56D-2DFE9C1516D5@gmail.com> <00DC179D-1E7B-4ED7-B64B-A074882A5D34@gmail.com> <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev>
Message-ID: <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com>

Thank you very much everyone. I do have one more question.

> For what you want to do you can use 3,1,2. This says three "spatial" points, 1 dof at each "spatial" point, and 2 ghost points. In your case "spatial" does not mean spatial in space; it is just three abstract points.

In this case, since I have 2 ghost points, should I change the DMBoundaryType?
For one process this works, but with 2 or 3 procs it doesn't. The error I have is the following:

Solving a non-linear TS problem, number of processors = 2
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Argument out of range
[1]PETSC ERROR: Local x-width of domain x 1 is smaller than stencil width s 2
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.14.4, unknown
[1]PETSC ERROR: ./test_ic on a arch-debug named srvulx13 by fbrarda Mon May 3 17:46:37 2021
[1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
[1]PETSC ERROR: #1 DMSetUp_DA_1D() line 199 in /home/fbrarda/petsc/src/dm/impls/da/da1.c
[1]PETSC ERROR: #2 DMSetUp_DA() line 20 in /home/fbrarda/petsc/src/dm/impls/da/dareg.c
[1]PETSC ERROR: #3 DMSetUp() line 787 in /home/fbrarda/petsc/src/dm/interface/dm.c
[1]PETSC ERROR: #4 main() line 232 in test_ic.c
[1]PETSC ERROR: No PETSc Option Table entries
[1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------

> Il giorno 2 mag 2021, alle ore 18:54, Barry Smith ha scritto:
>
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From M.Deij at marin.nl Tue May 4 02:55:36 2021
From: M.Deij at marin.nl (Deij-van Rijswijk, Menno)
Date: Tue, 4 May 2021 07:55:36 +0000
Subject: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST
In-Reply-To:
References: <23eee78b7748418d948e850c50fefb5f@MAR190n2.marin.local> <6562389d4d694d92a74b3add0cbcc823@MAR190n2.marin.local>
Message-ID: <5f24b5b1df3141e4a0c82de8dcca675a@MAR190n2.marin.local>

Hi Barry,

Thank you for this message about finalisation. I have checked that PetscFinalize is called after the problematic call to MatDestroy, and that is indeed the case. Furthermore, the module does not use "final".

Menno
dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl

MARIN news: Working paper on the Design of the Wageningen F-series

From: Barry Smith
Sent: Sunday, May 2, 2021 6:30 PM
To: Deij-van Rijswijk, Menno
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From stefano.zampini at gmail.com Tue May 4 03:57:23 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Tue, 4 May 2021 11:57:23 +0300
Subject: [petsc-users] Parallel TS for ODE
In-Reply-To: <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com>
References: <278D238E-4519-4943-BC8D-607CF62F0025@gmail.com> <8D806473-44A4-4FA6-A119-6716F6C1C929@anl.gov> <485566D7-83DB-4EA9-87D4-2195F50E2D82@gmail.com> <9C3E8A29-C627-4A2C-B56D-2DFE9C1516D5@gmail.com> <00DC179D-1E7B-4ED7-B64B-A074882A5D34@gmail.com> <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev> <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com>
Message-ID: 

Using DMDA to get automatic parallelization of a 0-D code is nonsense. Ghost points are meant for spatially distributed data, not for 0-D. If you really want to use a DM, you should use DMRedundant and call DMGlobalToLocal to go from your distributed representation to the local full representation (it calls MPI_Bcast internally). My advice is to read the many examples we have.

Il giorno mar 4 mag 2021 alle ore 10:54 Francesco Brarda <brardafrancesco at gmail.com> ha scritto:
> Thank you very much everyone. I do have one more question.
>
> For what you want to do you can use 3,1,2. This says three "spatial" points, 1 dof at each "spatial" point and 2 ghost points. In your case "spatial" does not mean spatial in space, it is just three abstract points.
>
> In this case, since I have 2 ghost points, should I change the DMBoundaryType?
> For one process this works, but with 2 or 3 procs it doesn't. The error I have is the following:
>
> Solving a non-linear TS problem, number of processors = 2
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Argument out of range
> [1]PETSC ERROR: Local x-width of domain x 1 is smaller than stencil width s 2
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.14.4, unknown
> [1]PETSC ERROR: ./test_ic on a arch-debug named srvulx13 by fbrarda Mon May 3 17:46:37 2021
> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
> [1]PETSC ERROR: #1 DMSetUp_DA_1D() line 199 in /home/fbrarda/petsc/src/dm/impls/da/da1.c
> [1]PETSC ERROR: #2 DMSetUp_DA() line 20 in /home/fbrarda/petsc/src/dm/impls/da/dareg.c
> [1]PETSC ERROR: #3 DMSetUp() line 787 in /home/fbrarda/petsc/src/dm/interface/dm.c
> [1]PETSC ERROR: #4 main() line 232 in test_ic.c
> [1]PETSC ERROR: No PETSc Option Table entries
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>
> Il giorno 2 mag 2021, alle ore 18:54, Barry Smith ha scritto:
>
> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation da it is shell
>
> Are you calling TSSetDM() to supply your created DMDA to the TS? Based on the error message you are not; it is using a default shell DM, which is what TS does if you do not provide it with one. You need to call TSSetDM() after you create the TS and the DMDA.
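[To make the DMRedundant suggestion above concrete, here is a minimal sketch, not taken from the thread: 3 redundant dofs (S, I, R) owned by rank 0, a DM attached to the TS with TSSetDM(), and DMGlobalToLocal broadcasting the state to every rank. The ODE callbacks and solver setup are omitted.

#include <petscts.h>

int main(int argc, char **argv)
{
  TS             ts;
  DM             dm;
  Vec            X, Xloc;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* 3 redundant dofs, all stored on rank 0 */
  ierr = DMRedundantCreate(PETSC_COMM_WORLD, 0, 3, &dm);CHKERRQ(ierr);
  ierr = DMSetUp(dm);CHKERRQ(ierr);
  ierr = TSCreate(PETSC_COMM_WORLD, &ts);CHKERRQ(ierr);
  ierr = TSSetDM(ts, dm);CHKERRQ(ierr);              /* TSGetDM() in callbacks now returns dm, avoiding the "shell" DM error */
  ierr = DMCreateGlobalVector(dm, &X);CHKERRQ(ierr); /* 3 entries on rank 0, none elsewhere */

  /* Inside an RHS or monitor callback one would do: */
  ierr = DMGetLocalVector(dm, &Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr); /* MPI_Bcast internally */
  ierr = DMGlobalToLocalEnd(dm, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);   /* every rank now sees all 3 values */
  ierr = DMRestoreLocalVector(dm, &Xloc);CHKERRQ(ierr);

  ierr = VecDestroy(&X);CHKERRQ(ierr);
  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
]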
> > > On Apr 29, 2021, at 2:57 AM, Francesco Brarda > wrote: > > I defined the DM as follows > ierr = > DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,3,3,NULL,&da);CHKERRQ(ierr); > > > If you truly want one "spatial" point then you would want to use 1,3,0. > This says 1 location in space, three degrees of freedom at that point and 0 > ghost values (since there is only one spatial point there can be no ghost > spatial values). > > BUT DMDA ALWAYS puts all degree of freedom at a point on the same process > so this will not give you parallelism. All 3 dof will be on the same MPI > rank. > > For what you want to do you can use 3,1,2. This says three "spatial" > points, 1 dof at each "spatial" point and 2 ghost points. In your > case "spatial" does not mean spatial in space it is just three abstract > points. > > The global vectors (from DMCreate or Get GlobalVector) will have 1 value > on each rank. The local vector from DMCreate or Get LocalVector) will have > three values on each rank. Your initial conditions code would be something > like > > if (rank == 0) { > > x[0] = appctx->N - appctx->p[2]; > > } else if (rank == 1) { > > x[1] = appctx->p[2]; > > } else { > > x[2] = 0.0; > > } > > > Your TSSetRHSFunction() would make a call to DMGetLocalVector(...&localX), > do a DMGlobalToLocalBegin/End() from the input global X to localX, you > would call DMDAVecGetArray(...,&xarray) on the localX and access all three > values in xarray. The resulting computation of f the output vector would be > something like > > if (rank == 0) { > > farray[0] = your code that can use xarray[0], xarray[1], xarray[2] > > } else if (rank == 1) { > > farray[1] = your code that can use xarray[0], xarray[1], xarray[2] > > } else { > > farray[2] = your code that can use xarray[0], xarray[1], xarray[2] > > } > > There are many examples of this pattern in the example tutorials. > > When you implement a code with a spatial distribution you would use a dof > of 3 at each point and not parallelize over the dof at each point. Likely > you want to use DMNETWORK to manage the spatial distribution since it has a > simple API and allows any number of different number of neighbors for each > point. DMDA would not make sense for true spatial distribution except in > some truly trivial neighbor configurations. > > Barry > > > > > I am not sure whether I correctly understood this command properly. The > vector should have 3 components (S, I, R) and 3 DOF as it is defined only > when the three coordinates have been set. > Then I create a global vector X. When I set the initial conditions as > below > > static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx) > { > PetscErrorCode ierr; > AppCtx *appctx = (AppCtx*) ctx; > PetscScalar *x; > DM da; > > PetscFunctionBeginUser; > ierr = TSGetDM(ts,&da);CHKERRQ(ierr); > > /* Get pointers to vector data */ > ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr); > > x[0] = appctx->N - appctx->p[2]; > x[1] = appctx->p[2]; > x[2] = 0.0; > > ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > I have the error: > > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have > implementation da it is shell > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown > [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda > Thu Apr 29 09:36:17 2021 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc > --download-mpich PETSC_ARCH=arch-debug > [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in > /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c > [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c > [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c > [0]PETSC ERROR: No PETSc Option Table entries > > I would be very happy to receive any advices to fix the code. > Best, > Francesco > > Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley > ha scritto: > > On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > Thank you for the advices, I would just like to convert the code I already > have to see what might happen once parallelized. > Do you think it is better to put the 3 equations into a 1d Distributed > Array with 3 dofs and run the job with multiple procs regardless of how > many equations I have? Is it possible? > > If you plan in the end to use a structured grid, this is a great plan. If > not, this is not a good plan. > > Thanks, > > Matt > > Thank you, > Francesco > > Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini < > stefano.zampini at gmail.com> ha scritto: > > It does not make sense to parallelize to 1 equation per process, unless > that single equation per process is super super super costly. > Is this work you are doing used to understand PETSc parallelization > strategy? if so, there are multiple examples in the sourcetree that you can > look at to populate matrices and vectors in parallel > > Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda < > brardafrancesco at gmail.com> ha scritto: > In principle the entire code was for 1 proc only. The functions were built > with VecGetArray(). While adapting the code for multiple procs I thought > using VecGetOwnershipRange was a possible way to allocate the equations in > the vector using multiple procs. What do you think, please? > > Thank you, > Francesco > > Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley > ha scritto: > > On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > I was trying to follow Barry's advice some time ago, but I guess that's > not the way he meant it. How should I refer to the values contained in x? > With Distributed Arrays? > > That is how you get values from x. However, I cannot understand at all > what you are doing with "mybase". > > Matt > > Thanks > Francesco > > Even though it will not scale and will deliver slower performance it is > completely possible for you to solve the 3 variable problem using 3 MPI > ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 > degree of freedom for the first three ranks and no degrees of freedom for > the later ranks. During your function evaluation (and Jacobian evaluation) > for TS you will need to set up the appropriate communication to get the > values you need on each rank to evaluate the parts of the function > evaluation needed by that rank. This is true for parallelizing any > computation. > > Barry > > > > > Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley > ha scritto: > > On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > Hi! 
> I tried to implement the SIR model taking into account the fact that I > will only use 3 MPI ranks at this moment. > I built vectors and matrices following the examples already available. In > particular, I defined the functions required similarly (RHSFunction, > IFunction, IJacobian), as follows: > > I don't think this makes sense. You use "mybase" to distinguish between 3 > procs, which would indicate that each procs has only > 1 degree of freedom. However, you use x[1] on each proc, indicating it has > at least 2 dofs. > > Thanks, > > Matt > > static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx) > { > PetscErrorCode ierr; > AppCtx *appctx = (AppCtx*) ctx; > PetscScalar f;//, *x_localptr; > const PetscScalar *x; > PetscInt mybase; > > PetscFunctionBeginUser; > ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr); > ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); > if (mybase == 0) { > f = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N); > ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); > } > if (mybase == 1) { > f = (PetscScalar) (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]); > ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); > } > if (mybase == 2) { > f = (PetscScalar) (appctx->p2*x[1]); > ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); > } > ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); > ierr = VecAssemblyBegin(F);CHKERRQ(ierr); > ierr = VecAssemblyEnd(F);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > > Whilst for the Jacobian I did: > > > static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec Xdot,PetscReal > a,Mat A,Mat B,void *ctx) > { > PetscErrorCode ierr; > AppCtx *appctx = (AppCtx*) ctx; > PetscInt mybase, rowcol[] = {0,1,2}; > const PetscScalar *x; > > PetscFunctionBeginUser; > ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr); > ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); > if (mybase == 0) { > const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, > appctx->p1*x[0]/appctx->N, 0}; > ierr = > MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); > } > if (mybase == 1) { > const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - > appctx->p1*x[0]/appctx->N + appctx->p2, 0}; > ierr = > MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); > } > if (mybase == 2) { > const PetscScalar J[] = {0, - appctx->p2, a}; > ierr = > MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); > } > ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); > > ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > if (A != B) { > ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > This code does not provide the correct result, that is, the solution is > the initial condition, either using implicit or explicit methods. Is the > way I defined these objects wrong? How can I fix it? > I also tried to print the Jacobian with the following commands but it does > not work (blank rows and error message). How should I print the Jacobian? > > ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr); > ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > I would very much appreciate any kind of help or advice. 
> Best, > Francesco > > Il giorno 2 apr 2021, alle ore 04:45, Barry Smith ha > scritto: > > > > On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > On Mar 31, 2021, at 2:53 AM, Francesco Brarda > wrote: > > Hi everyone! > > I am trying to solve a system of 3 ODEs (a basic SIR model) with TS. > Sequentially works pretty well, but I need to switch it into a parallel > version. > I started working with TS not very long time ago, there are few questions > I?d like to share with you and if you have any advices I?d be happy to hear. > First of all, do I need to use a DM object even if the model is only time > dependent? All the examples I found were using that object for the other > variable when solving PDEs. > > > Are you considering SIR on a spatial domain? If so, you can parallelize > your model in the spatial domain using DM. Splitting the three variables in > the ODE among processors would not scale. > > > Even though it will not scale and will deliver slower performance it is > completely possible for you to solve the 3 variable problem using 3 MPI > ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 > degree of freedom for the first three ranks and no degrees of freedom for > the later ranks. During your function evaluation (and Jacobian evaluation) > for TS you will need to set up the appropriate communication to get the > values you need on each rank to evaluate the parts of the function > evaluation needed by that rank. This is true for parallelizing any > computation. > > Barry > > > > > > Hong (Mr.) > > When I preallocate the space for the Jacobian matrix, is it better to > decide the local or global space? > > Best, > Francesco > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > Stefano > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue May 4 10:40:25 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 May 2021 10:40:25 -0500 Subject: [petsc-users] Parallel TS for ODE In-Reply-To: <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com> References: <278D238E-4519-4943-BC8D-607CF62F0025@gmail.com> <8D806473-44A4-4FA6-A119-6716F6C1C929@anl.gov> <485566D7-83DB-4EA9-87D4-2195F50E2D82@gmail.com> <9C3E8A29-C627-4A2C-B56D-2DFE9C1516D5@gmail.com> <00DC179D-1E7B-4ED7-B64B-A074882A5D34@gmail.com> <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev> <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com> Message-ID: > On May 4, 2021, at 2:53 AM, Francesco Brarda wrote: > > Thank you very much everyone. I do have one more question. >> For what you want to do you can use 3,1,2. This says three "spatial" points, 1 dof at each "spatial" point and 2 ghost points. In your case "spatial" does not mean spatial in space it is just three abstract points. 
> In this case, since I have 2 ghost points, should I change the DMBoundaryType?
> For one process this works, but with 2 or 3 procs it doesn't. The error I have is the following:
>
> Solving a non-linear TS problem, number of processors = 2
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Argument out of range
> [1]PETSC ERROR: Local x-width of domain x 1 is smaller than stencil width s 2
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.14.4, unknown
> [1]PETSC ERROR: ./test_ic on a arch-debug named srvulx13 by fbrarda Mon May 3 17:46:37 2021
> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
> [1]PETSC ERROR: #1 DMSetUp_DA_1D() line 199 in /home/fbrarda/petsc/src/dm/impls/da/da1.c
> [1]PETSC ERROR: #2 DMSetUp_DA() line 20 in /home/fbrarda/petsc/src/dm/impls/da/dareg.c
> [1]PETSC ERROR: #3 DMSetUp() line 787 in /home/fbrarda/petsc/src/dm/interface/dm.c
> [1]PETSC ERROR: #4 main() line 232 in test_ic.c
> [1]PETSC ERROR: No PETSc Option Table entries
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------

I had forgotten this. It is a limitation of the DMDA implementation that one cannot require data from two ranks away from the current rank.

But yes, perhaps you can "cheat" by using DM_BOUNDARY_PERIODIC and a stencil width of 1. For each rank it will bring in the value from the previous and the next rank, and due to the periodicity it will thus bring the 3rd value to the first rank and the first value to the third rank.

Barry
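[A minimal sketch of the DM_BOUNDARY_PERIODIC workaround Barry describes above, not taken from the thread; the SIR right-hand side itself is left as a placeholder.

#include <petscts.h>

/* RHS skeleton: scatter ghost values, then each rank fills only the F
   entries it owns while being able to read all three state values. */
static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec X, Vec F, void *ctx)
{
  DM                da;
  Vec               Xloc;
  const PetscScalar *x;
  PetscScalar       *f;
  PetscInt          i, xs, xm;
  PetscErrorCode    ierr;

  PetscFunctionBeginUser;
  ierr = TSGetDM(ts, &da);CHKERRQ(ierr);
  ierr = DMGetLocalVector(da, &Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(da, X, INSERT_VALUES, Xloc);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayRead(da, Xloc, &x);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da, F, &f);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);CHKERRQ(ierr);
  for (i = xs; i < xs + xm; i++) {
    /* with the periodic ghosts, x[i-1] and x[i+1] are valid on every rank,
       so each rank can see all three values of the state */
    f[i] = 0.0; /* placeholder: put the SIR right-hand side for dof i here */
  }
  ierr = DMDAVecRestoreArray(da, F, &f);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArrayRead(da, Xloc, &x);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da, &Xloc);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  TS             ts;
  DM             da;
  Vec            X;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* 3 points, 1 dof each, periodic boundary, stencil width 1 */
  ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, 3, 1, 1, NULL, &da);CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = TSCreate(PETSC_COMM_WORLD, &ts);CHKERRQ(ierr);
  ierr = TSSetDM(ts, da);CHKERRQ(ierr);
  ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(da, &X);CHKERRQ(ierr); /* one value per rank on 3 ranks */
  ierr = TSSetSolution(ts, X);CHKERRQ(ierr);
  /* TSSetFromOptions(), TSSolve(), etc. as usual */
  ierr = VecDestroy(&X);CHKERRQ(ierr);
  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
]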
From bsmith at petsc.dev Tue May 4 11:55:34 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 4 May 2021 11:55:34 -0500
Subject: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST
In-Reply-To: <5f24b5b1df3141e4a0c82de8dcca675a@MAR190n2.marin.local>
References: <23eee78b7748418d948e850c50fefb5f@MAR190n2.marin.local> <6562389d4d694d92a74b3add0cbcc823@MAR190n2.marin.local> <5f24b5b1df3141e4a0c82de8dcca675a@MAR190n2.marin.local>
Message-ID: <5F4F1847-8C31-448F-B60E-980F4B4DCDAE@petsc.dev>

Are you using complex numbers?

> On May 4, 2021, at 2:55 AM, Deij-van Rijswijk, Menno wrote:
>
> Hi Barry,
>
> Thank you for this message about finalisation. I have checked that PetscFinalize is called after the problematic call to MatDestroy, and that is indeed the case. Furthermore, the module does not use "final".
>
> Menno
>
> dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
> MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl
>
> MARIN news: Working paper on the Design of the Wageningen F-series

From brardafrancesco at gmail.com Wed May 5 04:12:01 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Wed, 5 May 2021 11:12:01 +0200
Subject: [petsc-users] Parallel TS for ODE
In-Reply-To: 
References: <278D238E-4519-4943-BC8D-607CF62F0025@gmail.com> <8D806473-44A4-4FA6-A119-6716F6C1C929@anl.gov> <485566D7-83DB-4EA9-87D4-2195F50E2D82@gmail.com> <9C3E8A29-C627-4A2C-B56D-2DFE9C1516D5@gmail.com> <00DC179D-1E7B-4ED7-B64B-A074882A5D34@gmail.com> <25CBF44D-2739-4748-97EF-A2007802B780@petsc.dev> <451AA1F5-07E4-4704-BA7F-3A1C5175DEE4@gmail.com>
Message-ID: <9ED2AF52-AEA9-4FBB-8B7F-793F6C109EF8@gmail.com>

Many thanks, this helped a lot. Now it finally works.
Best,
Francesco

> Il giorno 4 mag 2021, alle ore 17:40, Barry Smith ha scritto:
>
>> On May 4, 2021, at 2:53 AM, Francesco Brarda wrote:
>> Thank you very much everyone. I do have one more question.
This says three "spatial" points, 1 dof at each "spatial" point and 2 ghost points. In your case "spatial" does not mean spatial in space it is just three abstract points. >> In this case, since I have 2 ghost points, should I change the DMBoundaryType? >> For one process this works, but with 2 or 3 procs it doesn?t. The error I have is the following: >> >> Solving a non-linear TS problem, number of processors = 2 >> [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [1]PETSC ERROR: Argument out of range >> [1]PETSC ERROR: Local x-width of domain x 1 is smaller than stencil width s 2 > > I had forgotten this. It is a limitation of the DMDA implementation that one cannot require data from two ranks away from the current rank. > > But yes perhaps you can "cheat" by using DM_BOUNDARY_PERIODIC and a stencil width of 1. For each rank it will bring the value from the previous and next rank and due to the periodicity it will thus bring the 3rd value to the first rank and the first value to the third rank. > > Barry > > > >> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [1]PETSC ERROR: Petsc Release Version 3.14.4, unknown >> [1]PETSC ERROR: ./test_ic on a arch-debug named srvulx13 by fbrarda Mon May 3 17:46:37 2021 >> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >> [1]PETSC ERROR: #1 DMSetUp_DA_1D() line 199 in /home/fbrarda/petsc/src/dm/impls/da/da1.c >> [1]PETSC ERROR: #2 DMSetUp_DA() line 20 in /home/fbrarda/petsc/src/dm/impls/da/dareg.c >> [1]PETSC ERROR: #3 DMSetUp() line 787 in /home/fbrarda/petsc/src/dm/interface/dm.c >> [1]PETSC ERROR: #4 main() line 232 in test_ic.c >> [1]PETSC ERROR: No PETSc Option Table entries >> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov ---------- >> >>> Il giorno 2 mag 2021, alle ore 18:54, Barry Smith > ha scritto: >>> >>> >>> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation da it is shell >>> >>> Are you calling TSSetDM() to supply your created DMDA to the TS? Based on the error message you are not, it is using a default shell DM, which is what TS does if you do not provide them with a DM. You need to call TSSetDM() after you create the TS and the DMDA. >>> >>> >>>> On Apr 29, 2021, at 2:57 AM, Francesco Brarda > wrote: >>>> >>>> I defined the DM as follows >>>> ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,3,3,NULL,&da);CHKERRQ(ierr); >>> >>> If you truly want one "spatial" point then you would want to use 1,3,0. This says 1 location in space, three degrees of freedom at that point and 0 ghost values (since there is only one spatial point there can be no ghost spatial values). >>> >>> BUT DMDA ALWAYS puts all degree of freedom at a point on the same process so this will not give you parallelism. All 3 dof will be on the same MPI rank. >>> >>> For what you want to do you can use 3,1,2. This says three "spatial" points, 1 dof at each "spatial" point and 2 ghost points. In your case "spatial" does not mean spatial in space it is just three abstract points. >>> >>> The global vectors (from DMCreate or Get GlobalVector) will have 1 value on each rank. The local vector from DMCreate or Get LocalVector) will have three values on each rank. 
Your initial conditions code would be something like >>> >>> if (rank == 0) { >>>> x[0] = appctx->N - appctx->p[2]; >>> } else if (rank == 1) { >>>> x[1] = appctx->p[2]; >>> } else { >>>> x[2] = 0.0; >>> } >>> >>> >>> Your TSSetRHSFunction() would make a call to DMGetLocalVector(...&localX), do a DMGlobalToLocalBegin/End() from the input global X to localX, you would call DMDAVecGetArray(...,&xarray) on the localX and access all three values in xarray. The resulting computation of f the output vector would be something like >>> >>> if (rank == 0) { >>>> farray[0] = your code that can use xarray[0], xarray[1], xarray[2] >>> } else if (rank == 1) { >>>> farray[1] = your code that can use xarray[0], xarray[1], xarray[2] >>> } else { >>>> farray[2] = your code that can use xarray[0], xarray[1], xarray[2] >>> } >>> >>> There are many examples of this pattern in the example tutorials. >>> >>> When you implement a code with a spatial distribution you would use a dof of 3 at each point and not parallelize over the dof at each point. Likely you want to use DMNETWORK to manage the spatial distribution since it has a simple API and allows any number of different number of neighbors for each point. DMDA would not make sense for true spatial distribution except in some truly trivial neighbor configurations. >>> >>> Barry >>> >>> >>> >>>> >>>> I am not sure whether I correctly understood this command properly. The vector should have 3 components (S, I, R) and 3 DOF as it is defined only when the three coordinates have been set. >>>> Then I create a global vector X. When I set the initial conditions as below >>>> >>>> static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx) >>>> { >>>> PetscErrorCode ierr; >>>> AppCtx *appctx = (AppCtx*) ctx; >>>> PetscScalar *x; >>>> DM da; >>>> >>>> PetscFunctionBeginUser; >>>> ierr = TSGetDM(ts,&da);CHKERRQ(ierr); >>>> >>>> /* Get pointers to vector data */ >>>> ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr); >>>> >>>> x[0] = appctx->N - appctx->p[2]; >>>> x[1] = appctx->p[2]; >>>> x[2] = 0.0; >>>> >>>> ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr); >>>> PetscFunctionReturn(0); >>>> } >>>> >>>> I have the error: >>>> >>>> [0]PETSC ERROR: Invalid argument >>>> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation da it is shell >>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown >>>> [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda Thu Apr 29 09:36:17 2021 >>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >>>> [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c >>>> [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c >>>> [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c >>>> [0]PETSC ERROR: No PETSc Option Table entries >>>> >>>> I would be very happy to receive any advices to fix the code. >>>> Best, >>>> Francesco >>>> >>>>> Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley > ha scritto: >>>>> >>>>> On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda > wrote: >>>>> Thank you for the advices, I would just like to convert the code I already have to see what might happen once parallelized. 
>>>>> Do you think it is better to put the 3 equations into a 1d Distributed Array with 3 dofs and run the job with multiple procs regardless of how many equations I have? Is it possible? >>>>> >>>>> If you plan in the end to use a structured grid, this is a great plan. If not, this is not a good plan. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Thank you, >>>>> Francesco >>>>> >>>>>> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini > ha scritto: >>>>>> >>>>>> It does not make sense to parallelize to 1 equation per process, unless that single equation per process is super super super costly. >>>>>> Is this work you are doing used to understand PETSc parallelization strategy? if so, there are multiple examples in the sourcetree that you can look at to populate matrices and vectors in parallel >>>>>> >>>>>> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda > ha scritto: >>>>>> In principle the entire code was for 1 proc only. The functions were built with VecGetArray(). While adapting the code for multiple procs I thought using VecGetOwnershipRange was a possible way to allocate the equations in the vector using multiple procs. What do you think, please? >>>>>> >>>>>> Thank you, >>>>>> Francesco >>>>>> >>>>>>> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley > ha scritto: >>>>>>> >>>>>>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda > wrote: >>>>>>> I was trying to follow Barry's advice some time ago, but I guess that's not the way he meant it. How should I refer to the values contained in x? With Distributed Arrays? >>>>>>> >>>>>>> That is how you get values from x. However, I cannot understand at all what you are doing with "mybase". >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> Thanks >>>>>>> Francesco >>>>>>> >>>>>>>>> Even though it will not scale and will deliver slower performance it is completely possible for you to solve the 3 variable problem using 3 MPI ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 degree of freedom for the first three ranks and no degrees of freedom for the later ranks. During your function evaluation (and Jacobian evaluation) for TS you will need to set up the appropriate communication to get the values you need on each rank to evaluate the parts of the function evaluation needed by that rank. This is true for parallelizing any computation. >>>>>>>>> >>>>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley > ha scritto: >>>>>>>> >>>>>>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda > wrote: >>>>>>>> Hi! >>>>>>>> I tried to implement the SIR model taking into account the fact that I will only use 3 MPI ranks at this moment. >>>>>>>> I built vectors and matrices following the examples already available. In particular, I defined the functions required similarly (RHSFunction, IFunction, IJacobian), as follows: >>>>>>>> >>>>>>>> I don't think this makes sense. You use "mybase" to distinguish between 3 procs, which would indicate that each procs has only >>>>>>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it has at least 2 dofs. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx) >>>>>>>> { >>>>>>>> PetscErrorCode ierr; >>>>>>>> AppCtx *appctx = (AppCtx*) ctx; >>>>>>>> PetscScalar f;//, *x_localptr; >>>>>>>> const PetscScalar *x; >>>>>>>> PetscInt mybase; >>>>>>>> >>>>>>>> PetscFunctionBeginUser; >>>>>>>> ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr); >>>>>>>> ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); >>>>>>>> if (mybase == 0) { >>>>>>>> f = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N); >>>>>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>>>>> } >>>>>>>> if (mybase == 1) { >>>>>>>> f = (PetscScalar) (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]); >>>>>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>>>>> } >>>>>>>> if (mybase == 2) { >>>>>>>> f = (PetscScalar) (appctx->p2*x[1]); >>>>>>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES); >>>>>>>> } >>>>>>>> ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); >>>>>>>> ierr = VecAssemblyBegin(F);CHKERRQ(ierr); >>>>>>>> ierr = VecAssemblyEnd(F);CHKERRQ(ierr); >>>>>>>> PetscFunctionReturn(0); >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> Whilst for the Jacobian I did: >>>>>>>> >>>>>>>> >>>>>>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec Xdot,PetscReal a,Mat A,Mat B,void *ctx) >>>>>>>> { >>>>>>>> PetscErrorCode ierr; >>>>>>>> AppCtx *appctx = (AppCtx*) ctx; >>>>>>>> PetscInt mybase, rowcol[] = {0,1,2}; >>>>>>>> const PetscScalar *x; >>>>>>>> >>>>>>>> PetscFunctionBeginUser; >>>>>>>> ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr); >>>>>>>> ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr); >>>>>>>> if (mybase == 0) { >>>>>>>> const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, appctx->p1*x[0]/appctx->N, 0}; >>>>>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>>>>> } >>>>>>>> if (mybase == 1) { >>>>>>>> const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - appctx->p1*x[0]/appctx->N + appctx->p2, 0}; >>>>>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>>>>> } >>>>>>>> if (mybase == 2) { >>>>>>>> const PetscScalar J[] = {0, - appctx->p2, a}; >>>>>>>> ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr); >>>>>>>> } >>>>>>>> ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr); >>>>>>>> >>>>>>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> if (A != B) { >>>>>>>> ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> } >>>>>>>> PetscFunctionReturn(0); >>>>>>>> } >>>>>>>> >>>>>>>> This code does not provide the correct result, that is, the solution is the initial condition, either using implicit or explicit methods. Is the way I defined these objects wrong? How can I fix it? >>>>>>>> I also tried to print the Jacobian with the following commands but it does not work (blank rows and error message). How should I print the Jacobian? >>>>>>>> >>>>>>>> ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr); >>>>>>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >>>>>>>> ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>>>>>>> >>>>>>>> I would very much appreciate any kind of help or advice. 
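One way to make the RHSFunction above correct in parallel, as a sketch only: gather the full three-component state onto every rank, then set only the locally owned rows of F. The scatter is created per call here for brevity; in real code it would be created once and cached in the AppCtx. The p1/p2/N fields are taken from the code above.

    static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx)
    {
      AppCtx            *appctx = (AppCtx*)ctx;
      VecScatter        scat;
      Vec               Xall;
      const PetscScalar *x;
      PetscScalar       f;
      PetscInt          lo,hi,row;
      PetscErrorCode    ierr;

      PetscFunctionBeginUser;
      ierr = VecScatterCreateToAll(X,&scat,&Xall);CHKERRQ(ierr);
      ierr = VecScatterBegin(scat,X,Xall,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(scat,X,Xall,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecGetArrayRead(Xall,&x);CHKERRQ(ierr);
      ierr = VecGetOwnershipRange(F,&lo,&hi);CHKERRQ(ierr);
      for (row = lo; row < hi; row++) {       /* every rank can read x[0..2] */
        if      (row == 0) f = -appctx->p1*x[0]*x[1]/appctx->N;
        else if (row == 1) f =  appctx->p1*x[0]*x[1]/appctx->N - appctx->p2*x[1];
        else               f =  appctx->p2*x[1];
        ierr = VecSetValues(F,1,&row,&f,INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = VecRestoreArrayRead(Xall,&x);CHKERRQ(ierr);
      ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
      ierr = VecDestroy(&Xall);CHKERRQ(ierr);
      ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }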
>>>>>>>> Best, >>>>>>>> Francesco >>>>>>>> >>>>>>>>> Il giorno 2 apr 2021, alle ore 04:45, Barry Smith > ha scritto: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users > wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Mar 31, 2021, at 2:53 AM, Francesco Brarda > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi everyone! >>>>>>>>>>> >>>>>>>>>>> I am trying to solve a system of 3 ODEs (a basic SIR model) with TS. Sequentially works pretty well, but I need to switch it into a parallel version. >>>>>>>>>>> I started working with TS not very long time ago, there are few questions I?d like to share with you and if you have any advices I?d be happy to hear. >>>>>>>>>>> First of all, do I need to use a DM object even if the model is only time dependent? All the examples I found were using that object for the other variable when solving PDEs. >>>>>>>>>> >>>>>>>>>> Are you considering SIR on a spatial domain? If so, you can parallelize your model in the spatial domain using DM. Splitting the three variables in the ODE among processors would not scale. >>>>>>>>> >>>>>>>>> Even though it will not scale and will deliver slower performance it is completely possible for you to solve the 3 variable problem using 3 MPI ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 degree of freedom for the first three ranks and no degrees of freedom for the later ranks. During your function evaluation (and Jacobian evaluation) for TS you will need to set up the appropriate communication to get the values you need on each rank to evaluate the parts of the function evaluation needed by that rank. This is true for parallelizing any computation. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hong (Mr.) >>>>>>>>>> >>>>>>>>>>> When I preallocate the space for the Jacobian matrix, is it better to decide the local or global space? >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Francesco >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Stefano >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From M.Deij at marin.nl Thu May 6 01:25:29 2021 From: M.Deij at marin.nl (Deij-van Rijswijk, Menno) Date: Thu, 6 May 2021 06:25:29 +0000 Subject: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST In-Reply-To: <5F4F1847-8C31-448F-B60E-980F4B4DCDAE@petsc.dev> References: <23eee78b7748418d948e850c50fefb5f@MAR190n2.marin.local> <6562389d4d694d92a74b3add0cbcc823@MAR190n2.marin.local> <5f24b5b1df3141e4a0c82de8dcca675a@MAR190n2.marin.local> <5F4F1847-8C31-448F-B60E-980F4B4DCDAE@petsc.dev> Message-ID: No, no complex numbers are used. dr. ir. Menno A. 
Deij-van Rijswijk | Researcher | Research & Development MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl [LinkedIn] [YouTube] [Twitter] [Facebook] MARIN news: WASP webinar & WiSP workshop From: Barry Smith Sent: Tuesday, May 4, 2021 6:56 PM To: Deij-van Rijswijk, Menno Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST Are you using complex numbers? On May 4, 2021, at 2:55 AM, Deij-van Rijswijk, Menno > wrote: Hi Barry, Thank you for this message about finalisation. I have checked that PetscFinalize is called after the problematic call to MatDestroy, and that is indeed the case. Furthermore, the module does not use "final". Menno dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl MARIN news: Working paper on the Design of the Wageningen F-series From: Barry Smith > Sent: Sunday, May 2, 2021 6:30 PM To: Deij-van Rijswijk, Menno > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297) ==1026905== Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client" Is it possible that this __fsi_MOD_fem_constructmatricespetscexit is being called AFTER PetscFinalize()? Perhaps it is defined with a "final" and the compiler/linker schedule it to be called after the program has "completed". This would explain the crash, the valgrind stack frames and why it even does not crash with MPICH. This can happen with C++ destructors in code such as MyC++Class my; <-- has a destructor that destroys PETSc objects PetscInitialize() .... PetscFinalize() <-- the destructor gets called here and messes with MPI data that no longer exists. return 0; } The fix is to force the destructor to be called before PETSc finalize and this can be done with PetscInitialize() { MyC++Class my; <-- has a destructor that destroys PETSc objects .... <-- the destructor gets called here and everything is fine } PetscFinalize() return 0; } I don't know the details of how Fortran's final is implemented but this is my current guess as to what is happening in your code and you need to somehow arrange for the module final to be called before PetscFinalize(). Barry On Apr 28, 2021, at 7:22 AM, Deij-van Rijswijk, Menno > wrote: The modules have automatic freeing in as much as that when a variable that is local to a subroutine is ALLOCATE'd, it is automatically freed when the subroutine returns. I don't think that is problematic, as MatDestroy is used a lot in the code and normally executes just fine. As far as I can see, no specific new communicators are created; MatCreateAIJ or MatCreateSeqAIJ are called with PETSC_COMM_WORLD, resp. PETSC_COMM_SELF as first argument. We also run this with the Intel MPI library, which is based on MPICH. There this problem does not occur. The Valgrind run did not produce any new insights (at least not for me), I have pasted the relevant bits at the end of this message. I did a run on debug versions of PETSc (v3.14.5) and OpenMPI (v 3.1.2) and I find the following stack trace with line numbers for each frame. Maybe that helps in further pinpointing the problem. 
0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470 1470 if ( ! OMPI_COMM_IS_INTRINSIC((*comm)->c_local_comm)) { Missing separate debuginfos, use: yum debuginfo-install libgcc-8.3.1-5.el8.0.2.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64 libibumad-47mlnx1-1.47329.x86_64 libibverbs-47mlnx1-1.47329.x86_64 libnl3-3.5.0-1.el8.x86_64 libquadmath-8.3.1-5.el8.0.2.x86_64 librdmacm-47mlnx1-1.47329.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64 libxml2-2.9.7-7.el8.x86_64 numactl-libs-2.0.12-9.el8.x86_64 opensm-libs-5.5.1.MLNX20191120.0c8dde0-0.1.47329.x86_64 openssl-libs-1.1.1c-15.el8.x86_64 python3-libs-3.6.8-23.el8.x86_64 sssd-client-2.2.3-20.el8.x86_64 ucx-cma-1.7.0-1.47329.x86_64 ucx-ib-1.7.0-1.47329.x86_64 xz-libs-5.2.4-3.el8.x86_64 zlib-1.2.11-16.el8_2.x86_64 (gdb) bt #0 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470 #1 0x0000155540d4f1af in PMPI_Comm_free (comm=0x483f4e0) at pcomm_free.c:62 #2 0x000015555346329a in superlu_gridexit (grid=0x483f4e0) at /home/mdeij/install-gnu/extLibs/Linux-x86_64-Intel/superlu_dist-6.3.0/SRC/superlu_grid.c:174 #3 0x0000155553ca2ff1 in Petsc_Superlu_dist_keyval_Delete_Fn (comm=0x3921b10, keyval=16, attr_val=0x483f4d0, extra_state=0x0) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:97 #4 0x0000155540d0baa1 in ompi_attr_delete_impl (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0, key=16, predefined=true) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1062 #5 0x0000155540d0c039 in ompi_attr_delete_all (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1166 #6 0x0000155540d11676 in ompi_comm_free (comm=0x7fffffffc5c0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1462 #7 0x0000155540d4f1af in PMPI_Comm_free (comm=0x7fffffffc5c0) at pcomm_free.c:62 #8 0x000015555393fb68 in PetscCommDestroy (comm=0x3943a60) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/tagm.c:217 #9 0x0000155553941e07 in PetscHeaderDestroy_Private (h=0x3943a20) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/inherit.c:121 #10 0x000015555408edfe in MatDestroy (A=0x3558c18) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/matrix.c:1306 #11 0x00001555540cb5fa in matdestroy_ (A=0x3558c18, __ierr=0x7fffffffc73c) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/ftn-auto/matrixf.c:770 Valgrind output: ==1026905== Invalid read of size 1 ==1026905== at 0x19184538: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x4FEE49D: PetscCommDestroy (in 
/home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297) ==1026905== Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client" ==1026905== ==1026905== Invalid read of size 8 ==1026905== at 0x1912AC9A: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== Address 0x2ce673c0 is 11,152 bytes inside an unallocated block of size 11,232 in arena "client" ==1026905== ==1026905== Invalid read of size 8 ==1026905== at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== Address 0x91 is not stack'd, malloc'd or (recently) free'd ==1026905== ==1026905== ==1026905== Process terminating with default action of signal 11 (SIGSEGV) ==1026905== Access not within mapped region at address 0x91 ==1026905== at 0x19126E5B: 
ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0) ==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2) ==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5) ==1026905== If you believe this happened as a result of a stack ==1026905== overflow in your program's main thread (unlikely but ==1026905== possible), you can try to increase the size of the ==1026905== main thread stack using the --main-stacksize= flag. ==1026905== The main thread stack size used in this run was 16777216. dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development MARIN | T +31 317 49 35 06 | M.Deij at marin.nl | www.marin.nl MARIN news: WASP webinar & WiSP workshop From: Barry Smith > Sent: Friday, April 23, 2021 7:09 PM To: Deij-van Rijswijk, Menno > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and SUPERLU_DIST Thanks for looking. Do these modules have any "automatic freeing" when variables go out of scope (like C++ classes do)? Do you make specific new MPI communicators to use create the matrices? Have you tried MPICH or a different version of OpenMPI. Maybe run the program with valgrind. The stack frames you sent look "funny", that is I would not normally expect them to be in such an order. Barry Help us improve the spam filter. If this message contains SPAM, click here to report. Thank you, MARIN Digital Services -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imagebac17c.PNG Type: image/png Size: 293 bytes Desc: imagebac17c.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: imageb85205.PNG Type: image/png Size: 331 bytes Desc: imageb85205.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image9140b1.PNG Type: image/png Size: 333 bytes Desc: image9140b1.PNG URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: imagea38d5b.PNG Type: image/png Size: 253 bytes Desc: imagea38d5b.PNG URL: From milan.pelletier at protonmail.com Thu May 6 10:08:33 2021 From: milan.pelletier at protonmail.com (Milan Pelletier) Date: Thu, 06 May 2021 15:08:33 +0000 Subject: [petsc-users] Equivalent function call for -mg_levels_ksp_type & -mg_levels_pc_type Message-ID: Dear users, I would like to know what function call would allow me to achieve the same result as using the two flags -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi and then PCSetFromOptions(). I would prefer not to rely on command-line options and rather do the setup "by hand". To achieve that, what should I add after these lines: KSPCreate(PETSC_COMM_WORLD, solver); KSPSetType(solver, KSPCG); KSPGetPC(solver, &pc); PCSetType(pc, PCGAMGAGG); Thanks for your help, Regards, Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu May 6 11:51:31 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 6 May 2021 11:51:31 -0500 Subject: [petsc-users] Equivalent function call for -mg_levels_ksp_type & -mg_levels_pc_type In-Reply-To: References: Message-ID: <1C006501-35FA-4F37-B452-EFDE81DE7B36@petsc.dev> Milan, The simplest way is to use PetscOptionsSetValue() in the code to set the options. If you have multiple solvers you can use PetscObjectSetOptionsPrefix() to use different prefixes for each solver and control them separately. The number of levels and the solvers at each level are created "on the fly" based on the matrix, there is currently no "function call" way to set these options manually by function calls. Barry > On May 6, 2021, at 10:08 AM, Milan Pelletier via petsc-users wrote: > > Dear users, > > I would like to know what function call would allow me to achieve the same result as using the two flags > -mg_levels_ksp_type richardson > -mg_levels_pc_type jacobi > and then PCSetFromOptions(). > I would prefer not to rely on command-line options and rather do the setup "by hand". > > To achieve that, what should I add after these lines: > KSPCreate(PETSC_COMM_WORLD, solver); > KSPSetType(solver, KSPCG); > KSPGetPC(solver, &pc); > PCSetType(pc, PCGAMGAGG); > > > Thanks for your help, > Regards, > > Milan Pelletier > From arashrezaei96 at gmail.com Thu May 6 00:48:05 2021 From: arashrezaei96 at gmail.com (arash rezaei) Date: Thu, 6 May 2021 10:18:05 +0430 Subject: [petsc-users] Error in installation of petsc4py Message-ID: Hello, I want to install pets4py I use this command: python -m pip install [--user] petsc petsc4py -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- CUsersArashpython -m pip install --user numpy mpi4py Requirement already satisfied numpy in cusersarashappdataroamingpythonpython39site-packages (1.19.4) Requirement already satisfied mpi4py in cusersarashappdataroamingpythonpython39site-packages (3.0.3) CUsersArashpython -m pip install --user petsc petsc4py Collecting petsc Using cached petsc-3.15.0.tar.gz (16.0 MB) Collecting petsc4py Using cached petsc4py-3.15.0.tar.gz (2.1 MB) Requirement already satisfied numpy in cusersarashappdataroamingpythonpython39site-packages (from petsc4py) (1.19.4) Using legacy 'setup.py install' for petsc, since package 'wheel' is not installed. Using legacy 'setup.py install' for petsc4py, since package 'wheel' is not installed. Installing collected packages petsc, petsc4py Running setup.py install for petsc ... 
error ERROR Command errored out with exit status 1 command 'CProgram FilesPython39python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py'''; __file__='''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py''';f = getattr(tokenize, '''open''', open)(__file__) if os.path.exists(__file__) else io.StringIO('''from setuptools import setup; setup()''');code = f.read().replace('''rn''', '''n''');f.close();exec(compile(code, __file__, '''exec'''))' install --record 'CUsersArashAppDataLocalTemppip-record-c5j1b3k_install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'CUsersArashAppDataRoamingPythonPython39Includepetsc' cwd CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4af Complete output (28 lines) running install PETSc configure configure options --prefix=CUsersArashAppDataRoamingPythonPython39site-packagespetsc PETSC_ARCH=arch-python-win-amd64 --with-shared-libraries=1 --with-debugging=0 --with-c2html=0 --with-mpi=0 'CProgram' is not recognized as an internal or external command, operable program or batch file. Traceback (most recent call last) File string, line 1, in module File CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py, line 293, in module setup(name='petsc', File CProgram FilesPython39libsite-packagessetuptools__init__.py, line 165, in setup return distutils.core.setup(attrs) File CProgram FilesPython39libdistutilscore.py, line 148, in setup dist.run_commands() File CProgram FilesPython39libdistutilsdist.py, line 966, in run_commands self.run_command(cmd) File CProgram FilesPython39libdistutilsdist.py, line 985, in run_command cmd_obj.run() File CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py, line 229, in run config(prefix, self.dry_run) File CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py, line 165, in config if status != 0 raise RuntimeError(status) RuntimeError 1 ---------------------------------------- ERROR Command errored out with exit status 1 'CProgram FilesPython39python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py'''; __file__='''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py''';f = getattr(tokenize, '''open''', open)(__file__) if os.path.exists(__file__) else io.StringIO('''from setuptools import setup; setup()''');code = f.read().replace('''rn''', '''n''');f.close();exec(compile(code, __file__, '''exec'''))' install --record 'CUsersArashAppDataLocalTemppip-record-c5j1b3k_install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'CUsersArashAppDataRoamingPythonPython39Includepetsc' Check the logs for full command output. From arashrezaei96 at gmail.com Thu May 6 00:49:14 2021 From: arashrezaei96 at gmail.com (arash rezaei) Date: Thu, 6 May 2021 10:19:14 +0430 Subject: [petsc-users] Error in installation of petsc4py In-Reply-To: References: Message-ID: But it gives Error, I sent you the Error description. 
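Stepping back to Milan's -mg_levels question above, here is a sketch (untested, error checking omitted) of the PetscOptionsSetValue() route Barry describes. One detail worth noting: PCGAMGAGG is a GAMG subtype selected with PCGAMGSetType(), while PCSetType() takes PCGAMG, whose default type is already "agg".

    KSP solver;
    PC  pc;
    PetscOptionsSetValue(NULL,"-mg_levels_ksp_type","richardson");
    PetscOptionsSetValue(NULL,"-mg_levels_pc_type","jacobi");
    KSPCreate(PETSC_COMM_WORLD,&solver);
    KSPSetType(solver,KSPCG);
    KSPGetPC(solver,&pc);
    PCSetType(pc,PCGAMG);
    PCGAMGSetType(pc,PCGAMGAGG);
    KSPSetFromOptions(solver); /* the level solvers are created during setup
                                  and pick the -mg_levels_* options up then */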
On Thu, May 6, 2021 at 10:18 AM arash rezaei wrote: > Hello, > I want to install petsc4py > I use this command: > > python -m pip install [--user] petsc petsc4py > > From saransh.saxena5571 at gmail.com Thu May 6 08:18:29 2021 From: saransh.saxena5571 at gmail.com (Saransh Saxena) Date: Thu, 6 May 2021 15:18:29 +0200 Subject: [petsc-users] Integrating SNES in FEM code Message-ID: Hi, I am trying to incorporate Newton's method in solving a nonlinear FEM equation using SNES from PETSc. The overall equation is of the type A(x).x = b, where b is a vector of external loads, x is the solution field (say displacements, for example) and A is the combined LHS matrix derived from the discretization of the weak formulation of the governing finite element equation. While going through the manual and examples of SNES, I found that I need to define the residual function using SNESSetFunction() and the Jacobian using SNESSetJacobian(). In that context I had a couple of questions: 1. In the SNES tutorials I've browsed through, the residual functions passed had arguments only for x, the solution vector, and f, the residual vector. Is there a way a user can pass an additional vector (b) and matrix (A) for computing the residual as well, since in my case f = b - A(x).x? 2. Since computing the Jacobian is not that trivial, I would like to use one of the pre-built Jacobian methods. Is there any step other than setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? Best regards, Saransh From knepley at gmail.com Thu May 6 13:16:41 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 May 2021 14:16:41 -0400 Subject: [petsc-users] Error in installation of petsc4py In-Reply-To: References: Message-ID: The error is in your script: 'CProgram' is not recognized as an internal or external command, Matt On Thu, May 6, 2021 at 2:09 PM arash rezaei wrote: > Hello, > I want to install petsc4py > I use this command: > > python -m pip install [--user] petsc petsc4py > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ From balay at mcs.anl.gov Thu May 6 13:19:24 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 6 May 2021 13:19:24 -0500 Subject: [petsc-users] Error in installation of petsc4py In-Reply-To: References: Message-ID: <2dd5a2a-31f7-ebae-d0e5-e296d48e738@mcs.anl.gov> Is this on Windows with Windows Python? This likely won't work. Can you use WSL? Satish On Thu, 6 May 2021, Matthew Knepley wrote: > The error is in your script: > > 'CProgram' is not recognized as an internal or external command, > > > Matt > > On Thu, May 6, 2021 at 2:09 PM arash rezaei wrote: > > > Hello, > > I want to install petsc4py > > I use this command: > > > > python -m pip install [--user] petsc petsc4py > > > > > > From knepley at gmail.com Thu May 6 13:21:57 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 May 2021 14:21:57 -0400 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: Message-ID: On Thu, May 6, 2021 at 2:09 PM Saransh Saxena wrote: > Hi, > > I am trying to incorporate Newton's method in solving a nonlinear FEM > equation using SNES from PETSc. 
The overall equation is of the type A(x).x > = b, where b is a vector of external loads, x is the solution field (say > displacements for e.g.) and A is the combined LHS matrix derived from the > discretization of weak formulation of the governing finite element > equation. > > While going through the manual and examples of snes, I found that I need > to define the function of residual using SNESSetFunction() and jacobian > using SNESSetJacobian(). In that context I had a couple of questions :- > > 1. In the snes tutorials I've browsed through, the functions for computing > residual passed had arguments only for x, the solution vector and f, the > residual vector. Is there a way a user can pass an additional vector (b) > and matrix (A) for computing the residual as well? as in my case, f = b - > A(x).x > You would give PETSc an outer function MyResidual() that looked like this: PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) { MatMult(A, X, F); VecAXPY(F, -1.0, b); } > 2. Since computing jacobian is not that trivial, I would like to use one > of the pre-built jacobian methods. Is there any other step other than > setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? > If you do nothing, we will compute it by default. Thanks, MAtt > Best regards, > > Saransh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu May 6 14:03:31 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 6 May 2021 14:03:31 -0500 Subject: [petsc-users] Error in installation of petsc4py In-Reply-To: <2dd5a2a-31f7-ebae-d0e5-e296d48e738@mcs.anl.gov> References: <2dd5a2a-31f7-ebae-d0e5-e296d48e738@mcs.anl.gov> Message-ID: <3B32D925-FD08-4B6E-B82F-377B96567C66@petsc.dev> PETSc folks who know a little bit about Windows Python, Does the Microsoft Python environment work with the standard Python setup tools? It seems like it should but ... Is the following error coming from the generic use of setup.py on Windows or is it somehow specific to PETSc's use of setup? Running setup.py install for petsc ... error ERROR Command errored out with exit status 1 command 'CProgram FilesPython39python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py'''; __file__='''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py''';f = getattr(tokenize, '''open''', open)(__file__) if os.path.exists(__file__) else io.StringIO('''from setuptools import setup; setup()''');code = f.read().replace('''rn''', '''n''');f.close();exec(compile(code, __file__, '''exec'''))' install --record 'CUsersArashAppDataLocalTemppip-record-c5j1b3k_install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'CUsersArashAppDataRoamingPythonPython39Includepetsc' cwd CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4af Complete output (28 lines) The page https://docs.microsoft.com/en-us/windows/python/faqs has the following text: To paste a path as a string in Python, add the r prefix. This indicates that it is a raw string, and no escape characters will be used except for \? (you might need to remove the last backslash in your path). 
So your path might look like:r"C:\Users\MyName\Documents\Document.txt" When working with paths in Python, we recommend using the standard pathlib module. This will let you convert the string to a rich Path object that can do path manipulations consistently whether it uses forward slashes or backslashes, making your codework better across different operating systems. Should we/can we convert PETSc's configure and setup.py to use pathlib (and maybe need to add some r in places) to fully support using Microsoft Python and the Microsoft way of handling file systems? Thanks for any input, Barry We've never supported using the Microsoft Python environment and always used cygwin python in the past. > On May 6, 2021, at 1:19 PM, Satish Balay via petsc-users wrote: > > Is this on windows with windows python? This likely won't work. > > Can you use WSL? > > Satish > > On Thu, 6 May 2021, Matthew Knepley wrote: > >> The error is in your script: >> >> 'CProgram' is not recognized as an internal or external command, >> >> >> Matt >> >> On Thu, May 6, 2021 at 2:09 PM arash rezaei wrote: >> >>> Hello, >>> I want to install pets4py >>> I use this command: >>> >>> python -m pip install [--user] petsc petsc4py >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arashrezaei96 at gmail.com Thu May 6 15:22:31 2021 From: arashrezaei96 at gmail.com (arash rezaei) Date: Fri, 7 May 2021 00:52:31 +0430 Subject: [petsc-users] Error in installation of petsc4py In-Reply-To: <3B32D925-FD08-4B6E-B82F-377B96567C66@petsc.dev> References: <2dd5a2a-31f7-ebae-d0e5-e296d48e738@mcs.anl.gov> <3B32D925-FD08-4B6E-B82F-377B96567C66@petsc.dev> Message-ID: On Thu, May 6, 2021 at 11:33 PM Barry Smith wrote: > > PETSc folks who know a little bit about Windows Python, > > Does the Microsoft Python environment work with the standard Python > setup tools? It seems like it should but ... > > Is the following error coming from the generic use of setup.py on > Windows or is it somehow specific to PETSc's use of setup? > > Running setup.py install for petsc ... error > ERROR Command errored out with exit status 1 > command 'CProgram FilesPython39python.exe' -u -c 'import io, os, sys, > setuptools, tokenize; sys.argv[0] = > '''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py'''; > __file__='''CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4afsetup.py''';f > = getattr(tokenize, '''open''', open)(__file__) if os.path.exists(__file__) > else io.StringIO('''from setuptools import setup; setup()''');code = > f.read().replace('''rn''', '''n''');f.close();exec(compile(code, __file__, > '''exec'''))' install --record > 'CUsersArashAppDataLocalTemppip-record-c5j1b3k_install-record.txt' > --single-version-externally-managed --user --prefix= --compile > --install-headers 'CUsersArashAppDataRoamingPythonPython39Includepetsc' > cwd > CUsersArashAppDataLocalTemppip-install-2pv0ftutpetsc_718bdaa5c0fd475eb4937d6b87c8c4af > Complete output (28 lines) > > The page https://docs.microsoft.com/en-us/windows/python/faqs has > the following text: > > To paste a path as a string in Python, add the r prefix. This > indicates that it is a raw string, and no escape characters will be used except > for \? (you might need to remove the last backslash in your path). So your > path might look like:r"C:\Users\MyName\Documents\Document.txt" When > working with paths in Python, we recommend using the standard pathlib > module. 
This will let you convert the string to a rich Path object that > can do path manipulations consistently whether it uses forward slashes or > backslashes, making your code work better across different operating > systems. > > Should we/can we convert PETSc's configure and setup.py to use > pathlib (and maybe need to add some r in places) to fully support using > Microsoft Python and the Microsoft way of handling file systems? > > Thanks for any input, > > Barry > > We've never supported using the Microsoft Python environment and always > used cygwin python in the past. > > > > > On May 6, 2021, at 1:19 PM, Satish Balay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Is this on Windows with Windows Python? This likely won't work. > > Can you use WSL? > > Satish > > On Thu, 6 May 2021, Matthew Knepley wrote: > > The error is in your script: > > 'CProgram' is not recognized as an internal or external command, > > > Matt > > On Thu, May 6, 2021 at 2:09 PM arash rezaei > wrote: > > Hello, > I want to install petsc4py > I use this command: > > python -m pip install [--user] petsc petsc4py > > > > Thanks to all. I will check it with Linux tomorrow. If it is > possible, could you please create some documentation and examples that show how we > can use this package from Python? I am a master's student and this package can > help me a lot. > > > > From vikram.bhamidipati at swri.org Thu May 6 16:54:25 2021 From: vikram.bhamidipati at swri.org (Bhamidipati, Vikram) Date: Thu, 6 May 2021 21:54:25 +0000 Subject: [petsc-users] MUMPS error Message-ID: I configured a PETSc build to include MUMPS using '--download-mumps=1', but it was not successful. I am attaching the relevant portion from configure.log. Could you advise me on what went wrong with the compilation and building of the MUMPS library? Thanks, Vikram --------------------------------------------------- Vikram Bhamidipati Senior Research Engineer Computational Material Integrity Section Materials Engineering Department Mechanical Engineering Division Ph: (210) 522-2576 From balay at mcs.anl.gov Thu May 6 17:02:41 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 6 May 2021 17:02:41 -0500 Subject: [petsc-users] MUMPS error In-Reply-To: References: Message-ID: Can you retry and see if the problem persists? There might be an issue with missing dependencies in the MUMPS parallel build. If it still fails - you can try the additional configure option: --with-make-np=1 Satish On Thu, 6 May 2021, Bhamidipati, Vikram wrote: > I configured a PETSc build to include MUMPS using '--download-mumps=1', but it was not successful. I am attaching the relevant portion from configure.log. Could you advise me on what went wrong with the compilation and building of the MUMPS library? 
> > Thanks, > Vikram > > --------------------------------------------------- > Vikram Bhamidipati > Senior Research Engineer > Computational Material Integrity Section Materials Engineering > Department Mechanical Engineering Division > Ph: (210) 522-2576 > > From vikram.bhamidipati at swri.org Thu May 6 17:23:10 2021 From: vikram.bhamidipati at swri.org (Bhamidipati, Vikram) Date: Thu, 6 May 2021 22:23:10 +0000 Subject: [petsc-users] MUMPS error In-Reply-To: References: Message-ID: The problem did not go away on a second attempt but was overcome with the '--with-make-np=1' flag. Thank you, Vikram -----Original Message----- From: Satish Balay Sent: Thursday, May 6, 2021 5:03 PM To: Bhamidipati, Vikram Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] MUMPS error Can you retry and see if the problem persists? There might be an issue with missing dependencies in the MUMPS parallel build. If it still fails - you can try the additional configure option: --with-make-np=1 Satish On Thu, 6 May 2021, Bhamidipati, Vikram wrote: > I configured a PETSc build to include MUMPS using '--download-mumps=1', but it was not successful. I am attaching the relevant portion from configure.log. Could you advise me on what went wrong with the compilation and building of the MUMPS library? > > Thanks, > Vikram > > --------------------------------------------------- > Vikram Bhamidipati > Senior Research Engineer > Computational Material Integrity Section Materials Engineering > Department Mechanical Engineering Division > Ph: (210) 522-2576 > > From bsmith at petsc.dev Fri May 7 22:18:04 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 7 May 2021 22:18:04 -0500 Subject: Re: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: Message-ID: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> Saransh, I've added some code for SNESSetPicard() in the PETSc branch barry/2021-05-06/add-snes-picard-mf, see also https://gitlab.com/petsc/petsc/-/merge_requests/3962, that will make your coding much easier. With this branch you can provide code that computes A(x), using SNESSetPicard(). 1) by default it will use the defect-correction form of the Picard iteration, A(x^n)(x^{n+1} - x^{n}) = b - A(x^n)x^{n}, which can be slower than Newton 2) with -snes_fd_color it will compute the Jacobian via coloring using SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] + A(x^n), where A'(x^n) is the third-order tensor of the derivatives of A() and A'(x^n)[x^n] is a matrix; I do not know if, in general, it has the same nonzero structure as A. (I'm lost beyond matrices :-(). 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and precondition it with a preconditioner built from the A(x^n) matrix; for some problems this works well. 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the Jacobian by finite differencing one column at a time, thus it is very slow and not useful for large problems. But it is useful for testing with small problems. So you can provide A() and need not worry about providing the Jacobian or even the function evaluation code. It is all taken care of by SNESSetPicard(). Hope this helps, Barry > On May 6, 2021, at 1:21 PM, Matthew Knepley wrote: > > On Thu, May 6, 2021 at 2:09 PM Saransh Saxena > wrote: > Hi, > > I am trying to incorporate Newton's method in solving a nonlinear FEM equation using SNES from PETSc. 
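For concreteness, a sketch of the calling sequence Barry describes above; FormRHS, FormAMatrix, user, r, A, and x are illustrative names, not code from the branch:

    extern PetscErrorCode FormRHS(SNES,Vec,Vec,void*);         /* fills b(x) */
    extern PetscErrorCode FormAMatrix(SNES,Vec,Mat,Mat,void*); /* fills A(x) */

    ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
    ierr = SNESSetPicard(snes,r,FormRHS,A,A,FormAMatrix,&user);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);

SNESSetFromOptions() is what allows the -snes_fd_color, -snes_mf_operator, and -snes_fd variants above to be selected at run time.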
The overall equation is of the type A(x).x = b, where b is a vector of external loads, x is the solution field (say displacements for e.g.) and A is the combined LHS matrix derived from the discretization of weak formulation of the governing finite element equation. > > While going through the manual and examples of snes, I found that I need to define the function of residual using SNESSetFunction() and jacobian using SNESSetJacobian(). In that context I had a couple of questions :- > > 1. In the snes tutorials I've browsed through, the functions for computing residual passed had arguments only for x, the solution vector and f, the residual vector. Is there a way a user can pass an additional vector (b) and matrix (A) for computing the residual as well? as in my case, f = b - A(x).x > > You would give PETSc an outer function MyResidual() that looked like this: > > PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) > { > > > MatMult(A, X, F); > VecAXPY(F, -1.0, b); > } > > 2. Since computing jacobian is not that trivial, I would like to use one of the pre-built jacobian methods. Is there any other step other than setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? > > If you do nothing, we will compute it by default. > > Thanks, > > MAtt > > Best regards, > > Saransh > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From saransh.saxena5571 at gmail.com Sun May 9 14:07:58 2021 From: saransh.saxena5571 at gmail.com (Saransh Saxena) Date: Sun, 9 May 2021 21:07:58 +0200 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> Message-ID: Thanks Barry and Matt, Till now I was only using a simple fixed point nonlinear solver manually coded instead of ones provided by PETSc. However, the problem I am trying to solve is highly nonlinear so I suppose I'll need at least a newton based solver to start with. I'll get back to you guys if I have any questions. Cheers, Saransh On Sat, May 8, 2021 at 5:18 AM Barry Smith wrote: > Saransh, > > I've add some code for SNESSetPicard() in the PETSc branch > barry/2021-05-06/add-snes-picard-mf see also http > s://gitlab.com/petsc/petsc/-/merge_requests/3962 that will make your > coding much easier. > > With this branch you can provide code that computes A(x), using > SNESSetPicard(). > > 1) by default it will use the defection-correction form of Picard > iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be > slower than Newton > > 2) with -snes_fd_color it will compute the Jacobian via coloring using > SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same > sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - > A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() > and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same > nonzero structure as A. (I'm lost beyond matrices :-(). > > 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and > precondition it with a preconditioner built from A(x^n) matrix, for some > problems this works well. 
> > 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the > Jacobian by finite differencing one column at a time, thus it is very slow > and not useful for large problems. But useful for testing with small > problems. > > So you can provide A() and need not worrying about providing the Jacobian > or even the function evaluation code. It is all taken care of by > SNESSetPicard(). > > Hope this helps, > > Barry > > > On May 6, 2021, at 1:21 PM, Matthew Knepley wrote: > > On Thu, May 6, 2021 at 2:09 PM Saransh Saxena < > saransh.saxena5571 at gmail.com> wrote: > >> Hi, >> >> I am trying to incorporate newton method in solving a nonlinear FEM >> equation using SNES from PETSc. The overall equation is of the type A(x).x >> = b, where b is a vector of external loads, x is the solution field (say >> displacements for e.g.) and A is the combined LHS matrix derived from the >> discretization of weak formulation of the governing finite element >> equation. >> >> While going through the manual and examples of snes, I found that I need >> to define the function of residual using SNESSetFunction() and jacobian >> using SNESSetJacobian(). In that context I had a couple of questions :- >> >> 1. In the snes tutorials I've browsed through, the functions for >> computing residual passed had arguments only for x, the solution vector and >> f, the residual vector. Is there a way a user can pass an additional vector >> (b) and matrix (A) for computing the residual as well? as in my case, f = b >> - A(x).x >> > > You would give PETSc an outer function MyResidual() that looked like this: > > PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) > { > > > MatMult(A, X, F); > VecAXPY(F, -1.0, b); > } > > >> 2. Since computing jacobian is not that trivial, I would like to use one >> of the pre-built jacobian methods. Is there any other step other than >> setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >> > > If you do nothing, we will compute it by default. > > Thanks, > > MAtt > > >> Best regards, >> >> Saransh >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun May 9 15:43:31 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 9 May 2021 15:43:31 -0500 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> Message-ID: <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> Saransh, If Picard or Newton's method does not converge, you can consider adding pseudo-transient and/or other continuation methods. For example, if the problem is made difficult by certain physical parameters you can start with "easier" values of the parameters, solve the nonlinear system, then use its solution as the initial guess for slightly more "difficult" parameters, etc. Or, depending on the problem grid sequencing may be appropriate. We have some tools to help with all these approaches. Barry > On May 9, 2021, at 2:07 PM, Saransh Saxena wrote: > > Thanks Barry and Matt, > > Till now I was only using a simple fixed point nonlinear solver manually coded instead of ones provided by PETSc. However, the problem I am trying to solve is highly nonlinear so I suppose I'll need at least a newton based solver to start with. 
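The simplest form of the parameter continuation Barry suggests, as a sketch; alpha_easy, alpha_hard, nsteps, and the AppCtx field are made-up names:

    SNESConvergedReason reason;
    PetscInt            k;
    for (k = 0; k <= nsteps; k++) {
      user.alpha = alpha_easy + (alpha_hard - alpha_easy)*k/nsteps;
      ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr); /* x seeds the next solve */
      ierr = SNESGetConvergedReason(snes,&reason);CHKERRQ(ierr);
      if (reason < 0) break; /* or cut the parameter step and retry */
    }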
I'll get back to you guys if I have any questions. > > Cheers, > Saransh > > On Sat, May 8, 2021 at 5:18 AM Barry Smith > wrote: > Saransh, > > I've add some code for SNESSetPicard() in the PETSc branch barry/2021-05-06/add-snes-picard-mf see also https://gitlab.com/petsc/petsc/-/merge_requests/3962 <> that will make your coding much easier. > > With this branch you can provide code that computes A(x), using SNESSetPicard(). > > 1) by default it will use the defection-correction form of Picard iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be slower than Newton > > 2) with -snes_fd_color it will compute the Jacobian via coloring using SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same nonzero structure as A. (I'm lost beyond matrices :-(). > > 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and precondition it with a preconditioner built from A(x^n) matrix, for some problems this works well. > > 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the Jacobian by finite differencing one column at a time, thus it is very slow and not useful for large problems. But useful for testing with small problems. > > So you can provide A() and need not worrying about providing the Jacobian or even the function evaluation code. It is all taken care of by SNESSetPicard(). > > Hope this helps, > > Barry > > >> On May 6, 2021, at 1:21 PM, Matthew Knepley > wrote: >> >> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena > wrote: >> Hi, >> >> I am trying to incorporate newton method in solving a nonlinear FEM equation using SNES from PETSc. The overall equation is of the type A(x).x = b, where b is a vector of external loads, x is the solution field (say displacements for e.g.) and A is the combined LHS matrix derived from the discretization of weak formulation of the governing finite element equation. >> >> While going through the manual and examples of snes, I found that I need to define the function of residual using SNESSetFunction() and jacobian using SNESSetJacobian(). In that context I had a couple of questions :- >> >> 1. In the snes tutorials I've browsed through, the functions for computing residual passed had arguments only for x, the solution vector and f, the residual vector. Is there a way a user can pass an additional vector (b) and matrix (A) for computing the residual as well? as in my case, f = b - A(x).x >> >> You would give PETSc an outer function MyResidual() that looked like this: >> >> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) >> { >> >> >> MatMult(A, X, F); >> VecAXPY(F, -1.0, b); >> } >> >> 2. Since computing jacobian is not that trivial, I would like to use one of the pre-built jacobian methods. Is there any other step other than setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >> >> If you do nothing, we will compute it by default. >> >> Thanks, >> >> MAtt >> >> Best regards, >> >> Saransh >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniel.stone at opengosim.com Tue May 11 04:22:12 2021 From: daniel.stone at opengosim.com (Daniel Stone) Date: Tue, 11 May 2021 10:22:12 +0100 Subject: [petsc-users] Open MPI library version error Message-ID: Hello, I have the following error: PETSc Error --- Open MPI library version FUJITSU MPI Library 4.0.0 (4.0.1fj4.0.0) does not match what PETSc was compiled with 4.0, aborting I have to configure and compile petsc in a cross compilation environment and run it in a different one. I'm trying to run src/snes/examples/tutorials/ex19 to confirm things are working. It looks like this error could be down to some confusion with string comparison ("4.0.0" =/= "4.0"), or it could be some nightmare misalignment between the two environments I have to use, and/or my efforts to configure PETSc. I'm trying to determine which. Is there some way to turn off this version checking, to see if it is just a string comparison error and maybe the rest of the program will run fine? Or could someone point me to where in the source code/configuration scripts these numbers are harvested and this error is generated, so I can try to get a better idea about what is going on? Thanks, Daniel Stone -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Tue May 11 04:28:48 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 11 May 2021 11:28:48 +0200 Subject: [petsc-users] Open MPI library version error In-Reply-To: References: Message-ID: <09B3A1EF-B538-4EF7-A0A4-C382505A7084@joliv.et> Hello Daniel, Which PETSc version are you using? I think this has been addressed there: https://gitlab.com/petsc/petsc/-/merge_requests/3671 Thanks, Pierre > On 11 May 2021, at 11:22 AM, Daniel Stone wrote: > > Hello, > > I have the following error: > > PETSc Error --- Open MPI library version > FUJITSU MPI Library 4.0.0 (4.0.1fj4.0.0) does not match what PETSc was compiled with 4.0, aborting > > I have to configure and compile petsc in a cross compilation environment and run it > in a different one. I'm trying to run src/snes/examples/tutorials/ex19 to confirm things > are working. > > It looks like this error could be down to some confusion with string comparison > ("4.0.0" =/= "4.0"), or it could be some nightmare misalignment between the two > environments I have to use, and/or my efforts to configure PETSc. I'm trying to > determine which. > > Is there some way to turn off this version checking, to see if it is just a string comparison > error and maybe the rest of the program will run fine? Or could someone point me to where > in the source code/configuration scripts these numbers are harvested and this error is generated, so I can try to get a better idea about what is going on? > > Thanks, > > Daniel Stone -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.stone at opengosim.com Tue May 11 04:34:09 2021 From: daniel.stone at opengosim.com (Daniel Stone) Date: Tue, 11 May 2021 10:34:09 +0100 Subject: [petsc-users] Open MPI library version error In-Reply-To: <09B3A1EF-B538-4EF7-A0A4-C382505A7084@joliv.et> References: <09B3A1EF-B538-4EF7-A0A4-C382505A7084@joliv.et> Message-ID: Hi Pierre, I'm using 3.12.2, which is a bit old, but needed for the Pflotran version I'm using. It certainly predates that merge. Many thanks for pointing me to that merge - if nothing else I could now study that and try manually implementing it to see if it helps my situation. 
Thanks, Daniel On Tue, May 11, 2021 at 10:28 AM Pierre Jolivet wrote: > Hello Daniel, > Which PETSc version are you using? > I think this has been addressed there: > https://gitlab.com/petsc/petsc/-/merge_requests/3671 > > Thanks, > Pierre > > On 11 May 2021, at 11:22 AM, Daniel Stone > wrote: > > Hello, > > I have the following error: > > PETSc Error --- Open MPI library version > FUJITSU MPI Library 4.0.0 (4.0.1fj4.0.0) does not match what PETSc was > compiled with 4.0, aborting > > I have to configure and compile petsc in a cross compilation environment > and run it > in a different one. I'm trying to run src/snes/examples/tutorials/ex19 to > confirm things > are working. > > It looks like this error could be down to some confusion with string > comparison > ("4.0.0" =/= "4.0"), or it could be some nightmare misalignment between > the two > environments I have to use, and/or my efforts to configure PETSc. I'm > trying to > determine which. > > Is there some way to turn off this version checking, to see if it is just > a string comparison > error and maybe the rest of the program will run fine? Or could someone > point me to where > in the source code/configuration scripts these numbers are harvested and > this error is generated, so I can try to get a better idea about what is > going on? > > Thanks, > > Daniel Stone > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 11 05:56:32 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 May 2021 06:56:32 -0400 Subject: [petsc-users] Open MPI library version error In-Reply-To: References: <09B3A1EF-B538-4EF7-A0A4-C382505A7084@joliv.et> Message-ID: On Tue, May 11, 2021 at 5:34 AM Daniel Stone wrote: > Hi Pierre, > > I'm using 3.12.2, which is a bit old, but needed for the Pflotran version > I'm using. It certainly predates that merge. > > Many thanks for pointing me to that merge - if nothing else I could now > study that and try manually implementing > it to see if it helps my situation. > You might just be able to cd $PETSC_DIR git cherry-pick 16dc8964ca763700383c940ec0bdd667c5da14cf make Thanks, Matt > Thanks, > > Daniel > > On Tue, May 11, 2021 at 10:28 AM Pierre Jolivet wrote: > >> Hello Daniel, >> Which PETSc version are you using? >> I think this has been addressed there: >> https://gitlab.com/petsc/petsc/-/merge_requests/3671 >> >> Thanks, >> Pierre >> >> On 11 May 2021, at 11:22 AM, Daniel Stone >> wrote: >> >> Hello, >> >> I have the following error: >> >> PETSc Error --- Open MPI library version >> FUJITSU MPI Library 4.0.0 (4.0.1fj4.0.0) does not match what PETSc was >> compiled with 4.0, aborting >> >> I have to configure and compile petsc in a cross compilation environment >> and run it >> in a different one. I'm trying to run src/snes/examples/tutorials/ex19 to >> confirm things >> are working. >> >> It looks like this error could be down to some confusion with string >> comparison >> ("4.0.0" =/= "4.0"), or it could be some nightmare misalignment between >> the two >> environments I have to use, and/or my efforts to configure PETSc. I'm >> trying to >> determine which. >> >> Is there some way to turn off this version checking, to see if it is just >> a string comparison >> error and maybe the rest of the program will run fine? Or could someone >> point me to where >> in the source code/configuration scripts these numbers are harvested and >> this error is generated, so I can try to get a better idea about what is >> going on? 
>> >> Thanks, >> >> Daniel Stone >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From teixeira at zmt.swiss Tue May 11 08:30:31 2021 From: teixeira at zmt.swiss (Frederico Teixeira) Date: Tue, 11 May 2021 15:30:31 +0200 (CEST) Subject: [petsc-users] Binary format in real vs. complex scalar type configurations Message-ID: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Dear fellows, I hope this message finds you safe and well. I have a complex-valued matrix and its real/imaginary components in binary format . They were extracted from a solver that only works with "scalar-type=complex" configuration. I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. At the end of the day, I would like to have both real and imaginary components as real-valued matrices. Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. Regards, Frederico. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue May 11 10:16:35 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 11 May 2021 11:16:35 -0400 Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID: On Tue, May 11, 2021 at 9:30 AM Frederico Teixeira wrote: > Dear fellows, > > I hope this message finds you safe and well. > > I have a complex-valued matrix and its real/imaginary components in binary > format. They were extracted from a solver that only works with > "scalar-type=complex" configuration. > I am getting weird results when I load them into a small test program > that's configured with "scalar-type=real", but I believe this is expected. > At the end of the day, I would like to have both real and imaginary > components as real-valued matrices. > Is it possible to do it? I want to test preconditioners that are tailored > for this sort of problem. > Do you mean you want what is called "equivalent real form" where the real and complex parts are stored as type 'double' for example and operations like multiply take two pairs of doubles, do a complex multiply manually, and return a real/complex pair of doubles? > > Regards, > Frederico. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Tue May 11 11:03:20 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 11 May 2021 18:03:20 +0200 Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID: Hello Frederico, I?m not sure that?s possible. Here is what I do, it makes me sick, but mixing precisions/scalar types with PETSc is difficult (crossing my fingers this will be better with future). 
In MATLAB (after putting petsc/share/petsc/matlab in the path): A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex PetscBinaryWrite('re.dat',real(A)); % scalar-type=real PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real Thanks, Pierre > On 11 May 2021, at 3:30 PM, Frederico Teixeira wrote: > > Dear fellows, > > I hope this message finds you safe and well. > > I have a complex-valued matrix and its real/imaginary components in binary format. They were extracted from a solver that only works with "scalar-type=complex" configuration. > I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. > At the end of the day, I would like to have both real and imaginary components as real-valued matrices. > Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. > > Regards, > Frederico. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue May 11 11:20:48 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 May 2021 11:20:48 -0500 Subject: [petsc-users] Open MPI library version error In-Reply-To: References: <09B3A1EF-B538-4EF7-A0A4-C382505A7084@joliv.et> Message-ID: <2016b413-f3cf-ad1-da27-89c346644af@mcs.anl.gov> Or disable this check.. i.e change #elif defined(OMPI_MAJOR_VERSION) to #elif defined(OMPI_MAJOR_VERSION_disable_check) Satish On Tue, 11 May 2021, Matthew Knepley wrote: > On Tue, May 11, 2021 at 5:34 AM Daniel Stone > wrote: > > > Hi Pierre, > > > > I'm using 3.12.2, which is a bit old, but needed for the Pflotran version > > I'm using. It certainly predates that merge. > > > > Many thanks for pointing me to that merge - if nothing else I could now > > study that and try manually implementing > > it to see if it helps my situation. > > > > You might just be able to > > cd $PETSC_DIR > git cherry-pick 16dc8964ca763700383c940ec0bdd667c5da14cf > make > > Thanks, > > Matt > > > > Thanks, > > > > Daniel > > > > On Tue, May 11, 2021 at 10:28 AM Pierre Jolivet wrote: > > > >> Hello Daniel, > >> Which PETSc version are you using? > >> I think this has been addressed there: > >> https://gitlab.com/petsc/petsc/-/merge_requests/3671 > >> > >> Thanks, > >> Pierre > >> > >> On 11 May 2021, at 11:22 AM, Daniel Stone > >> wrote: > >> > >> Hello, > >> > >> I have the following error: > >> > >> PETSc Error --- Open MPI library version > >> FUJITSU MPI Library 4.0.0 (4.0.1fj4.0.0) does not match what PETSc was > >> compiled with 4.0, aborting > >> > >> I have to configure and compile petsc in a cross compilation environment > >> and run it > >> in a different one. I'm trying to run src/snes/examples/tutorials/ex19 to > >> confirm things > >> are working. > >> > >> It looks like this error could be down to some confusion with string > >> comparison > >> ("4.0.0" =/= "4.0"), or it could be some nightmare misalignment between > >> the two > >> environments I have to use, and/or my efforts to configure PETSc. I'm > >> trying to > >> determine which. > >> > >> Is there some way to turn off this version checking, to see if it is just > >> a string comparison > >> error and maybe the rest of the program will run fine? Or could someone > >> point me to where > >> in the source code/configuration scripts these numbers are harvested and > >> this error is generated, so I can try to get a better idea about what is > >> going on? 
> >> > >> Thanks, > >> > >> Daniel Stone > >> > >> > >
From knepley at gmail.com Tue May 11 11:26:46 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 11 May 2021 12:26:46 -0400 Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID:
On Tue, May 11, 2021 at 12:03 PM Pierre Jolivet wrote: > Hello Frederico, > I'm not sure that's possible. > Here is what I do, it makes me sick, but mixing precisions/scalar types > with PETSc is difficult (crossing my fingers this will be better with > future).
> In MATLAB (after putting petsc/share/petsc/matlab in the path):
> A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex
> PetscBinaryWrite('re.dat',real(A)); % scalar-type=real
> PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real
So what you want to happen is that MatLoad() looks at the datatype, sees that it is complex and PetscScalar is real, and returns two matrices with the real and imaginary parts? The hard part is that the MatLoad interface returns a single matrix. I guess we could have a flag that says what to do with complex numbers (read real, read imaginary, read norm, etc.) and you could read it twice. Would that work?
Thanks, Matt
> Thanks, > Pierre > > On 11 May 2021, at 3:30 PM, Frederico Teixeira wrote: > Dear fellows, > > I hope this message finds you safe and well. > > I have a complex-valued matrix and its real/imaginary components in binary format. They were extracted from a solver that only works with the "scalar-type=complex" configuration. > I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. > At the end of the day, I would like to have both real and imaginary components as real-valued matrices. > Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. > > Regards, > Frederico.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From teixeira at zmt.swiss Tue May 11 15:03:39 2021 From: teixeira at zmt.swiss (Frederico Teixeira) Date: Tue, 11 May 2021 22:03:39 +0200 (CEST) Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID: <721568093.720617.1620763419373.JavaMail.zimbra@zurichmedtech.com>
Hi Mark,
I think so. The complex-valued linear system Au = b, with A = T + Wi, u = x + yi, b = p + qi (i is the imaginary unit), can be transformed into a 2x2 real-valued block system:

[ T  -W ] (x)   (p)
[ W   T ] (y) = (q)

Perhaps a bit more background can clarify things. System A stems from a Poisson equation with complex coefficients. There are ~130 piecewise constant coefficients ranging across 5 orders of magnitude. Most of the KSPs and PCs I managed to test take between 6000-9000 iterations to reach 10^-9. I hope to get some speed up with preconditioners that exploit the structure above.
Regards, Frederico.
Dr. Frederico Teixeira, Computational Modeler and Software Developer, ZMT (http://www.itis.swiss/), member of Zurich43 (https://www.z43.swiss/), P +41 44 245 9698, Zeughausstrasse 43, 8004 Zurich, Switzerland
From: "Mark Adams" To: "Frederico Teixeira" Cc: "petsc-users" Sent: Tuesday, May 11, 2021 5:16:35 PM Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations
On Tue, May 11, 2021 at 9:30 AM Frederico Teixeira wrote: Dear fellows, I hope this message finds you safe and well. I have a complex-valued matrix and its real/imaginary components in binary format. They were extracted from a solver that only works with the "scalar-type=complex" configuration. I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. At the end of the day, I would like to have both real and imaginary components as real-valued matrices. Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem.
Do you mean you want what is called "equivalent real form", where the real and complex parts are stored as type 'double' for example, and operations like multiply take two pairs of doubles, do a complex multiply manually, and return a real/complex pair of doubles?
Regards, Frederico.
-------------- next part -------------- An HTML attachment was scrubbed... 
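Spelled out, the transformation above takes (T + iW)(x + iy) = p + iq to the real 2x2 block system with blocks [T -W; W T]. A minimal sketch of assembling that block operator from already-assembled real matrices T and W, in a scalar-type=real build, might look as follows; the variable names are illustrative, while MatDuplicate(), MatScale(), MatCreateNest() and MatConvert() are real PETSc API.

    Mat T, W, Wneg, K, Kaij;
    Mat blocks[4];
    PetscErrorCode ierr;

    /* assume T and W have been assembled elsewhere */
    ierr = MatDuplicate(W, MAT_COPY_VALUES, &Wneg);CHKERRQ(ierr);
    ierr = MatScale(Wneg, -1.0);CHKERRQ(ierr);   /* Wneg = -W */
    blocks[0] = T; blocks[1] = Wneg;             /* row 0: [ T  -W ] */
    blocks[2] = W; blocks[3] = T;                /* row 1: [ W   T ] */
    ierr = MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &K);CHKERRQ(ierr);
    /* some preconditioners want a plain AIJ matrix rather than MATNEST: */
    ierr = MatConvert(K, MATAIJ, MAT_INITIAL_MATRIX, &Kaij);CHKERRQ(ierr);

Note this is the block (non-interleaved) layout; Mark's advice later in the thread is to interleave the real and imaginary unknowns instead, which generally behaves better with AMG-type preconditioners and which this sketch does not do.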
-------------- next part -------------- An HTML attachment was scrubbed... URL: From teixeira at zmt.swiss Tue May 11 15:19:30 2021 From: teixeira at zmt.swiss (Frederico Teixeira) Date: Tue, 11 May 2021 22:19:30 +0200 (CEST) Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID: <22091036.721336.1620764370375.JavaMail.zimbra@zurichmedtech.com> Hi Matt, Thanks for your tip. I will take a look at the "MatLoad()" implementation tomorrow and return to you to discuss it further. It sounds like a good strategy. Regards, Frederico. Dr. Frederico Teixeira Computational Modeler and Software Developer, [ http://www.itis.swiss/ | ZMT ] (member of [ https://www.z43.swiss/ | Zurich43 ] ) P +41 44 245 9698 Zeughausstrasse 43, 8004 Zurich, Switzerland From: "Matthew Knepley" To: "Pierre Jolivet" Cc: "Frederico Teixeira" , "petsc-users" Sent: Tuesday, May 11, 2021 6:26:46 PM Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations On Tue, May 11, 2021 at 12:03 PM Pierre Jolivet < [ mailto:pierre at joliv.et | pierre at joliv.et ] > wrote: Hello Frederico, I?m not sure that?s possible. Here is what I do, it makes me sick, but mixing precisions/scalar types with PETSc is difficult (crossing my fingers this will be better with future). In MATLAB (after putting petsc/share/petsc/matlab in the path): A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex PetscBinaryWrite('re.dat',real(A)); % scalar-type=real PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real So what you want to happen is that MatLoad() looks at the datatype, sees that it is complex and PetscScalar is real, and returns two matrices with the real and imaginary parts? The hard part is that the MatLoad interface returns a single matrix. I guess we could have a flag that says what to do with complex numbers (read real, read imaginary, read norm, etc.) and you could read it twice. Would that work? Thanks, Matt BQ_BEGIN Thanks, Pierre BQ_BEGIN On 11 May 2021, at 3:30 PM, Frederico Teixeira < [ mailto:teixeira at zmt.swiss | teixeira at zmt.swiss ] > wrote: Dear fellows, I hope this message finds you safe and well. I have a complex-valued matrix and its real/imaginary components in binary format . They were extracted from a solver that only works with "scalar-type=complex" configuration. I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. At the end of the day, I would like to have both real and imaginary components as real-valued matrices. Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. Regards, Frederico. BQ_END BQ_END -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener [ http://www.cse.buffalo.edu/~knepley/ | https://www.cse.buffalo.edu/~knepley/ ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue May 11 15:59:15 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 11 May 2021 15:59:15 -0500 Subject: [petsc-users] Binary format in real vs. 
complex scalar type configurations In-Reply-To: <1088055923.720763.1620763698852.JavaMail.zimbra@zurichmedtech.com> References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> <1088055923.720763.1620763698852.JavaMail.zimbra@zurichmedtech.com> Message-ID: <328470E9-6512-4A5C-BB50-46EE11A05B0E@petsc.dev>
> On May 11, 2021, at 3:08 PM, Frederico Teixeira wrote: > > Hi Pierre, > > Thanks for your tip. > I don't have access to MATLAB. Would Octave also work? Or Python?
Yes, they should both be fine. All you need to do is split the matrix into the two matrices in Octave or Python and then save both of them with the PETSc real-scalar configuration.
Barry
> > Regards, > Frederico. > > Dr. Frederico Teixeira > Computational Modeler and Software Developer, ZMT (member of Zurich43) > > P +41 44 245 9698 > Zeughausstrasse 43, 8004 Zurich, Switzerland > > > From: "Pierre Jolivet" > To: "Frederico Teixeira" > Cc: "petsc-users" > Sent: Tuesday, May 11, 2021 6:03:20 PM > Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations > > Hello Frederico, > I'm not sure that's possible. > Here is what I do, it makes me sick, but mixing precisions/scalar types with PETSc is difficult (crossing my fingers this will be better with future).
> In MATLAB (after putting petsc/share/petsc/matlab in the path):
> A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex
> PetscBinaryWrite('re.dat',real(A)); % scalar-type=real
> PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real
> > Thanks, > Pierre > > On 11 May 2021, at 3:30 PM, Frederico Teixeira > wrote: > > Dear fellows, > > I hope this message finds you safe and well. > > I have a complex-valued matrix and its real/imaginary components in binary format. They were extracted from a solver that only works with the "scalar-type=complex" configuration. > I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. > At the end of the day, I would like to have both real and imaginary components as real-valued matrices. > Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. > > Regards, > Frederico.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From mfadams at lbl.gov Tue May 11 16:36:19 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 11 May 2021 17:36:19 -0400 Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: <721568093.720617.1620763419373.JavaMail.zimbra@zurichmedtech.com> References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> <721568093.720617.1620763419373.JavaMail.zimbra@zurichmedtech.com> Message-ID:
On Tue, May 11, 2021 at 4:03 PM Frederico Teixeira wrote: > Hi Mark, > > I think so. The complex-valued linear system Au = b, with A = T + Wi, u = x + yi, b = p + qi (i is the imaginary unit), can be transformed into a 2x2 real-valued block system:
> [ T  -W ] (x)   (p)
> [ W   T ] (y) = (q)
Yes, this is ERF.
> > Perhaps a bit more background can clarify things. System A stems from a > Poisson equation with complex coefficients. There are ~130 piecewise > constant coefficients ranging across 5 orders of magnitude.
I assume these are big jumps, are they? You might be able to use GAMG or Hypre and tune the threshold to "see" these jumps if they are big enough. In general large random coefficient jumps are hard with generic AMG.
You have 130 patches so a coarse grid could in theory resolve this structure with a not too large "grid" This may not be clear, but I would start with a real valued version of this and see if you can tune Hypre or GAMG to work OK. This could be done with, as I said, tuning the threshold parameter so that coarsening "sees" your jumps and respects them. Next, use a coarse grid solve that is big enough to capture the geometry of your jumps, approximately, where an exact solver is used and it does not care about jumps. As far as ERF, if you can translate it into a (complex) PetscScalar matrix somehow then you can use GAMG. I don't know about Hypre. Or, I have used ERF with some success. A long time ago. You want to interleave the real and complex parts. How big is |W| / |T| ? If this is small it should work OK, but it is hard to be sure. > Most of the KSPs and PCs I managed to test take between 6000-9000 > iterations to reach 10^-9. I hope to get some speed up with preconditioners > that exploit the structure above. > > Regards, > Frederico. > > Dr. Frederico Teixeira > Computational Modeler and Software Developer, ZMT (member > of Zurich43 ) > > P +41 44 245 9698 > Zeughausstrasse 43, 8004 Zurich, Switzerland > > ------------------------------ > *From: *"Mark Adams" > *To: *"Frederico Teixeira" > *Cc: *"petsc-users" > *Sent: *Tuesday, May 11, 2021 5:16:35 PM > *Subject: *Re: [petsc-users] Binary format in real vs. complex scalar > type configurations > > > > On Tue, May 11, 2021 at 9:30 AM Frederico Teixeira > wrote: > >> Dear fellows, >> >> I hope this message finds you safe and well. >> >> I have a complex-valued matrix and its real/imaginary components in binary >> format. They were extracted from a solver that only works with >> "scalar-type=complex" configuration. >> I am getting weird results when I load them into a small test program >> that's configured with "scalar-type=real", but I believe this is expected. >> At the end of the day, I would like to have both real and imaginary >> components as real-valued matrices. >> Is it possible to do it? I want to test preconditioners that are >> tailored for this sort of problem. >> > > Do you mean you want what is called "equivalent real form" where the real > and complex parts are stored as type 'double' for example and operations > like multiply take two pairs of doubles, do a complex multiply manually, > and return a real/complex pair of doubles? > > >> >> Regards, >> Frederico. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed May 12 01:17:21 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 12 May 2021 08:17:21 +0200 Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> Message-ID: <8039A166-5324-42D0-AF66-3A3A6A84066A@joliv.et> > On 11 May 2021, at 6:26 PM, Matthew Knepley wrote: > > On Tue, May 11, 2021 at 12:03 PM Pierre Jolivet > wrote: > Hello Frederico, > I?m not sure that?s possible. > Here is what I do, it makes me sick, but mixing precisions/scalar types with PETSc is difficult (crossing my fingers this will be better with future). 
> In MATLAB (after putting petsc/share/petsc/matlab in the path): > A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex > PetscBinaryWrite('re.dat',real(A)); % scalar-type=real > PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real > > So what you want to happen is that MatLoad() looks at the datatype, sees that it is complex and PetscScalar is real, and returns two matrices with the real and imaginary parts? > > The hard part is that the MatLoad interface returns a single matrix. There is Mat[Real,Imaginary]Part(), maybe there could be Mat[Real,Imaginary]PartLoad()? It will be inefficient (complex Mat read twice in a very naive implementation), but functional and would not require tinkering with the options (but maybe you had something clear in mind). Also, just for reference, https://gitlab.com/petsc/petsc/-/issues/901 . Frederico, as Barry wrote, it will work with Octave or Python. In fact, it will work in any code that does not include petsc.h or link to libpetsc. If you want to stick to C, I think you could simply copy/paste the MatLoad_SeqAIJ_Binary() implementation https://www.mcs.anl.gov/petsc/petsc-3.15.0/src/mat/impls/aij/seq/aij.c.html#line4811 and replace PETSC_SCALAR line 4861 by PETSC_COMPLEX. Then, as Matt wrote, instead of assembling a single Mat, assemble two Mats by splitting a->a in two PetscScalar arrays (remember that this is for your scalar-type=real configuration). Thanks, Pierre > I guess we could have a flag that says what to do with complex numbers (read real, read imaginary, read norm, etc.) > and you could read it twice. Would that work? > > Thanks, > > Matt > > Thanks, > Pierre > >> On 11 May 2021, at 3:30 PM, Frederico Teixeira > wrote: >> >> Dear fellows, >> >> I hope this message finds you safe and well. >> >> I have a complex-valued matrix and its real/imaginary components in binary format. They were extracted from a solver that only works with "scalar-type=complex" configuration. >> I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. >> At the end of the day, I would like to have both real and imaginary components as real-valued matrices. >> Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. >> >> Regards, >> Frederico. >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 12 06:35:55 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 May 2021 07:35:55 -0400 Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: <8039A166-5324-42D0-AF66-3A3A6A84066A@joliv.et> References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> <8039A166-5324-42D0-AF66-3A3A6A84066A@joliv.et> Message-ID: On Wed, May 12, 2021 at 2:17 AM Pierre Jolivet wrote: > > > On 11 May 2021, at 6:26 PM, Matthew Knepley wrote: > > On Tue, May 11, 2021 at 12:03 PM Pierre Jolivet wrote: > >> Hello Frederico, >> I?m not sure that?s possible. >> Here is what I do, it makes me sick, but mixing precisions/scalar types >> with PETSc is difficult (crossing my fingers this will be better with >> future). 
>> In MATLAB (after putting petsc/share/petsc/matlab in the path):
>> A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex
>> PetscBinaryWrite('re.dat',real(A)); % scalar-type=real
>> PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real
> So what you want to happen is that MatLoad() looks at the datatype, sees > that it is complex and PetscScalar is real, and returns two matrices with > the real and imaginary parts? > > The hard part is that the MatLoad interface returns a single matrix.
> > There is Mat[Real,Imaginary]Part(), maybe there could be > Mat[Real,Imaginary]PartLoad()? > It will be inefficient (complex Mat read twice in a very naive > implementation), but functional and would not require tinkering with the > options (but maybe you had something clear in mind). > Also, just for reference, https://gitlab.com/petsc/petsc/-/issues/901.
> > Frederico, as Barry wrote, it will work with Octave or Python. In fact, it > will work in any code that does not include petsc.h or link to libpetsc. > If you want to stick to C, I think you could simply copy/paste the > MatLoad_SeqAIJ_Binary() implementation > https://www.mcs.anl.gov/petsc/petsc-3.15.0/src/mat/impls/aij/seq/aij.c.html#line4811 > and replace PETSC_SCALAR line 4861 by PETSC_COMPLEX. > Then, as Matt wrote, instead of assembling a single Mat, assemble two Mats > by splitting a->a in two PetscScalar arrays (remember that this is for your > scalar-type=real configuration).
Should this make two matrices, or the equivalent real matrix?

/ R   Q \
\ -Q  R /

Thanks, Matt
> Thanks, > Pierre > > On 11 May 2021, at 3:30 PM, Frederico Teixeira > wrote: > > Dear fellows, > > I hope this message finds you safe and well. > > I have a complex-valued matrix and its real/imaginary components in binary > format. They were extracted from a solver that only works with the > "scalar-type=complex" configuration. > I am getting weird results when I load them into a small test program > that's configured with "scalar-type=real", but I believe this is expected. > At the end of the day, I would like to have both real and imaginary > components as real-valued matrices. > Is it possible to do it? I want to test preconditioners that are > tailored for this sort of problem. > > Regards, > Frederico.
-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
-------------- next part -------------- An HTML attachment was scrubbed... 
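To make the copy/paste suggestion concrete, below is a hedged sketch of reading a complex-format binary file from a scalar-type=real build and splitting it into two real matrices. It assumes the standard sequential PETSc binary layout (classid, rows, cols, nnz, then per-row counts, column indices, values), double precision on both sides, and a file small enough to read on one rank; LoadComplexAsTwoReal is an illustrative name, not PETSc API, and error handling is trimmed.

    #include <petscmat.h>

    PetscErrorCode LoadComplexAsTwoReal(const char *fname, Mat *Are, Mat *Aim)
    {
      PetscViewer    viewer;
      PetscInt       header[4], m, n, nz, i, j, cnt = 0;
      PetscInt      *rowlens, *cols;
      PetscScalar   *vals;  /* real build: each complex entry occupies two of these */
      PetscErrorCode ierr;

      ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, fname, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
      ierr = PetscViewerBinaryRead(viewer, header, 4, NULL, PETSC_INT);CHKERRQ(ierr);
      /* header[0] is the matrix classid; the rest are the sizes */
      m = header[1]; n = header[2]; nz = header[3];
      ierr = PetscMalloc3(m, &rowlens, nz, &cols, 2*nz, &vals);CHKERRQ(ierr);
      ierr = PetscViewerBinaryRead(viewer, rowlens, m, NULL, PETSC_INT);CHKERRQ(ierr);
      ierr = PetscViewerBinaryRead(viewer, cols, nz, NULL, PETSC_INT);CHKERRQ(ierr);
      /* the file holds nz complex values, i.e. 2*nz real scalars */
      ierr = PetscViewerBinaryRead(viewer, vals, 2*nz, NULL, PETSC_SCALAR);CHKERRQ(ierr);
      ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 0, rowlens, Are);CHKERRQ(ierr);
      ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, m, n, 0, rowlens, Aim);CHKERRQ(ierr);
      for (i = 0; i < m; i++) {
        for (j = 0; j < rowlens[i]; j++, cnt++) {
          ierr = MatSetValue(*Are, i, cols[cnt], vals[2*cnt],   INSERT_VALUES);CHKERRQ(ierr);
          ierr = MatSetValue(*Aim, i, cols[cnt], vals[2*cnt+1], INSERT_VALUES);CHKERRQ(ierr);
        }
      }
      ierr = MatAssemblyBegin(*Are, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*Are, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyBegin(*Aim, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*Aim, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = PetscFree3(rowlens, cols, vals);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      return 0;
    }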
URL: From snailsoar at hotmail.com Wed May 12 07:06:56 2021 From: snailsoar at hotmail.com (feng wang) Date: Wed, 12 May 2021 12:06:56 +0000 Subject: [petsc-users] Questions on matrix-free GMRES implementation In-Reply-To: References: <584E6514-C3C6-469B-A256-5470811D8D52@petsc.dev> <38509E59-D27A-47C6-8D97-EAAEBFC15FBF@petsc.dev> <5B018B57-B679-4015-8097-042B7C6B9D38@petsc.dev> <151FDDB8-2384-4A3E-9B17-45318E2CC7CC@petsc.dev> , <1599C26D-14C3-4EA7-9CD3-F0526F098AD6@petsc.dev>, Message-ID: Hi Barry, I have implemented my matrix-free GMRES in parallel. There is a small difference in the residuals (please see the figure below) when I vary the number of cores. But they all converged in the end. I have tried to multiply my shell matrix or preconditoning matrix with a vector in parallel, I got same values as a serial run. so I believe halo exchange is ok for matrix-vector multiplications. I am using ASM preconditioner with overlap set to 1. Is this behaviour in parallel normal for ASM pre-conditioners? If it is not, then I know I need to find a bug in my code. Thanks, Feng [cid:c9f5f5bf-c1fc-4350-bf3d-0ff072b884e9] ________________________________ From: petsc-users on behalf of feng wang Sent: 28 March 2021 22:05 To: Barry Smith Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation Hi Barry, Thanks for your comments. I will try that. Thanks, Feng ________________________________ From: Barry Smith Sent: 26 March 2021 23:44 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Mar 25, 2021, at 6:39 PM, feng wang > wrote: Hi Barry, Thanks for your comments. I will renumber the cells in the way as you recommended. I went through the manual again and understand how to update the halo elements for my shell matrix routine "mymult(Mat m ,Vec x, Vec y)". I can use the global index of ghost cells for each rank and "Vec x" to get the ghost values for each rank via scattering. It should be similar to the example in page 40 in the manual. One more question, I also have an assembled approximate Jacobian matrix for pre-conditioning GMRES. If I re-number the cells properly as your suggested, I don't need to worry about communication and petsc will handle it properly together with my shell-matrix? If you assembly the approximate Jaocobian using the "new" ordering then it will reflect the same function evaluation and matrix free operators so should be ok. Barry Thanks, Feng ________________________________ From: Barry Smith > Sent: 25 March 2021 0:03 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Mar 24, 2021, at 5:08 AM, feng wang > wrote: Hi Barry, Thanks for your comments. It's very helpful. For your comments, I have a bit more questions 1. for your 1st comment " Yes, in some sense. So long as each process ....". * If I understand it correctly (hopefully) a parallel vector in petsc can hold discontinuous rows of data in a global array. If this is true, If I call "VecGetArray", it would create a copy in a continuous space if the data is not continuous, do some operations and petsc will figure out how to put updated values back to the right place in the global array? * This would generate an overhead. If I do the renumbering to make each process hold continuous rows, this overhead can be avoided when I call "VecGetArray"? GetArray does nothing except return the pointer to the data in the vector. 
It does not copy anything or reorder anything. Whatever order the numbers are in vector they are in the same order as in the array you obtain with VecGetArray. 1. for your 2nd comment " The matrix and vectors the algebraic solvers see DO NOT have......." For the callback function of my shell matrix "mymult(Mat m ,Vec x, Vec y)", I need to get "x" for the halo elements to compute the non-linear function. My code will take care of other halo exchanges, but I am not sure how to use petsc to get the halo elements "x" in the shell matrix, could you please elaborate on this? some related examples or simple pesudo code would be great. Basically all the parallel code in PETSc does this. How you need to set up the halo communication depends on how you are managing the assignment of degrees of freedom on each process and between processes. VecScatterCreate() is the tool you will use to tell PETSc how to get the correct values from one process to their halo-ed location on the process. It like everything in PETSc uses a number in the vectors of 0 ... n_0-1 on the first process, n_0, n_0+1, ... n_1-1 on the second etc. Since you are managing the partitioning and distribution of parallel data you must renumber the vector entry numbering in your data structures to match that shown above. Just do the numbering once after you have setup your distributed data and use it for the rest of the run. You might use the object from AOCreate to do the renumbering for you. Barry Thanks, Feng ________________________________ From: Barry Smith > Sent: 22 March 2021 1:28 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Mar 21, 2021, at 6:22 PM, feng wang > wrote: Hi Barry, Thanks for your help, I really appreciate it. In the end I used a shell matrix to compute the matrix-vector product, it is clearer to me and there are more things under my control. I am now trying to do a parallel implementation, I have some questions on setting up parallel matrices and vectors for a user-defined partition, could you please provide some advice? Suppose I have already got a partition for 2 CPUs. Each cpu is assigned a list of elements and also their halo elements. 1. The global element index for each partition is not necessarily continuous, do I have to I re-order them to make them continuous? Yes, in some sense. So long as each process can march over ITS elements computing the function and Jacobian matrix-vector product it doesn't matter how you have named/numbered entries. But conceptually the first process has the first set of vector entries and the second the second set. 1. 2. When I set up the size of the matrix and vectors for each cpu, should I take into account the halo elements? The matrix and vectors the algebraic solvers see DO NOT have halo elements in their sizes. You will likely need a halo-ed work vector to do the matrix-free multiply from. The standard model is use VecScatterBegin/End to get the values from the non-halo-ed algebraic vector input to MatMult into a halo-ed one to do the local product. 1. In my serial version, when I initialize my RHS vector, I am not using VecSetValues, Instead I use VecGetArray/VecRestoreArray to assign the values. VecAssemblyBegin()/VecAssemblyEnd() is never used. would this still work for a parallel version? Yes, you can use Get/Restore but the input vector x will need to be, as noted above, scattered into a haloed version to get all the entries you will need to do the local part of the product. 
Thanks, Feng ________________________________ From: Barry Smith > Sent: 12 March 2021 23:40 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Mar 12, 2021, at 9:37 AM, feng wang > wrote: Hi Matt, Thanks for your prompt response. Below are my two versions. one is buggy and the 2nd one is working. For the first one, I add the diagonal contribution to the true RHS (variable: rhs) and then set the base point, the callback function is somehow called twice afterwards to compute Jacobian. Do you mean "to compute the Jacobian matrix-vector product?" Is it only in the first computation of the product (for the given base vector) that it calls it twice or every matrix-vector product? It is possible there is a bug in our logic; run in the debugger with a break point in FormFunction_mf and each time the function is hit in the debugger type where or bt to get the stack frames from the calls. Send this. From this we can all see if it is being called excessively and why. For the 2nd one, I just call the callback function manually to recompute everything, the callback function is then called once as expected to compute the Jacobian. For me, both versions should do the same things. but I don't know why in the first one the callback function is called twice after I set the base point. what could possibly go wrong? The logic of how it is suppose to work is shown below. Thanks, Feng //This does not work fld->cnsv( iqs,iqe, q, aux, csv ); //add contribution of time-stepping for(iv=0; ivcnsv( iqs,iqe, q, aux, csv ); ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr); ierr = FormFunction_mf(this, petsc_csv, petsc_baserhs); //this is my callback function, now call it manually ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr); Since you provide petsc_baserhs MatMFFD assumes (naturally) that you will keep the correct values in it. Hence for each new base value YOU need to compute the new values in petsc_baserhs. This approach gives you a bit more control over reusing the information in petsc_baserhs. If you would prefer that MatMFFD recomputes the base values, as needed, then you call FormFunction_mf(this, petsc_csv, NULL); and PETSc will allocate a vector and fill it up as needed by calling your FormFunction_mf() But you need to call MatAssemblyBegin/End each time you the base input vector this, petsc_csv values change. For example MatAssemblyBegin(petsc_A_mf,...) MatAssemblyEnd(petsc_A_mf,...) KSPSolve() ________________________________ From: Matthew Knepley > Sent: 12 March 2021 15:08 To: feng wang > Cc: Barry Smith >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Fri, Mar 12, 2021 at 9:55 AM feng wang > wrote: Hi Mat, Thanks for your reply. I will try the parallel implementation. I've got a serial matrix-free GMRES working, but I would like to know why my initial version of matrix-free implementation does not work and there is still something I don't understand. I did some debugging and find that the callback function to compute the RHS for the matrix-free matrix is called twice by Petsc when it computes the finite difference Jacobian, but it should only be called once. I don't know why, could you please give some advice? F is called once to calculate the base point and once to get the perturbation. The base point is not recalculated, so if you do many iterates, it is amortized. 
Thanks, Matt Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 12 March 2021 12:05 To: feng wang > Cc: Barry Smith >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation On Fri, Mar 12, 2021 at 6:02 AM feng wang > wrote: Hi Barry, Thanks for your advice. You are right on this. somehow there is some inconsistency when I compute the right hand side (true RHS + time-stepping contribution to the diagonal matrix) to compute the finite difference Jacobian. If I just use the call back function to recompute my RHS before I call MatMFFDSetBase, then it works like a charm. But now I end up with computing my RHS three times. 1st time is to compute the true RHS, the rest two is for computing finite difference Jacobian. In my previous buggy version, I only compute RHS twice. If possible, could you elaborate on your comments "Also be careful about petsc_baserhs", so I may possibly understand what was going on with my buggy version. Our FD implementation is simple. It approximates the action of the Jacobian as J(b) v = (F(b + h v) - F(b)) / h ||v|| where h is some small parameter and b is the base vector, namely the one that you are linearizing around. In a Newton step, b is the previous solution and v is the proposed solution update. Besides, for a parallel implementation, my code already has its own partition method, is it possible to allow petsc read in a user-defined partition? if not what is a better way to do this? Sure https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSetSizes.html Thanks, Matt Many thanks, Feng ________________________________ From: Barry Smith > Sent: 11 March 2021 22:15 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation Feng, The first thing to check is that for each linear solve that involves a new operator (values in the base vector) the MFFD matrix knows it is using a new operator. The easiest way is to call MatMFFDSetBase() before each solve that involves a new operator (new values in the base vector). Also be careful about petsc_baserhs, when you change the base vector's values you also need to change the petsc_baserhs values to the function evaluation at that point. If that is correct I would check with a trivial function evaluator to make sure the infrastructure is all set up correctly. For examples use for the matrix free a 1 4 1 operator applied matrix free. Barry On Mar 11, 2021, at 7:35 AM, feng wang > wrote: Dear All, I am new to petsc and trying to implement a matrix-free GMRES. I have assembled an approximate Jacobian matrix just for preconditioning. 
After reading some previous questions on this topic, my approach is: the matrix-free matrix is created as: ierr = MatCreateMFFD(*A_COMM_WORLD, iqe*blocksize, iqe*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, &petsc_A_mf); CHKERRQ(ierr); ierr = MatMFFDSetFunction(petsc_A_mf, FormFunction_mf, this); CHKERRQ(ierr); KSP linear operator is set up as: ierr = KSPSetOperators(petsc_ksp, petsc_A_mf, petsc_A_pre); CHKERRQ(ierr); //petsc_A_pre is my assembled pre-conditioning matrix Before calling KSPSolve, I do: ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr); //petsc_csv is the flow states, petsc_baserhs is the pre-computed right hand side The call back function is defined as: PetscErrorCode cFdDomain::FormFunction_mf(void *ctx, Vec in_vec, Vec out_vec) { PetscErrorCode ierr; cFdDomain *user_ctx; cout << "FormFunction_mf called\n"; //in_vec: flow states //out_vec: right hand side + diagonal contributions from CFL number user_ctx = (cFdDomain*)ctx; //get perturbed conservative variables from petsc user_ctx->petsc_getcsv(in_vec); //get new right side user_ctx->petsc_fd_rhs(); //set new right hand side to the output vector user_ctx->petsc_setrhs(out_vec); ierr = 0; return ierr; } The linear system I am solving is (J+D)x=RHS. J is the Jacobian matrix. D is a diagonal matrix and it is used to stabilise the solution at the start but reduced gradually when the solution moves on to recover Newton's method. I add D*x to the true right side when non-linear function is computed to work out finite difference Jacobian, so when finite difference is used, it actually computes (J+D)*dx. The code runs but diverges in the end. If I don't do matrix-free and use my approximate Jacobian matrix, GMRES works. So something is wrong with my matrix-free implementation. Have I missed something in my implementation? Besides, is there a way to check if the finite difference Jacobian matrix is computed correctly in a matrix-free implementation? Thanks for your help in advance. Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tmp.png Type: image/png Size: 16572 bytes Desc: tmp.png URL: From teixeira at zmt.swiss Wed May 12 08:14:13 2021 From: teixeira at zmt.swiss (Frederico Teixeira) Date: Wed, 12 May 2021 15:14:13 +0200 (CEST) Subject: [petsc-users] Binary format in real vs. complex scalar type configurations In-Reply-To: References: <1589889972.662636.1620739831184.JavaMail.zimbra@zurichmedtech.com> <8039A166-5324-42D0-AF66-3A3A6A84066A@joliv.et> Message-ID: <1159729513.829022.1620825253784.JavaMail.zimbra@zurichmedtech.com> Matt, I initially had in mind to output the real and imaginary matrices separately, instead of the equivalent real matrix. Pierre, if you think there is value on it, I can contribute an implementation of "Mat[Real,Imaginary]PartLoad()" as you suggested, instead of solving the problem on my side with Python etc.. Regards, Frederico. 
From: "Matthew Knepley" To: "Pierre Jolivet" Cc: "Frederico Teixeira" , "petsc-users" Sent: Wednesday, May 12, 2021 1:35:55 PM Subject: Re: [petsc-users] Binary format in real vs. complex scalar type configurations On Wed, May 12, 2021 at 2:17 AM Pierre Jolivet < [ mailto:pierre at joliv.et | pierre at joliv.et ] > wrote: BQ_BEGIN On 11 May 2021, at 6:26 PM, Matthew Knepley < [ mailto:knepley at gmail.com | knepley at gmail.com ] > wrote: On Tue, May 11, 2021 at 12:03 PM Pierre Jolivet < [ mailto:pierre at joliv.et | pierre at joliv.et ] > wrote: BQ_BEGIN Hello Frederico, I?m not sure that?s possible. Here is what I do, it makes me sick, but mixing precisions/scalar types with PETSc is difficult (crossing my fingers this will be better with future). In MATLAB (after putting petsc/share/petsc/matlab in the path): A = PetscBinaryRead('your_binary_mat_with_re+im.dat','complex',true); % scalar-type=complex PetscBinaryWrite('re.dat',real(A)); % scalar-type=real PetscBinaryWrite('im.dat',imag(A)); % scalar-type=real So what you want to happen is that MatLoad() looks at the datatype, sees that it is complex and PetscScalar is real, and returns two matrices with the real and imaginary parts? The hard part is that the MatLoad interface returns a single matrix. BQ_END There is Mat[Real,Imaginary]Part(), maybe there could be Mat[Real,Imaginary]PartLoad()? It will be inefficient (complex Mat read twice in a very naive implementation), but functional and would not require tinkering with the options (but maybe you had something clear in mind). Also, just for reference, [ https://gitlab.com/petsc/petsc/-/issues/901 | https://gitlab.com/petsc/petsc/-/issues/901 ] . Frederico, as Barry wrote, it will work with Octave or Python. In fact, it will work in any code that does not include petsc.h or link to libpetsc. If you want to stick to C, I think you could simply copy/paste the MatLoad_SeqAIJ_Binary() implementation [ https://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/aij/seq/aij.c.html#line4811 | https://www.mcs.anl.gov/petsc/petsc-3.15.0/src/mat/impls/aij/seq/aij.c.html#line4811 ] and replace PETSC_SCALAR line 4861 by PETSC_COMPLEX. Then, as Matt wrote, instead of assembling a single Mat, assemble two Mats by splitting a->a in two PetscScalar arrays (remember that this is for your scalar-type=real configuration). BQ_END Should this make two matrices, or the equivalent real matrix? / R Q \ \ -Q R / THanks, Matt BQ_BEGIN Thanks, Pierre BQ_BEGIN I guess we could have a flag that says what to do with complex numbers (read real, read imaginary, read norm, etc.) and you could read it twice. Would that work? Thanks, Matt BQ_BEGIN Thanks, Pierre BQ_BEGIN On 11 May 2021, at 3:30 PM, Frederico Teixeira < [ mailto:teixeira at zmt.swiss | teixeira at zmt.swiss ] > wrote: Dear fellows, I hope this message finds you safe and well. I have a complex-valued matrix and its real/imaginary components in binary format . They were extracted from a solver that only works with "scalar-type=complex" configuration. I am getting weird results when I load them into a small test program that's configured with "scalar-type=real", but I believe this is expected. At the end of the day, I would like to have both real and imaginary components as real-valued matrices. Is it possible to do it? I want to test preconditioners that are tailored for this sort of problem. Regards, Frederico. 
BQ_END BQ_END -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener [ http://www.cse.buffalo.edu/~knepley/ | https://www.cse.buffalo.edu/~knepley/ ] BQ_END BQ_END -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener [ http://www.cse.buffalo.edu/~knepley/ | https://www.cse.buffalo.edu/~knepley/ ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed May 12 12:11:11 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 May 2021 12:11:11 -0500 Subject: [petsc-users] Questions on matrix-free GMRES implementation In-Reply-To: References: <584E6514-C3C6-469B-A256-5470811D8D52@petsc.dev> <38509E59-D27A-47C6-8D97-EAAEBFC15FBF@petsc.dev> <5B018B57-B679-4015-8097-042B7C6B9D38@petsc.dev> <151FDDB8-2384-4A3E-9B17-45318E2CC7CC@petsc.dev> <1599C26D-14C3-4EA7-9CD3-F0526F098AD6@petsc.dev> Message-ID: Is the the plot for KSP convergence with ASM preconditioning? Or just matrix-free multiply and no preconditioner? One normally expects the number of iterations to increase with more cores for ASM (especially from one to two ranks) because there is less coupling in the preconditioner the more blocks (ranks) you use. Barry > On May 12, 2021, at 7:06 AM, feng wang wrote: > > Hi Barry, > > I have implemented my matrix-free GMRES in parallel. There is a small difference in the residuals (please see the figure below) when I vary the number of cores. But they all converged in the end. I have tried to multiply my shell matrix or preconditoning matrix with a vector in parallel, I got same values as a serial run. so I believe halo exchange is ok for matrix-vector multiplications. > > I am using ASM preconditioner with overlap set to 1. Is this behaviour in parallel normal for ASM pre-conditioners? If it is not, then I know I need to find a bug in my code. > > Thanks, > Feng > > > > From: petsc-users > on behalf of feng wang > > Sent: 28 March 2021 22:05 > To: Barry Smith > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation > > Hi Barry, > Thanks for your comments. I will try that. > Thanks, > Feng > > From: Barry Smith > > Sent: 26 March 2021 23:44 > To: feng wang > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation > > > >> On Mar 25, 2021, at 6:39 PM, feng wang > wrote: >> >> Hi Barry, >> >> Thanks for your comments. >> >> I will renumber the cells in the way as you recommended. I went through the manual again and understand how to update the halo elements for my shell matrix routine "mymult(Mat m ,Vec x, Vec y)". I can use the global index of ghost cells for each rank and "Vec x" to get the ghost values for each rank via scattering. It should be similar to the example in page 40 in the manual. >> >> One more question, I also have an assembled approximate Jacobian matrix for pre-conditioning GMRES. If I re-number the cells properly as your suggested, I don't need to worry about communication and petsc will handle it properly together with my shell-matrix? > > If you assembly the approximate Jaocobian using the "new" ordering then it will reflect the same function evaluation and matrix free operators so should be ok. 
> From: petsc-users on behalf of feng wang
> Sent: 28 March 2021 22:05
> To: Barry Smith
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
>
> Hi Barry,
>
> Thanks for your comments. I will try that.
>
> Thanks,
> Feng
>
> From: Barry Smith
> Sent: 26 March 2021 23:44
>
>> On Mar 25, 2021, at 6:39 PM, feng wang wrote:
>>
>> Hi Barry,
>>
>> Thanks for your comments.
>>
>> I will renumber the cells in the way you recommended. I went through the manual again and understand how to update the halo elements for my shell matrix routine "mymult(Mat m, Vec x, Vec y)". I can use the global indices of the ghost cells on each rank together with "Vec x" to get the ghost values for each rank via scattering. It should be similar to the example on page 40 of the manual.
>>
>> One more question: I also have an assembled approximate Jacobian matrix for preconditioning GMRES. If I renumber the cells properly as you suggested, I don't need to worry about communication and PETSc will handle it properly together with my shell matrix?
>
> If you assemble the approximate Jacobian using the "new" ordering, then it will reflect the same function evaluation and matrix-free operators, so it should be OK.
>
> Barry
>
> From: Barry Smith
> Sent: 25 March 2021 0:03
>
>> On Mar 24, 2021, at 5:08 AM, feng wang wrote:
>>
>> Hi Barry,
>>
>> Thanks for your comments. They are very helpful. I have a few more questions:
>>
>> 1. On your first comment, "Yes, in some sense. So long as each process ....": if I understand it correctly (hopefully), a parallel vector in PETSc can hold discontinuous rows of data from a global array. If this is true and I call "VecGetArray", would it create a copy in a contiguous space if the data is not contiguous, do some operations, and then have PETSc figure out how to put the updated values back into the right places in the global array? This would generate an overhead. If I renumber so that each process holds contiguous rows, can this overhead be avoided when I call "VecGetArray"?
>
> VecGetArray does nothing except return the pointer to the data in the vector. It does not copy anything or reorder anything. Whatever order the numbers are in the vector, they are in the same order in the array you obtain with VecGetArray.
>
>> 2. On your second comment, "The matrix and vectors the algebraic solvers see DO NOT have.......": for the callback function of my shell matrix, "mymult(Mat m, Vec x, Vec y)", I need the halo elements of "x" to compute the non-linear function. My code will take care of the other halo exchanges, but I am not sure how to use PETSc to get the halo elements of "x" in the shell matrix. Could you please elaborate on this? Some related examples or simple pseudocode would be great.
>
> Basically all the parallel code in PETSc does this. How you need to set up the halo communication depends on how you are managing the assignment of degrees of freedom on each process and between processes. VecScatterCreate() is the tool you will use to tell PETSc how to get the correct values from one process to their halo-ed locations on another process. Like everything in PETSc, it numbers the vector entries 0 ... n_0-1 on the first process, n_0, n_0+1, ... n_1-1 on the second, etc. Since you are managing the partitioning and distribution of the parallel data, you must renumber the vector entries in your data structures to match that scheme. Just do the numbering once, after you have set up your distributed data, and use it for the rest of the run. You might use the object from AOCreate to do the renumbering for you.
>
> Barry
>
>> Thanks,
>> Feng
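A rough sketch of the scatter Barry describes, with made-up names: ghosts[] holds the global indices of this rank's nghost halo entries, and the scatter pulls them out of the distributed vector x into a small local work vector.

   IS         is_from, is_to;
   Vec        xghost;   /* local work vector holding just the halo entries */
   VecScatter scat;

   ierr = VecCreateSeq(PETSC_COMM_SELF, nghost, &xghost); CHKERRQ(ierr);
   ierr = ISCreateGeneral(PETSC_COMM_WORLD, nghost, ghosts, PETSC_COPY_VALUES, &is_from); CHKERRQ(ierr);
   ierr = ISCreateStride(PETSC_COMM_SELF, nghost, 0, 1, &is_to); CHKERRQ(ierr);
   ierr = VecScatterCreate(x, is_from, xghost, is_to, &scat); CHKERRQ(ierr);

   /* inside mymult(Mat m, Vec x, Vec y): refresh the halo, then do the local product */
   ierr = VecScatterBegin(scat, x, xghost, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
   ierr = VecScatterEnd(scat, x, xghost, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);

The scatter is created once, after the renumbering, and reused for every multiply.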
> From: Barry Smith
> Sent: 22 March 2021 1:28
>
>> On Mar 21, 2021, at 6:22 PM, feng wang wrote:
>>
>> Hi Barry,
>>
>> Thanks for your help, I really appreciate it.
>>
>> In the end I used a shell matrix to compute the matrix-vector product; it is clearer to me and there are more things under my control. I am now trying to do a parallel implementation, and I have some questions on setting up parallel matrices and vectors for a user-defined partition. Could you please provide some advice? Suppose I have already got a partition for 2 CPUs, and each CPU is assigned a list of elements and also their halo elements.
>>
>> 1. The global element indices for each partition are not necessarily contiguous. Do I have to re-order them to make them contiguous?
>
> Yes, in some sense. So long as each process can march over ITS elements computing the function and the Jacobian matrix-vector product, it doesn't matter how you have named/numbered the entries. But conceptually the first process has the first set of vector entries and the second the second set.
>
>> 2. When I set up the sizes of the matrix and vectors for each CPU, should I take into account the halo elements?
>
> The matrix and vectors the algebraic solvers see DO NOT have halo elements in their sizes. You will likely need a halo-ed work vector to do the matrix-free multiply from. The standard model is to use VecScatterBegin/End to get the values from the non-halo-ed algebraic vector input to MatMult into a halo-ed one to do the local product.
>
>> 3. In my serial version, when I initialize my RHS vector, I am not using VecSetValues; instead I use VecGetArray/VecRestoreArray to assign the values, and VecAssemblyBegin()/VecAssemblyEnd() is never used. Would this still work for a parallel version?
>
> Yes, you can use Get/Restore, but the input vector x will need to be, as noted above, scattered into a halo-ed version to get all the entries you will need to do the local part of the product.
>
>> Thanks,
>> Feng
>
> From: Barry Smith
> Sent: 12 March 2021 23:40
>
>> On Mar 12, 2021, at 9:37 AM, feng wang wrote:
>>
>> Hi Matt,
>>
>> Thanks for your prompt response.
>>
>> Below are my two versions. One is buggy and the second one is working. For the first one, I add the diagonal contribution to the true RHS (variable: rhs) and then set the base point; the callback function is somehow called twice afterwards to compute the Jacobian.
>
> Do you mean "to compute the Jacobian matrix-vector product"?
>
> Is it only in the first computation of the product (for the given base vector) that it calls it twice, or every matrix-vector product?
>
> It is possible there is a bug in our logic; run in the debugger with a break point in FormFunction_mf, and each time the function is hit in the debugger type "where" or "bt" to get the stack frames from the calls. Send this. From this we can all see if it is being called excessively and why.
>
>> For the second one, I just call the callback function manually to recompute everything, and the callback function is then called once, as expected, to compute the Jacobian. To me, both versions should do the same thing, but I don't know why in the first one the callback function is called twice after I set the base point. What could possibly go wrong?
>
> The logic of how it is supposed to work is shown below.
>> Thanks,
>> Feng
>>
>> //This does not work
>> fld->cnsv( iqs,iqe, q, aux, csv );
>> //add contribution of time-stepping
>> for(iv=0; iv< ... ; iv++)   /* the loop bounds were lost in the archive's HTML-to-text conversion */
>> {
>>    for(iq=0; iq< ... ; iq++)
>>    {
>>       //use conservative variables here
>>       rhs[iv][iq] = -rhs[iv][iq] + csv[iv][iq]*lhsa[nlhs-1][iq]/cfl;
>>    }
>> }
>> ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
>> ierr = petsc_setrhs(petsc_baserhs); CHKERRQ(ierr);
>> ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
>>
>> //This works
>> fld->cnsv( iqs,iqe, q, aux, csv );
>> ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
>> ierr = FormFunction_mf(this, petsc_csv, petsc_baserhs); //this is my callback function, now call it manually
>> ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
>
> Since you provide petsc_baserhs, MatMFFD assumes (naturally) that you will keep the correct values in it. Hence for each new base value YOU need to compute the new values in petsc_baserhs. This approach gives you a bit more control over reusing the information in petsc_baserhs.
>
> If you would prefer that MatMFFD recompute the base values as needed, then you call MatMFFDSetBase(petsc_A_mf, petsc_csv, NULL); and PETSc will allocate a vector and fill it up as needed by calling your FormFunction_mf(). But you need to call MatAssemblyBegin/End each time the values in the base input vector petsc_csv change. For example:
>
>    MatAssemblyBegin(petsc_A_mf,...)
>    MatAssemblyEnd(petsc_A_mf,...)
>    KSPSolve()
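Putting the two options together, the per-solve sequence would look roughly like this sketch; the variable names are the ones from the thread, and rhs_vec and dq_vec are placeholders for the linear right-hand side and solution:

   /* Option A: you keep petsc_baserhs current yourself */
   ierr = FormFunction_mf(this, petsc_csv, petsc_baserhs); CHKERRQ(ierr); /* recompute F(base) */
   ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
   ierr = KSPSolve(petsc_ksp, rhs_vec, dq_vec); CHKERRQ(ierr);

   /* Option B: let MatMFFD compute F(base) itself */
   ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, NULL); CHKERRQ(ierr);
   ierr = MatAssemblyBegin(petsc_A_mf, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); /* after each change to petsc_csv */
   ierr = MatAssemblyEnd(petsc_A_mf, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
   ierr = KSPSolve(petsc_ksp, rhs_vec, dq_vec); CHKERRQ(ierr);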
> From: Matthew Knepley
> Sent: 12 March 2021 15:08
>
> On Fri, Mar 12, 2021 at 9:55 AM feng wang wrote:
>
>> Hi Matt,
>>
>> Thanks for your reply. I will try the parallel implementation.
>>
>> I've got a serial matrix-free GMRES working, but I would like to know why my initial version of the matrix-free implementation does not work; there is still something I don't understand. I did some debugging and found that the callback function that computes the RHS for the matrix-free matrix is called twice by PETSc when it computes the finite-difference Jacobian, but it should only be called once. I don't know why. Could you please give some advice?
>
> F is called once to calculate the base point and once to get the perturbation. The base point is not recalculated, so if you do many iterates, it is amortized.
>
>    Thanks,
>       Matt
>
> From: Matthew Knepley
> Sent: 12 March 2021 12:05
>
> On Fri, Mar 12, 2021 at 6:02 AM feng wang wrote:
>
>> Hi Barry,
>>
>> Thanks for your advice.
>>
>> You are right on this. Somehow there was some inconsistency when I computed the right-hand side (true RHS + time-stepping contribution to the diagonal matrix) used to compute the finite-difference Jacobian. If I just use the callback function to recompute my RHS before I call MatMFFDSetBase, then it works like a charm. But now I end up computing my RHS three times: the first time is to compute the true RHS, and the other two are for computing the finite-difference Jacobian.
>>
>> In my previous buggy version, I only computed the RHS twice. If possible, could you elaborate on your comment "Also be careful about petsc_baserhs", so I can understand what was going on in my buggy version?
>
> Our FD implementation is simple. It approximates the action of the Jacobian as
>
>    J(b) v = (F(b + h v) - F(b)) / h ||v||
>
> where h is some small parameter and b is the base vector, namely the one that you are linearizing around. In a Newton step, b is the previous solution and v is the proposed solution update.
>
>> Besides, for a parallel implementation, my code already has its own partition method. Is it possible to have PETSc read in a user-defined partition? If not, what is a better way to do this?
>
> Sure:
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSetSizes.html
>
>    Thanks,
>       Matt
>
>> Many thanks,
>> Feng
>
> From: Barry Smith
> Sent: 11 March 2021 22:15
>
>   Feng,
>
>   The first thing to check is that, for each linear solve that involves a new operator (new values in the base vector), the MFFD matrix knows it is using a new operator.
>
>   The easiest way is to call MatMFFDSetBase() before each solve that involves a new operator (new values in the base vector). Also be careful about petsc_baserhs: when you change the base vector's values, you also need to change the petsc_baserhs values to the function evaluation at that point.
>
>   If that is correct, I would check with a trivial function evaluator to make sure the infrastructure is all set up correctly. For example, use a 1 4 1 operator applied matrix-free.
>
>   Barry
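A sketch of such a trivial test evaluator, with made-up names: the "residual" is the linear map y_i = x_{i-1} + 4 x_i + x_{i+1}, so its Jacobian is exactly the 1 4 1 matrix and the finite-difference matrix-vector product can be checked against an analytic answer.

   PetscErrorCode FormFunction_141(void *ctx, Vec x, Vec y)
   {
      const PetscScalar *xx;
      PetscScalar       *yy;
      PetscInt           i, n;
      PetscErrorCode     ierr;

      ierr = VecGetSize(x, &n); CHKERRQ(ierr); /* serial test, so local size == global size */
      ierr = VecGetArrayRead(x, &xx); CHKERRQ(ierr);
      ierr = VecGetArray(y, &yy); CHKERRQ(ierr);
      for (i = 0; i < n; i++) {
         yy[i] = 4.0*xx[i];
         if (i > 0)   yy[i] += xx[i-1];
         if (i < n-1) yy[i] += xx[i+1];
      }
      ierr = VecRestoreArrayRead(x, &xx); CHKERRQ(ierr);
      ierr = VecRestoreArray(y, &yy); CHKERRQ(ierr);
      return 0;
   }

Passing this to MatMFFDSetFunction() and comparing MatMult() of the MFFD matrix with the exact product is a quick way to confirm the infrastructure before wiring in the real residual.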
>> On Mar 11, 2021, at 7:35 AM, feng wang wrote:
>>
>> Dear All,
>>
>> I am new to PETSc and trying to implement a matrix-free GMRES. I have assembled an approximate Jacobian matrix, just for preconditioning. After reading some previous questions on this topic, my approach is as follows.
>>
>> The matrix-free matrix is created as:
>>
>>    ierr = MatCreateMFFD(*A_COMM_WORLD, iqe*blocksize, iqe*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, &petsc_A_mf); CHKERRQ(ierr);
>>    ierr = MatMFFDSetFunction(petsc_A_mf, FormFunction_mf, this); CHKERRQ(ierr);
>>
>> The KSP linear operator is set up as:
>>
>>    ierr = KSPSetOperators(petsc_ksp, petsc_A_mf, petsc_A_pre); CHKERRQ(ierr); //petsc_A_pre is my assembled preconditioning matrix
>>
>> Before calling KSPSolve, I do:
>>
>>    ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr); //petsc_csv is the flow states, petsc_baserhs is the pre-computed right hand side
>>
>> The callback function is defined as:
>>
>>    PetscErrorCode cFdDomain::FormFunction_mf(void *ctx, Vec in_vec, Vec out_vec)
>>    {
>>       PetscErrorCode ierr;
>>       cFdDomain *user_ctx;
>>
>>       cout << "FormFunction_mf called\n";
>>
>>       //in_vec: flow states
>>       //out_vec: right hand side + diagonal contributions from CFL number
>>
>>       user_ctx = (cFdDomain*)ctx;
>>
>>       //get perturbed conservative variables from petsc
>>       user_ctx->petsc_getcsv(in_vec);
>>
>>       //get new right hand side
>>       user_ctx->petsc_fd_rhs();
>>
>>       //set new right hand side to the output vector
>>       user_ctx->petsc_setrhs(out_vec);
>>
>>       ierr = 0;
>>       return ierr;
>>    }
>>
>> The linear system I am solving is (J+D)x = RHS. J is the Jacobian matrix. D is a diagonal matrix, used to stabilise the solution at the start and reduced gradually as the solution moves on, to recover Newton's method. I add D*x to the true right-hand side when the non-linear function is computed to work out the finite-difference Jacobian, so when finite differencing is used, it actually computes (J+D)*dx.
>>
>> The code runs but diverges in the end. If I don't do matrix-free and use my approximate Jacobian matrix, GMRES works. So something is wrong with my matrix-free implementation. Have I missed something in my implementation? Besides, is there a way to check whether the finite-difference Jacobian matrix is computed correctly in a matrix-free implementation?
>>
>> Thanks for your help in advance.
>> Feng

From snailsoar at hotmail.com Wed May 12 12:40:47 2021
From: snailsoar at hotmail.com (feng wang)
Date: Wed, 12 May 2021 17:40:47 +0000
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
In-Reply-To: References: <584E6514-C3C6-469B-A256-5470811D8D52@petsc.dev> <38509E59-D27A-47C6-8D97-EAAEBFC15FBF@petsc.dev> <5B018B57-B679-4015-8097-042B7C6B9D38@petsc.dev> <151FDDB8-2384-4A3E-9B17-45318E2CC7CC@petsc.dev> <1599C26D-14C3-4EA7-9CD3-F0526F098AD6@petsc.dev>

"Is the plot for KSP convergence with ASM preconditioning? Or just the matrix-free multiply with no preconditioner?"

The plot shows the non-linear residual, which is the L2 norm of my right-hand side, against the number of non-linear iterations. I am using GMRES preconditioned with ASM. From your comments, if I understand them correctly, my non-linear residual history would change with the number of CPUs I use because of ASM?

"One normally expects the number of iterations to increase with more cores for ASM (especially from one to two ranks) because there is less coupling in the preconditioner the more blocks (ranks) you use."

Somehow I did not see a big difference in my non-linear residual when I used 1, 2, or 3 CPUs in this case. I can try a different case and check.

Thanks,
Feng
From bsmith at petsc.dev Wed May 12 16:58:32 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 12 May 2021 16:58:32 -0500
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
In-Reply-To: References: <584E6514-C3C6-469B-A256-5470811D8D52@petsc.dev> <38509E59-D27A-47C6-8D97-EAAEBFC15FBF@petsc.dev> <5B018B57-B679-4015-8097-042B7C6B9D38@petsc.dev> <151FDDB8-2384-4A3E-9B17-45318E2CC7CC@petsc.dev> <1599C26D-14C3-4EA7-9CD3-F0526F098AD6@petsc.dev>
Message-ID: <61F8A0FD-D904-4199-95D1-2A940DAC2C0A@petsc.dev>

> On May 12, 2021, at 12:40 PM, feng wang wrote:
>
> "Is the plot for KSP convergence with ASM preconditioning? Or just the matrix-free multiply with no preconditioner?"
>
> The plot shows the non-linear residual, which is the L2 norm of my right-hand side, against the number of non-linear iterations.
> I am using GMRES preconditioned with ASM. From your comments, if I understand them correctly, my non-linear residual history would change with the number of CPUs I use because of ASM?

  That is fine. The ASM affects the linear convergence history, but it should not normally affect the nonlinear residual history much at all.

> "One normally expects the number of iterations to increase with more cores for ASM (especially from one to two ranks) because there is less coupling in the preconditioner the more blocks (ranks) you use."
>
> Somehow I did not see a big difference in my non-linear residual when I used 1, 2, or 3 CPUs in this case. I can try a different case and check.
>
> Thanks,
> Feng
From snailsoar at hotmail.com Fri May 14 03:23:32 2021
From: snailsoar at hotmail.com (feng wang)
Date: Fri, 14 May 2021 08:23:32 +0000
Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers

Dear All,

I am solving a coupled system. One system is AX = B; A, X and B are all real, and it is solved with GMRES in PETSc. Now I need to solve a second linear system; it can be represented as (A + i*w)*Z = C, where i is the imaginary unit. Z and C are also complex.
So the Jacobian matrix of the second system is just A plus a diagonal contribution i*w. I would like to solve the second system with GMRES. Could PETSc handle this? Any comments are welcome.

Thanks,
Feng

From knepley at gmail.com Fri May 14 05:00:01 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 14 May 2021 06:00:01 -0400
Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers

On Fri, May 14, 2021 at 4:23 AM feng wang wrote:

> [...] So the Jacobian matrix of the second system is just A plus a diagonal contribution i*w. I would like to solve the second system with GMRES. Could PETSc handle this?

Mixing real and complex numbers in the same code is somewhat difficult now. You have two obvious choices:

1) Configure for complex numbers and solve your first system as complex but with zero imaginary part. This will work fine, but it uses more memory for that system. However, since you will already use that much memory for the second system, it does not seem like a big deal to me.

2) You could solve the second system in its equivalent real form

   / A  w \ / Zr \   / Cr \
   \ -w A / \ Zi / = \ Ci /

This uses more memory for the second system, but it does not require reconfiguring.

   Thanks,

      Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
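One way to build option 2 directly from the existing real A is with a nested matrix; a rough sketch with error checking dropped, where w is the scalar frequency and wI denotes w times the identity:

   Mat      wI, mwI, blocks[4], B;
   PetscInt i, rstart, rend, n;

   MatGetSize(A, &n, NULL);
   MatGetOwnershipRange(A, &rstart, &rend);
   MatCreateAIJ(PETSC_COMM_WORLD, rend-rstart, rend-rstart, n, n, 1, NULL, 0, NULL, &wI);
   for (i = rstart; i < rend; i++) MatSetValue(wI, i, i, w, INSERT_VALUES);
   MatAssemblyBegin(wI, MAT_FINAL_ASSEMBLY);
   MatAssemblyEnd(wI, MAT_FINAL_ASSEMBLY);
   MatDuplicate(wI, MAT_COPY_VALUES, &mwI);
   MatScale(mwI, -1.0);                        /* -w I */

   blocks[0] = A;   blocks[1] = wI;            /* [  A   wI ] */
   blocks[2] = mwI; blocks[3] = A;             /* [ -wI   A ] */
   MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &B);

Note that not every preconditioner understands MATNEST; the interleaved 2x2-block layout sketched after Matt's next message is friendlier to standard preconditioners such as ASM.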
From snailsoar at hotmail.com Fri May 14 09:36:47 2021
From: snailsoar at hotmail.com (feng wang)
Date: Fri, 14 May 2021 14:36:47 +0000
Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers

Thanks for your comments. They are very helpful!

I might try the first approach first. For the second approach, which uses an equivalent real-number system, I see potential issues when running in parallel. I have re-ordered my cells so that each rank holds contiguous rows in the first, real system AX = B. For the equivalent real-number system, each rank now holds (or can assign values to) two patches of contiguous rows, which are separated by N rows, where N is the size of the square matrix A. I can't see a straightforward way to let each rank hold contiguous rows in this case. Or can PETSc handle these two patches of contiguous rows with a fixed row-index offset?

By the way, could I still re-use my KSP object in my second system by simply changing the operators and setting new parameters?

Thanks,
Feng

From knepley at gmail.com Fri May 14 10:20:39 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 14 May 2021 11:20:39 -0400
Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers

On Fri, May 14, 2021 at 10:36 AM feng wang wrote:

> [...] For the equivalent real-number system, each rank now holds (or can assign values to) two patches of contiguous rows, which are separated by N rows. I can't see a straightforward way to let each rank hold contiguous rows in this case.

I just wrote it that way for ease of typing. You can imagine permuting into 2x2 blocks, with

   /  a  w \
   \ -w  a /

for each entry.

   Thanks,

      Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
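In that permuted layout each unknown contributes two consecutive rows, so each rank keeps one contiguous row range. A sketch of copying one locally owned row i of A into a block-size-2 matrix B (created, say, as MATBAIJ with bs = 2 and twice the row counts of A); since the i*w contribution is diagonal, w is added only to the diagonal blocks:

   const PetscInt    *cols;
   const PetscScalar *vals;
   PetscInt           ncols, j;

   MatGetRow(A, i, &ncols, &cols, &vals);
   for (j = 0; j < ncols; j++) {
      PetscScalar blk[4] = { vals[j], 0.0, 0.0, vals[j] }; /* row-major 2x2 block */
      if (cols[j] == i) { blk[1] = w; blk[2] = -w; }       /* / a  w ; -w  a / on the diagonal */
      MatSetValuesBlocked(B, 1, &i, 1, &cols[j], blk, INSERT_VALUES);
   }
   MatRestoreRow(A, i, &ncols, &cols, &vals);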
However, since you will already > use that much memory for the second system, it does not seem like a > big deal to me. > > 2) You could solve the second system in its equivalent real form > >
> / A w \ /Zr\ = /Cr\
> \ -w A / \Zi/   \Ci/
> > This uses more memory for the second system, but does not require > reconfiguring. > > Thanks, > > Matt > > Thanks, > Feng > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Fri May 14 12:36:09 2021 From: snailsoar at hotmail.com (feng wang) Date: Fri, 14 May 2021 17:36:09 +0000 Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers In-Reply-To: References: , Message-ID: Yes, you are right. I can do row permutations to make them contiguous. I will try this. Could I re-use my KSP object from the 1st linear system in my 2nd system by simply changing the operators and setting new parameters? Or do I need a separate KSP object for the 2nd system? Thanks, Feng ________________________________ From: Matthew Knepley Sent: 14 May 2021 15:20 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 10:36 AM feng wang > wrote: Thanks for your comments. They are very helpful! I might try the 1st approach first. For the 2nd approach, which uses an equivalent real-number system, I see potential issues when running in parallel. I have re-ordered my cells to allow each rank to hold contiguous rows in the first real system Ax=B. For the equivalent real-number system, each rank now holds (or can assign values to) two patches of contiguous rows, which are separated by N rows, where N is the size of the square matrix A. I can't see a straightforward way to allow each rank to hold contiguous rows in this case. Or can PETSc handle these two patches of contiguous rows with a fixed row-index offset? I just wrote it that way for ease of typing. You can imagine permuting into 2x2 blocks with

/a  w\
\-w a/

for each entry. Thanks, Matt By the way, could I still re-use my KSP object in my second system by simply changing the operators and setting new parameters? Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 10:00 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 4:23 AM feng wang > wrote: Dear All, I am solving a coupled system. One system is AX=B. A, X and B are all real numbers and it is solved with GMRES in PETSc. Now I need to solve a second linear system; it can be represented as (A+i*w)*Z=C. i is the imaginary unit. Z and C are also complex numbers. So the Jacobian matrix of the second system is just A plus a diagonal contribution i*w. I would like to solve the second system with GMRES; could PETSc handle this? Any comments are welcome. Mixing real and complex numbers in the same code is somewhat difficult now.
You have two obvious choices: 1) Configure for complex numbers and solve your first system as complex but with 0 imaginary part. This will work fine, but uses more memory for that system. However, since you will already use that much memory for the second system, it does not seem like a big deal to me. 2) You could solve the second system in its equivalent real form

/ A w \ /Zr\ = /Cr\
\ -w A / \Zi/   \Ci/

This uses more memory for the second system, but does not require reconfiguring. Thanks, Matt Thanks, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 14 15:26:55 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 14 May 2021 16:26:55 -0400 Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers In-Reply-To: References: Message-ID: On Fri, May 14, 2021 at 1:36 PM feng wang wrote: > Yes, you are right. I can do row permutations to make them contiguous. I > will try this. > > Could I re-use my KSP object from the 1st linear system in my 2nd system > by simply changing the operators and setting new parameters? Or do I need a > separate KSP object for the 2nd system? > I think you want 2 KSP objects. You could reuse the settings of the first, but since the system is a different size, all storage would have to be deleted and recreated anyway. Thanks, Matt > Thanks, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 14 May 2021 15:20 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] reuse a real matrix for a second linear > system with complex numbers > > On Fri, May 14, 2021 at 10:36 AM feng wang wrote: > > Thanks for your comments. They are very helpful! > > I might try the 1st approach first. For the 2nd approach, which uses an > equivalent real-number system, I see potential issues when running in > parallel. I have re-ordered my cells to allow each rank to hold contiguous > rows in the first real system Ax=B. For the equivalent real-number system, > each rank now holds (or can assign values to) two patches of contiguous > rows, which are separated by N rows, where N is the size of the square matrix A. I > can't see a straightforward way to allow each rank to hold contiguous rows in > this case. Or can PETSc handle these two patches of contiguous rows with > a fixed row-index offset? > > > I just wrote it that way for ease of typing. You can imagine permuting > into 2x2 blocks with
>
> /a  w\
> \-w a/
>
> for each entry. > > Thanks, > > Matt > > > By the way, could I still re-use my KSP object in my second system by > simply changing the operators and setting new parameters? > > Thanks, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 14 May 2021 10:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] reuse a real matrix for a second linear > system with complex numbers > > On Fri, May 14, 2021 at 4:23 AM feng wang wrote: > > Dear All, > > I am solving a coupled system. One system is AX=B.
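Putting Matt's two suggestions above together: option 1 needs no new assembly code, just a separate build configured with --with-scalar-type=complex; option 2 keeps one contiguous row range per rank if the unknowns are interleaved as (Re z_0, Im z_0, Re z_1, Im z_1, ...). A minimal sketch of that interleaved 2x2-block assembly plus a separate second KSP follows. It assumes a real-scalar PETSc build; the function name, the "second_" options prefix, and the omitted preallocation are illustrative assumptions, not anything from this thread.

#include <petscksp.h>

/* Sketch: build B, the equivalent real form of (A + i*w). Each scalar
   entry a_jk of A becomes the 2x2 block
       / a_jk      w*d_jk \
       \ -w*d_jk   a_jk   /      (d_jk = 1 if j == k, else 0)
   so rank ownership stays contiguous. c and z are the interleaved
   real RHS and solution of length 2N. Preallocate in real code. */
PetscErrorCode SolveShiftedSystem(Mat A, PetscReal w, Vec c, Vec z)
{
  Mat               B;
  KSP               ksp2;            /* second, independent KSP */
  PetscInt          rstart, rend, row, k, ncols;
  const PetscInt    *cols;
  const PetscScalar *vals;
  PetscErrorCode    ierr;

  PetscFunctionBeginUser;
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
  ierr = MatSetSizes(B, 2*(rend-rstart), 2*(rend-rstart), PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetBlockSize(B, 2);CHKERRQ(ierr);
  ierr = MatSetFromOptions(B);CHKERRQ(ierr);
  ierr = MatSetUp(B);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    ierr = MatGetRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
    for (k = 0; k < ncols; k++) {
      PetscInt    r[2]   = {2*row, 2*row+1}, c2[2] = {2*cols[k], 2*cols[k]+1};
      PetscScalar blk[4] = {vals[k], 0.0, 0.0, vals[k]};
      if (cols[k] == row) { blk[1] = w; blk[2] = -w; }  /* i*w sits on the diagonal */
      ierr = MatSetValues(B, 2, r, 2, c2, blk, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatRestoreRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp2);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp2, B, B);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp2, "second_");CHKERRQ(ierr);  /* -second_ksp_type gmres, etc. */
  ierr = KSPSetFromOptions(ksp2);CHKERRQ(ierr);
  ierr = KSPSolve(ksp2, c, z);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp2);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The separate options prefix lets the second solve be tuned independently of the first at run time, which fits Matt's point that the two KSP objects should not share storage.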
A, X and B are all real > numbers and it is solved with GMRES in PETSc. Now I need to solve a second > linear system; it can be represented as (A+i*w)*Z=C. i is the imaginary > unit. Z and C are also complex numbers. > > So the Jacobian matrix of the second system is just A plus a diagonal > contribution i*w. I would like to solve the second system with GMRES; could > PETSc handle this? Any comments are welcome. > > > Mixing real and complex numbers in the same code is somewhat difficult > now. You have two obvious choices: > > 1) Configure for complex numbers and solve your first system as complex > but with 0 imaginary part. This will work fine, but uses more memory for > that system. However, since you will already > use that much memory for the second system, it does not seem like a > big deal to me. > > 2) You could solve the second system in its equivalent real form > >
> / A w \ /Zr\ = /Cr\
> \ -w A / \Zi/   \Ci/
> > This uses more memory for the second system, but does not require > reconfiguring. > > Thanks, > > Matt > > Thanks, > Feng > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat May 15 14:51:39 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 15 May 2021 15:51:39 -0400 Subject: [petsc-users] configure error Message-ID: I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with Kokkos. Any suggestions?
Thanks, Mark ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz for SOWING ============================================================================================= ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error during download/extract/detection of SOWING: Unable to clone sowing Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': The requested URL returned error: 503 Unable to download package SOWING from: git:// https://bitbucket.org/petsc/pkg-sowing.git * If URL specified manually - perhaps there is a typo? * If your network is disconnected - please reconnect and rerun ./configure * Or perhaps you have a firewall blocking the download * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually * or you can download the above URL manually, to /yourselectedlocation and use the configure option: --download-sowing=/yourselectedlocation Unable to clone sowing Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': The requested URL returned error: 503 Unable to download package SOWING from: git:// https://bitbucket.org/petsc/pkg-sowing.git * If URL specified manually - perhaps there is a typo? * If your network is disconnected - please reconnect and rerun ./configure * Or perhaps you have a firewall blocking the download * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually * or you can download the above URL manually, to /yourselectedlocation and use the configure option: --download-sowing=/yourselectedlocation file could not be opened successfully Downloaded package SOWING from: https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a tarball. [or installed python cannot process compressed files] * If you are behind a firewall - please fix your proxy and rerun ./configure For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) 
to http://proxyout.lanl.gov * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually * or you can download the above URL manually, to /yourselectedlocation/v1.1.26-p1.tar.gz and use the configure option: --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz ******************************************************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 751877 bytes Desc: not available URL: From bsmith at petsc.dev Sat May 15 16:30:33 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 15 May 2021 16:30:33 -0500 Subject: [petsc-users] configure error In-Reply-To: References: Message-ID: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> Bitbucket is off-line. Do you need fortran stubs at this moment? If not use --with-fortran-interface=0 > On May 15, 2021, at 2:51 PM, Mark Adams wrote: > > I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with Kokkos. > Any suggestions? > Thanks, > Mark > > ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz for SOWING ============================================================================================= ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Error during download/extract/detection of SOWING: > Unable to clone sowing > Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": > Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... > fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/ ': The requested URL returned error: 503 > Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git > * If URL specified manually - perhaps there is a typo? 
> * If your network is disconnected - please reconnect and rerun ./configure > * Or perhaps you have a firewall blocking the download > * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation > and use the configure option: > --download-sowing=/yourselectedlocation > Unable to clone sowing > Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": > Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... > fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/ ': The requested URL returned error: 503 > Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git > * If URL specified manually - perhaps there is a typo? > * If your network is disconnected - please reconnect and rerun ./configure > * Or perhaps you have a firewall blocking the download > * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation > and use the configure option: > --download-sowing=/yourselectedlocation > file could not be opened successfully > Downloaded package SOWING from: https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a tarball. > [or installed python cannot process compressed files] > * If you are behind a firewall - please fix your proxy and rerun ./configure > For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov > * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation/v1.1.26-p1.tar.gz > and use the configure option: > --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz > ******************************************************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat May 15 19:13:24 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 15 May 2021 20:13:24 -0400 Subject: [petsc-users] configure error In-Reply-To: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> Message-ID: Thanks, Now I get an error in make all. This was working a few weeks ago. Make.log was empty but here is the output. ....
CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o g++: error: arch=compute_80: No such file or directory g++: error: code=sm_80: No such file or directory gmake[3]: *** [gmakefile:188: arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 gmake[3]: *** Waiting for unfinished jobs.... CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o CUDAC arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o CUDAC.dep arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: libs] Error 2 **************************ERROR************************************* Error during compile, check arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** gmake[1]: *** [makefile:40: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: > > Bitbucket is off-line. Do you need fortran stubs at this moment? If not > use --with-fortran-interface=0 > > > On May 15, 2021, at 2:51 PM, Mark Adams wrote: > > I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with > Kokkos. > Any suggestions? 
> Thanks, > Mark > > ============================================================================================= > > > > ============================================================================================= > > > Trying to > download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING > > > > ============================================================================================= > > > > ============================================================================================= > > > Trying to > download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING > > > > ============================================================================================= > > > > ============================================================================================= > > > Trying to > download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz for > SOWING > > > ============================================================================================= > > > > > > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ------------------------------------------------------------------------------- > Error during download/extract/detection of SOWING: > Unable to clone sowing > Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git > /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": > Cloning into > '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... > fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': > The requested URL returned error: 503 > Unable to download package SOWING from: git:// > https://bitbucket.org/petsc/pkg-sowing.git > * If URL specified manually - perhaps there is a typo? > * If your network is disconnected - please reconnect and rerun ./configure > * Or perhaps you have a firewall blocking the download > * You can run with --with-packages-download-dir=/adirectory and > ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation > and use the configure option: > --download-sowing=/yourselectedlocation > Unable to clone sowing > Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git > /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": > Cloning into > '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... > fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': > The requested URL returned error: 503 > Unable to download package SOWING from: git:// > https://bitbucket.org/petsc/pkg-sowing.git > * If URL specified manually - perhaps there is a typo? > * If your network is disconnected - please reconnect and rerun ./configure > * Or perhaps you have a firewall blocking the download > * You can run with --with-packages-download-dir=/adirectory and > ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to /yourselectedlocation > and use the configure option: > --download-sowing=/yourselectedlocation > file could not be opened successfully > Downloaded package SOWING from: > https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a > tarball. 
> [or installed python cannot process compressed files] > * If you are behind a firewall - please fix your proxy and rerun > ./configure > For example at LANL you may need to set the environmental variable > http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov > * You can run with --with-packages-download-dir=/adirectory and > ./configure will instruct you what packages to download manually > * or you can download the above URL manually, to > /yourselectedlocation/v1.1.26-p1.tar.gz > and use the configure option: > --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz > > ******************************************************************************* > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2690560 bytes Desc: not available URL: From junchao.zhang at gmail.com Sat May 15 20:44:59 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 15 May 2021 20:44:59 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> Message-ID: Add --show to the failing nvcc_wrapper command and see how nvcc_wrapper parses the options PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o --show --Junchao Zhang On Sat, May 15, 2021 at 7:14 PM Mark Adams wrote: > Thanks, > Now I get an error in make all. THis was working a few weeks ago. > Make.log was empty but here is the output. > .... 
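(For reference: --show makes nvcc_wrapper print the underlying nvcc command line it would have run, without compiling anything, so you can see exactly how the flags were parsed. Barry's transcripts further down are examples of that output.)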
> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o > > PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ > 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname > > /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` > NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper > --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 > -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 > -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 > -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include > -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include > -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include > -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include > -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include > /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o > arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o > g++: error: arch=compute_80: No such file or directory > g++: error: code=sm_80: No such file or directory > gmake[3]: *** [gmakefile:188: > arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 > gmake[3]: *** Waiting for unfinished jobs.... > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o > FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o > CUDAC arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o > CUDAC.dep arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o > gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: libs] > Error 2 > **************************ERROR************************************* > Error during compile, check > arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log > Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > gmake[1]: *** [makefile:40: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > > On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: > >> >> Bitbucket is off-line. Do you need fortran stubs at this moment? If not >> use --with-fortran-interface=0 >> >> >> On May 15, 2021, at 2:51 PM, Mark Adams wrote: >> >> I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with >> Kokkos. >> Any suggestions? 
>> Thanks, >> Mark >> >> ============================================================================================= >> >> >> >> ============================================================================================= >> >> >> Trying to >> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >> >> >> >> ============================================================================================= >> >> >> >> ============================================================================================= >> >> >> Trying to >> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >> >> >> >> ============================================================================================= >> >> >> >> ============================================================================================= >> >> >> Trying to >> download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz >> for SOWING >> >> >> ============================================================================================= >> >> >> >> >> >> >> >> ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for >> details): >> >> ------------------------------------------------------------------------------- >> Error during download/extract/detection of SOWING: >> Unable to clone sowing >> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git >> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >> Cloning into >> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >> The requested URL returned error: 503 >> Unable to download package SOWING from: git:// >> https://bitbucket.org/petsc/pkg-sowing.git >> * If URL specified manually - perhaps there is a typo? >> * If your network is disconnected - please reconnect and rerun ./configure >> * Or perhaps you have a firewall blocking the download >> * You can run with --with-packages-download-dir=/adirectory and >> ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to /yourselectedlocation >> and use the configure option: >> --download-sowing=/yourselectedlocation >> Unable to clone sowing >> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git >> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >> Cloning into >> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >> The requested URL returned error: 503 >> Unable to download package SOWING from: git:// >> https://bitbucket.org/petsc/pkg-sowing.git >> * If URL specified manually - perhaps there is a typo? >> * If your network is disconnected - please reconnect and rerun ./configure >> * Or perhaps you have a firewall blocking the download >> * You can run with --with-packages-download-dir=/adirectory and >> ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to /yourselectedlocation >> and use the configure option: >> --download-sowing=/yourselectedlocation >> file could not be opened successfully >> Downloaded package SOWING from: >> https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a >> tarball. 
>> [or installed python cannot process compressed files] >> * If you are behind a firewall - please fix your proxy and rerun >> ./configure >> For example at LANL you may need to set the environmental variable >> http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov >> * You can run with --with-packages-download-dir=/adirectory and >> ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to >> /yourselectedlocation/v1.1.26-p1.tar.gz >> and use the configure option: >> --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz >> >> ******************************************************************************* >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat May 15 23:31:56 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 15 May 2021 23:31:56 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> Message-ID: <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> Looks like nvcc_wrapper gets confused when both -arch and -gencode are used. But handles either of them separately correctly. Mark, PETSc configure now handles the -arch and --with-kokkos-cuda-arch= business automatically so you do not, and generally shouldn't pass -arch sm_70 in --CUDAFLAGS anymore or use the --with-kokkos-cuda-arch option anymore. If you need to set the -arch to something earlier than the system supports (rare) you can now use -with-cuda-gencodearch to set that instead of using CUDAFLAGS. Barry $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda -c -Xcompiler -fPIC -g -gencode arch=compute_70,code=sm_70 -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o nvcc --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -Wno-deprecated-gpu-targets -ccbin mpicxx -g -I/home/bsmith/soft/gnu-mpich/include -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include -Xcompiler -fPIC -x cu /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o ~/petsc (main=) arch-main $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets -ccbin mpicxx -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -O3 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Xcompiler -rdynamic,-fPIC,arch=compute_80,code=sm_80 -c ~/petsc (main=) arch-main $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda -c -Xcompiler -fPIC -g -arch=sm_70 -gencode arch=compute_70,code=sm_70 
-I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets -ccbin mpicxx -g -I/home/bsmith/soft/gnu-mpich/include -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include -Xcompiler -fPIC,arch=compute_70,code=sm_70 -x cu /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o > On May 15, 2021, at 8:44 PM, Junchao Zhang wrote: > > Add --show to the failing nvcc_wrapper command and see how nvcc_wrapper parses the options > > PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o --show > > --Junchao Zhang > > > On Sat, May 15, 2021 at 7:14 PM Mark Adams > wrote: > Thanks, > Now I get an error in make all. THis was working a few weeks ago. > Make.log was empty but here is the output. > .... 
> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o > PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o > g++: error: arch=compute_80: No such file or directory > g++: error: code=sm_80: No such file or directory > gmake[3]: *** [gmakefile:188: arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 > gmake[3]: *** Waiting for unfinished jobs.... > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o > CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o > FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o > CUDAC arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o > CUDAC.dep arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o > gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: libs] Error 2 > **************************ERROR************************************* > Error during compile, check arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log > Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov > ******************************************************************** > gmake[1]: *** [makefile:40: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > > On Sat, May 15, 2021 at 5:30 PM Barry Smith > wrote: > > Bitbucket is off-line. Do you need fortran stubs at this moment? If not use --with-fortran-interface=0 > > >> On May 15, 2021, at 2:51 PM, Mark Adams > wrote: >> >> I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with Kokkos. >> Any suggestions? 
>> Thanks, >> Mark >> >> ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING ============================================================================================= ============================================================================================= Trying to download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz for SOWING ============================================================================================= ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): >> ------------------------------------------------------------------------------- >> Error during download/extract/detection of SOWING: >> Unable to clone sowing >> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >> Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/ ': The requested URL returned error: 503 >> Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git >> * If URL specified manually - perhaps there is a typo? >> * If your network is disconnected - please reconnect and rerun ./configure >> * Or perhaps you have a firewall blocking the download >> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to /yourselectedlocation >> and use the configure option: >> --download-sowing=/yourselectedlocation >> Unable to clone sowing >> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >> Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/ ': The requested URL returned error: 503 >> Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git >> * If URL specified manually - perhaps there is a typo? >> * If your network is disconnected - please reconnect and rerun ./configure >> * Or perhaps you have a firewall blocking the download >> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to /yourselectedlocation >> and use the configure option: >> --download-sowing=/yourselectedlocation >> file could not be opened successfully >> Downloaded package SOWING from: https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a tarball. 
>> [or installed python cannot process compressed files] >> * If you are behind a firewall - please fix your proxy and rerun ./configure >> For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov >> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually >> * or you can download the above URL manually, to /yourselectedlocation/v1.1.26-p1.tar.gz >> and use the configure option: >> --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz >> ******************************************************************************* >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun May 16 06:13:50 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 16 May 2021 07:13:50 -0400 Subject: [petsc-users] configure error In-Reply-To: <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> Message-ID: progress: 04:08 cgpu11 adams/cusparse-cpu-solve *= ~/petsc$ make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc check Running check examples to verify correct installation Using PETSC_DIR=/global/homes/m/madams/petsc and PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html srun: error: Unable to create step for job 1921749: More processors requested than permitted 2,15c2,32 < 0 SNES Function norm 2.391552133017e-01 < 0 KSP Residual norm 2.325621076120e-01 < 1 KSP Residual norm 1.654206318674e-02 < 2 KSP Residual norm 7.202836119880e-04 < 3 KSP Residual norm 1.796861424199e-05 < 4 KSP Residual norm 2.461332992052e-07 < 1 SNES Function norm 6.826585648929e-05 < 0 KSP Residual norm 2.347339172985e-05 < 1 KSP Residual norm 8.356798075993e-07 < 2 KSP Residual norm 1.844045309619e-08 < 3 KSP Residual norm 5.336386977405e-10 < 4 KSP Residual norm 2.662608472862e-11 < 2 SNES Function norm 6.549682264799e-11 < Number of SNES iterations = 2 --- > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture > [cgpu11:39927] *** Process received signal *** > [cgpu11:39927] Signal: Aborted (6) > [cgpu11:39927] Signal code: (-6) > [cgpu11:39927] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12731360] > [cgpu11:39927] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12973160] > [cgpu11:39927] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12974741] > [cgpu11:39927] [ 3] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x408e5)[0x2aaabbfb88e5] > [cgpu11:39927] [ 4] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xb5c)[0x2aaabbfc762c] > [cgpu11:39927] [ 5] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x24)[0x2aaabbfa1224] > [cgpu11:39927] [ 6] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_00011209_00000000_7_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x11)[0x2aaabbfa49d1] > [cgpu11:39927] [ 7] 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f9)[0x2aaaaafad809] > [cgpu11:39927] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x68541b)[0x2aaaab35841b] > [cgpu11:39927] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x36a224)[0x2aaaab03d224] > [cgpu11:39927] [10] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x35f596)[0x2aaaab032596] > [cgpu11:39927] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x2f9d94)[0x2aaaaafccd94] > [cgpu11:39927] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0x11f)[0x2aaaab04bc3f] > [cgpu11:39927] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x37f53a)[0x2aaaab05253a] > [cgpu11:39927] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecScatterBegin+0x70)[0x2aaaab057b30] > [cgpu11:39927] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x1f)[0x2aaaabc27a5f] > [cgpu11:39927] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x1dd)[0x2aaaabe61b3d] > [cgpu11:39927] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x144567a)[0x2aaaac11867a] > [cgpu11:39927] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138ef5] > [cgpu11:39927] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ad0de)[0x2aaaac1800de] > [cgpu11:39927] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146ed1] > [cgpu11:39927] [21] /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x401a69] > [cgpu11:39927] [22] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab1295df8a] > [cgpu11:39927] [23] /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x4021ba] > [cgpu11:39927] *** End of error message *** > srun: error: cgpu11: task 0: Aborted > srun: Terminating job step 1921749.6 /global/homes/m/madams/petsc/src/snes/tutorials Possible problem with ex19 running with cuda, diffs above ========================================= gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) On Sun, May 16, 2021 at 12:31 AM Barry Smith wrote: > > Looks like nvcc_wrapper gets confused when both -arch and -gencode are > used. But handles either of them separately correctly. > > Mark, > > PETSc configure now handles the -arch and --with-kokkos-cuda-arch= > business automatically so you do not, and generally shouldn't pass -arch > sm_70 in --CUDAFLAGS anymore or use the --with-kokkos-cuda-arch option > anymore. > > If you need to set the -arch to something earlier than the system > supports (rare) you can now use -with-cuda-gencodearch to set that instead > of using CUDAFLAGS. 
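For anyone following along, a reconfigure along the lines Barry describes would look something like the command below. Everything except the gencodearch flag is an illustrative placeholder, not Mark's actual configure line:

  ./configure --with-cuda --download-kokkos --download-kokkos-kernels \
      --with-cuda-gencodearch=70    # force sm_70 rather than the autodetected value, and drop -arch from CUDAFLAGS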
> > Barry > > > $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx > /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda > -c -Xcompiler -fPIC -g -gencode arch=compute_70,code=sm_70 > -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets > -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include > -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include > /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx > -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o > nvcc --expt-extended-lambda -gencode arch=compute_70,code=sm_70 > -Wno-deprecated-gpu-targets -ccbin mpicxx -g > -I/home/bsmith/soft/gnu-mpich/include -I/home/bsmith/petsc/include > -I/home/bsmith/petsc/arch-main/include > -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include > -Xcompiler -fPIC -x cu > /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o > arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o > ~/petsc* (main=)* arch-main > > $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx > /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda > -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 > -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode > arch=compute_80,code=sm_80 > -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include > -Wno-deprecated-gpu-targets > nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets -ccbin > mpicxx -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC > -DLANDAU_MAX_Q=4 -O3 > -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Xcompiler > -rdynamic,-fPIC,arch=compute_80,code=sm_80 -c > ~/petsc* (main=)* arch-main > > $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx > /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda > -c -Xcompiler -fPIC -g -arch=sm_70 -gencode arch=compute_70,code=sm_70 > -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets > -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include > -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include > /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx > -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o > nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets -ccbin > mpicxx -g -I/home/bsmith/soft/gnu-mpich/include > -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include > -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include > -Xcompiler -fPIC,arch=compute_70,code=sm_70 -x cu > /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o > arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o > > > On May 15, 2021, at 8:44 PM, Junchao Zhang > wrote: > > Add --show to the failing nvcc_wrapper command and see how nvcc_wrapper > parses the options > > > 
PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc`
> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx
> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper
> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4
> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80
> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include
> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include
> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include
> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include
> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include
> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o
> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o --show
>
> --Junchao Zhang
>
> On Sat, May 15, 2021 at 7:14 PM Mark Adams wrote:
>
>> Thanks,
>> Now I get an error in make all. This was working a few weeks ago.
>> Make.log was empty but here is the output.
>> ....
>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o >> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o >> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o >> >> PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ >> 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname >> >> /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` >> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper >> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 >> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 >> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 >> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include >> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include >> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include >> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o >> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o >> g++: error: arch=compute_80: No such file or directory >> g++: error: code=sm_80: No such file or directory >> gmake[3]: *** [gmakefile:188: >> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 >> gmake[3]: *** Waiting for unfinished jobs.... >> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o >> CC >> arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o >> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o >> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o >> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o >> FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o >> CUDAC arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >> CUDAC.dep arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >> gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: >> libs] Error 2 >> **************************ERROR************************************* >> Error during compile, check >> arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log >> Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log >> to petsc-maint at mcs.anl.gov >> ******************************************************************** >> gmake[1]: *** [makefile:40: all] Error 1 >> make: *** [GNUmakefile:9: all] Error 2 >> >> On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: >> >>> >>> Bitbucket is off-line. Do you need fortran stubs at this moment? 
>>> If not, use --with-fortran-interface=0.
>>>
>>> On May 15, 2021, at 2:51 PM, Mark Adams wrote:
>>>
>>> I can build a non-kokkos PETSc here (Cori/GPU) but I get this error with Kokkos.
>>> Any suggestions?
>>> Thanks,
>>> Mark
>>>
>>> =============================================================================================
>>> =============================================================================================
>>>       Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING
>>> =============================================================================================
>>> =============================================================================================
>>>       Trying to download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING
>>> =============================================================================================
>>> =============================================================================================
>>>       Trying to download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz for SOWING
>>> =============================================================================================
>>>
>>> *******************************************************************************
>>>          UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
>>> -------------------------------------------------------------------------------
>>> Error during download/extract/detection of SOWING:
>>> Unable to clone sowing
>>> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']":
>>> Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'...
>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': The requested URL returned error: 503
>>> Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git
>>> * If URL specified manually - perhaps there is a typo?
>>> * If your network is disconnected - please reconnect and rerun ./configure
>>> * Or perhaps you have a firewall blocking the download
>>> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually
>>> * or you can download the above URL manually, to /yourselectedlocation and use the configure option:
>>>   --download-sowing=/yourselectedlocation
>>> Unable to clone sowing
>>> Could not execute "['git clone https://bitbucket.org/petsc/pkg-sowing.git /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']":
>>> Cloning into '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'...
>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': The requested URL returned error: 503
>>> Unable to download package SOWING from: git://https://bitbucket.org/petsc/pkg-sowing.git
>>> * If URL specified manually - perhaps there is a typo?
>>> * If your network is disconnected - please reconnect and rerun ./configure
>>> * Or perhaps you have a firewall blocking the download
>>> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually
>>> * or you can download the above URL manually, to /yourselectedlocation and use the configure option:
>>>   --download-sowing=/yourselectedlocation
>>> file could not be opened successfully
>>> Downloaded package SOWING from: https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a tarball.
>>> [or installed python cannot process compressed files]
>>> * If you are behind a firewall - please fix your proxy and rerun ./configure
>>>   For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov
>>> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually
>>> * or you can download the above URL manually, to /yourselectedlocation/v1.1.26-p1.tar.gz and use the configure option:
>>>   --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz
>>> *******************************************************************************
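
The error text above already names the two ways around this while bitbucket.org is returning 503s. As concrete commands (a sketch only: /tmp is an arbitrary choice, and the curl step assumes you are on a machine that can actually reach bitbucket.org):

    # skip sowing entirely; it is only needed to generate the Fortran stubs
    ./configure --with-fortran-interface=0 ...
    # or fetch the tarball by hand and point configure at the file
    curl -L -o /tmp/v1.1.26-p1.tar.gz \
        https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz
    ./configure --download-sowing=/tmp/v1.1.26-p1.tar.gz ...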
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 79530 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 2690560 bytes
Desc: not available
URL: 

From junchao.zhang at gmail.com  Sun May 16 07:37:45 2021
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Sun, 16 May 2021 07:37:45 -0500
Subject: [petsc-users] configure error
In-Reply-To: 
References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev>
Message-ID: 

Remove --with-kokkos-cuda-arch=VOLTA70 from your configure

Barry: It seems we should either disable this option, or rename it to --with-cuda-arch and get it accepted by both kokkos.py and cuda.py

--Junchao Zhang

On Sun, May 16, 2021 at 6:15 AM Mark Adams wrote:

> progress:
>
> 04:08 cgpu11 adams/cusparse-cpu-solve *= ~/petsc$ make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc check
> Running check examples to verify correct installation
> Using PETSC_DIR=/global/homes/m/madams/petsc and PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> srun: error: Unable to create step for job 1921749: More processors requested than permitted
> 2,15c2,32
> <   0 SNES Function norm 2.391552133017e-01
> <     0 KSP Residual norm 2.325621076120e-01
> <     1 KSP Residual norm 1.654206318674e-02
> <     2 KSP Residual norm 7.202836119880e-04
> <     3 KSP Residual norm 1.796861424199e-05
> <     4 KSP Residual norm 2.461332992052e-07
> <   1 SNES Function norm 6.826585648929e-05
> <     0 KSP Residual norm 2.347339172985e-05
> <     1 KSP Residual norm 8.356798075993e-07
> <     2 KSP Residual norm 1.844045309619e-08
> <     3 KSP Residual norm 5.336386977405e-10
> <     4 KSP Residual norm 2.662608472862e-11
> <   2 SNES Function norm 6.549682264799e-11
> < Number of SNES iterations = 2
> ---
> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
> > [cgpu11:39927] *** Process received signal ***
> > [cgpu11:39927] Signal: Aborted (6)
> > [cgpu11:39927] Signal code:  (-6)
> > [cgpu11:39927] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12731360]
> > [cgpu11:39927] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12973160]
> > [cgpu11:39927] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12974741]
> > [cgpu11:39927] [ 3] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x408e5)[0x2aaabbfb88e5]
> > [cgpu11:39927] [ 4] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xb5c)[0x2aaabbfc762c]
> > [cgpu11:39927] [ 5] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x24)[0x2aaabbfa1224]
> > [cgpu11:39927] [ 6] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_00011209_00000000_7_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x11)[0x2aaabbfa49d1]
> > [cgpu11:39927] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f9)[0x2aaaaafad809]
> > [cgpu11:39927] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x68541b)[0x2aaaab35841b]
> > [cgpu11:39927] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x36a224)[0x2aaaab03d224]
> > [cgpu11:39927] [10] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x35f596)[0x2aaaab032596]
> > [cgpu11:39927] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x2f9d94)[0x2aaaaafccd94]
> > [cgpu11:39927] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0x11f)[0x2aaaab04bc3f]
> > [cgpu11:39927] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x37f53a)[0x2aaaab05253a]
> > [cgpu11:39927] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecScatterBegin+0x70)[0x2aaaab057b30]
> > [cgpu11:39927] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x1f)[0x2aaaabc27a5f]
> > [cgpu11:39927] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x1dd)[0x2aaaabe61b3d]
> > [cgpu11:39927] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x144567a)[0x2aaaac11867a]
> > [cgpu11:39927] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138ef5]
> > [cgpu11:39927] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ad0de)[0x2aaaac1800de]
> > [cgpu11:39927] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146ed1]
> > [cgpu11:39927] [21] /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x401a69]
> > [cgpu11:39927] [22] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab1295df8a]
> > [cgpu11:39927] [23] /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x4021ba]
> > [cgpu11:39927] *** End of error message ***
> > srun: error: cgpu11: task 0: Aborted
> > srun: Terminating job step 1921749.6
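
Kokkos's "likely mismatch of architecture" abort means the device code in the binary was not built for the GPU it is running on; one plausible reading here is that some of the installed objects were compiled for a different compute capability than the V100s (sm_70) in the Cori GPU nodes. A sketch of how one might cross-check (the grep pattern is a guess at what the build log contains, not a PETSc-provided tool):

    # which arch strings actually went into the build
    grep -Eo 'sm_[0-9]+|compute_[0-9]+' \
        arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log | sort | uniq -c
    # what the node actually has (V100 => compute capability 7.0 => sm_70)
    srun -n 1 nvidia-smi -L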
> /global/homes/m/madams/petsc/src/snes/tutorials
> Possible problem with ex19 running with cuda, diffs above
> =========================================
> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored)
>
> On Sun, May 16, 2021 at 12:31 AM Barry Smith wrote:
>> [...]

From mfadams at lbl.gov  Sun May 16 07:57:53 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Sun, 16 May 2021 08:57:53 -0400
Subject: [petsc-users] configure error
In-Reply-To: 
References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev>
Message-ID: 

Whoops, between scp'ing and soft links I gave you the wrong log file. I removed this already.
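
Junchao's suggestion, quoted below, boils down to dropping the hand-pinned architecture options and letting configure probe the device itself. A sketch of the change (the option sets shown are illustrative stand-ins, not Mark's actual configure line, which lives in the attached configure.log):

    # before: the arch pinned twice, in two different vocabularies
    ./configure ... CUDAFLAGS="-arch=sm_70" --with-kokkos-cuda-arch=VOLTA70
    # after: no arch options at all; configure detects the GPU and emits a
    # consistent -gencode for both the CUDA and Kokkos builds
    ./configure ... --with-cuda --download-kokkos --download-kokkos-kernels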
On Sun, May 16, 2021 at 8:37 AM Junchao Zhang wrote: > Remove --with-kokkos-cuda-arch=VOLTA70 from your configure > > Barry: It seems we should either disable this option, or rename it to > --with-cuda-arch and get it accepted by both kokkos.py and cuda.py > > --Junchao Zhang > > On Sun, May 16, 2021 at 6:15 AM Mark Adams wrote: > >> progress: >> >> 04:08 cgpu11 adams/cusparse-cpu-solve *= ~/petsc$ make >> PETSC_DIR=/global/homes/m/madams/petsc >> PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc check >> Running check examples to verify correct installation >> Using PETSC_DIR=/global/homes/m/madams/petsc and >> PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc >> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process >> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes >> See http://www.mcs.anl.gov/petsc/documentation/faq.html >> srun: error: Unable to create step for job 1921749: More processors >> requested than permitted >> 2,15c2,32 >> < 0 SNES Function norm 2.391552133017e-01 >> < 0 KSP Residual norm 2.325621076120e-01 >> < 1 KSP Residual norm 1.654206318674e-02 >> < 2 KSP Residual norm 7.202836119880e-04 >> < 3 KSP Residual norm 1.796861424199e-05 >> < 4 KSP Residual norm 2.461332992052e-07 >> < 1 SNES Function norm 6.826585648929e-05 >> < 0 KSP Residual norm 2.347339172985e-05 >> < 1 KSP Residual norm 8.356798075993e-07 >> < 2 KSP Residual norm 1.844045309619e-08 >> < 3 KSP Residual norm 5.336386977405e-10 >> < 4 KSP Residual norm 2.662608472862e-11 >> < 2 SNES Function norm 6.549682264799e-11 >> < Number of SNES iterations = 2 >> --- >> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture >> > [cgpu11:39927] *** Process received signal *** >> > [cgpu11:39927] Signal: Aborted (6) >> > [cgpu11:39927] Signal code: (-6) >> > [cgpu11:39927] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12731360] >> > [cgpu11:39927] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12973160] >> > [cgpu11:39927] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12974741] >> > [cgpu11:39927] [ 3] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x408e5)[0x2aaabbfb88e5] >> > [cgpu11:39927] [ 4] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xb5c)[0x2aaabbfc762c] >> > [cgpu11:39927] [ 5] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x24)[0x2aaabbfa1224] >> > [cgpu11:39927] [ 6] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_00011209_00000000_7_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x11)[0x2aaabbfa49d1] >> > [cgpu11:39927] [ 7] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f9)[0x2aaaaafad809] >> > [cgpu11:39927] [ 8] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x68541b)[0x2aaaab35841b] >> > [cgpu11:39927] [ 9] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x36a224)[0x2aaaab03d224] >> > [cgpu11:39927] [10] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x35f596)[0x2aaaab032596] >> > [cgpu11:39927] [11] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x2f9d94)[0x2aaaaafccd94] >> > [cgpu11:39927] [12] >> 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0x11f)[0x2aaaab04bc3f] >> > [cgpu11:39927] [13] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x37f53a)[0x2aaaab05253a] >> > [cgpu11:39927] [14] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecScatterBegin+0x70)[0x2aaaab057b30] >> > [cgpu11:39927] [15] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x1f)[0x2aaaabc27a5f] >> > [cgpu11:39927] [16] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x1dd)[0x2aaaabe61b3d] >> > [cgpu11:39927] [17] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x144567a)[0x2aaaac11867a] >> > [cgpu11:39927] [18] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138ef5] >> > [cgpu11:39927] [19] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ad0de)[0x2aaaac1800de] >> > [cgpu11:39927] [20] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146ed1] >> > [cgpu11:39927] [21] >> /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x401a69] >> > [cgpu11:39927] [22] >> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab1295df8a] >> > [cgpu11:39927] [23] >> /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x4021ba] >> > [cgpu11:39927] *** End of error message *** >> > srun: error: cgpu11: task 0: Aborted >> > srun: Terminating job step 1921749.6 >> /global/homes/m/madams/petsc/src/snes/tutorials >> Possible problem with ex19 running with cuda, diffs above >> ========================================= >> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) >> >> On Sun, May 16, 2021 at 12:31 AM Barry Smith wrote: >> >>> >>> Looks like nvcc_wrapper gets confused when both -arch and -gencode are >>> used. But handles either of them separately correctly. >>> >>> Mark, >>> >>> PETSc configure now handles the -arch and --with-kokkos-cuda-arch= >>> business automatically so you do not, and generally shouldn't pass -arch >>> sm_70 in --CUDAFLAGS anymore or use the --with-kokkos-cuda-arch option >>> anymore. >>> >>> If you need to set the -arch to something earlier than the system >>> supports (rare) you can now use -with-cuda-gencodearch to set that instead >>> of using CUDAFLAGS. 
>>> >>> Barry >>> >>> >>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>> -c -Xcompiler -fPIC -g -gencode arch=compute_70,code=sm_70 >>> -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets >>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx >>> -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>> nvcc --expt-extended-lambda -gencode arch=compute_70,code=sm_70 >>> -Wno-deprecated-gpu-targets -ccbin mpicxx -g >>> -I/home/bsmith/soft/gnu-mpich/include -I/home/bsmith/petsc/include >>> -I/home/bsmith/petsc/arch-main/include >>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>> -Xcompiler -fPIC -x cu >>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o >>> arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>> ~/petsc* (main=)* arch-main >>> >>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>> -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 >>> -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode >>> arch=compute_80,code=sm_80 >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>> -Wno-deprecated-gpu-targets >>> nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets >>> -ccbin mpicxx -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 >>> -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -O3 >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Xcompiler >>> -rdynamic,-fPIC,arch=compute_80,code=sm_80 -c >>> ~/petsc* (main=)* arch-main >>> >>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>> -c -Xcompiler -fPIC -g -arch=sm_70 -gencode arch=compute_70,code=sm_70 >>> -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets >>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx >>> -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>> nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets >>> -ccbin mpicxx -g -I/home/bsmith/soft/gnu-mpich/include >>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>> -Xcompiler -fPIC,arch=compute_70,code=sm_70 -x cu >>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o >>> arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>> >>> >>> On May 15, 2021, at 8:44 PM, Junchao Zhang >>> wrote: >>> >>> Add --show to the failing nvcc_wrapper command and see how nvcc_wrapper >>> parses the options >>> >>> >>> 
PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ >>> 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname >>> /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` >>> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper >>> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 >>> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 >>> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include >>> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include >>> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o >>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o --show >>> >>> --Junchao Zhang >>> >>> >>> On Sat, May 15, 2021 at 7:14 PM Mark Adams wrote: >>> >>>> Thanks, >>>> Now I get an error in make all. THis was working a few weeks ago. >>>> Make.log was empty but here is the output. >>>> .... 
>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o >>>> >>>> PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ >>>> 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname >>>> >>>> /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` >>>> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx >>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper >>>> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 >>>> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 >>>> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 >>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>>> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include >>>> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include >>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>>> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include >>>> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o >>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o >>>> g++: error: arch=compute_80: No such file or directory >>>> g++: error: code=sm_80: No such file or directory >>>> gmake[3]: *** [gmakefile:188: >>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 >>>> gmake[3]: *** Waiting for unfinished jobs.... >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o >>>> CC >>>> arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o >>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o >>>> FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o >>>> CUDAC >>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>>> CUDAC.dep >>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>>> gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: >>>> libs] Error 2 >>>> **************************ERROR************************************* >>>> Error during compile, check >>>> arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log >>>> Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log >>>> to petsc-maint at mcs.anl.gov >>>> ******************************************************************** >>>> gmake[1]: *** [makefile:40: all] Error 1 >>>> make: *** [GNUmakefile:9: all] Error 2 >>>> >>>> On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: >>>> >>>>> >>>>> Bitbucket is off-line. Do you need fortran stubs at this moment? 
If >>>>> not use --with-fortran-interface=0 >>>>> >>>>> >>>>> On May 15, 2021, at 2:51 PM, Mark Adams wrote: >>>>> >>>>> I can build a non-kokkos PETSc here (Cori/GPU) but I get this >>>>> error with Kokkos. >>>>> Any suggestions? >>>>> Thanks, >>>>> Mark >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> Trying to >>>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>>> >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> Trying to >>>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>>> >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> Trying to >>>>> download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz >>>>> for SOWING >>>>> >>>>> >>>>> ============================================================================================= >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ******************************************************************************* >>>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>>> for details): >>>>> >>>>> ------------------------------------------------------------------------------- >>>>> Error during download/extract/detection of SOWING: >>>>> Unable to clone sowing >>>>> Could not execute "['git clone >>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>>> Cloning into >>>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>>> The requested URL returned error: 503 >>>>> Unable to download package SOWING from: git:// >>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>> * If URL specified manually - perhaps there is a typo? >>>>> * If your network is disconnected - please reconnect and rerun >>>>> ./configure >>>>> * Or perhaps you have a firewall blocking the download >>>>> * You can run with --with-packages-download-dir=/adirectory and >>>>> ./configure will instruct you what packages to download manually >>>>> * or you can download the above URL manually, to /yourselectedlocation >>>>> and use the configure option: >>>>> --download-sowing=/yourselectedlocation >>>>> Unable to clone sowing >>>>> Could not execute "['git clone >>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>>> Cloning into >>>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>>> The requested URL returned error: 503 >>>>> Unable to download package SOWING from: git:// >>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>> * If URL specified manually - perhaps there is a typo? 
>>>>> * If your network is disconnected - please reconnect and rerun >>>>> ./configure >>>>> * Or perhaps you have a firewall blocking the download >>>>> * You can run with --with-packages-download-dir=/adirectory and >>>>> ./configure will instruct you what packages to download manually >>>>> * or you can download the above URL manually, to /yourselectedlocation >>>>> and use the configure option: >>>>> --download-sowing=/yourselectedlocation >>>>> file could not be opened successfully >>>>> Downloaded package SOWING from: >>>>> https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a >>>>> tarball. >>>>> [or installed python cannot process compressed files] >>>>> * If you are behind a firewall - please fix your proxy and rerun >>>>> ./configure >>>>> For example at LANL you may need to set the environmental variable >>>>> http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov >>>>> * You can run with --with-packages-download-dir=/adirectory and >>>>> ./configure will instruct you what packages to download manually >>>>> * or you can download the above URL manually, to >>>>> /yourselectedlocation/v1.1.26-p1.tar.gz >>>>> and use the configure option: >>>>> --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz >>>>> >>>>> ******************************************************************************* >>>>> >>>>> >>>>> >>>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1153242 bytes Desc: not available URL: From junchao.zhang at gmail.com Sun May 16 08:27:37 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sun, 16 May 2021 08:27:37 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> Message-ID: Check the meaning of 'srun: error: Unable to create step for job 1921749: More processors requested than permitted' --Junchao Zhang On Sun, May 16, 2021 at 7:58 AM Mark Adams wrote: > Whoops, between scp'ing and soft links I gave you the wrong log file. I > removed this already. 
> > On Sun, May 16, 2021 at 8:37 AM Junchao Zhang > wrote: > >> Remove --with-kokkos-cuda-arch=VOLTA70 from your configure >> >> Barry: It seems we should either disable this option, or rename it to >> --with-cuda-arch and get it accepted by both kokkos.py and cuda.py >> >> --Junchao Zhang >> >> On Sun, May 16, 2021 at 6:15 AM Mark Adams wrote: >> >>> progress: >>> >>> 04:08 cgpu11 adams/cusparse-cpu-solve *= ~/petsc$ make >>> PETSC_DIR=/global/homes/m/madams/petsc >>> PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc check >>> Running check examples to verify correct installation >>> Using PETSC_DIR=/global/homes/m/madams/petsc and >>> PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc >>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process >>> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes >>> See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> srun: error: Unable to create step for job 1921749: More processors >>> requested than permitted >>> 2,15c2,32 >>> < 0 SNES Function norm 2.391552133017e-01 >>> < 0 KSP Residual norm 2.325621076120e-01 >>> < 1 KSP Residual norm 1.654206318674e-02 >>> < 2 KSP Residual norm 7.202836119880e-04 >>> < 3 KSP Residual norm 1.796861424199e-05 >>> < 4 KSP Residual norm 2.461332992052e-07 >>> < 1 SNES Function norm 6.826585648929e-05 >>> < 0 KSP Residual norm 2.347339172985e-05 >>> < 1 KSP Residual norm 8.356798075993e-07 >>> < 2 KSP Residual norm 1.844045309619e-08 >>> < 3 KSP Residual norm 5.336386977405e-10 >>> < 4 KSP Residual norm 2.662608472862e-11 >>> < 2 SNES Function norm 6.549682264799e-11 >>> < Number of SNES iterations = 2 >>> --- >>> > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture >>> > [cgpu11:39927] *** Process received signal *** >>> > [cgpu11:39927] Signal: Aborted (6) >>> > [cgpu11:39927] Signal code: (-6) >>> > [cgpu11:39927] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12731360] >>> > [cgpu11:39927] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12973160] >>> > [cgpu11:39927] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12974741] >>> > [cgpu11:39927] [ 3] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x408e5)[0x2aaabbfb88e5] >>> > [cgpu11:39927] [ 4] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xb5c)[0x2aaabbfc762c] >>> > [cgpu11:39927] [ 5] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x24)[0x2aaabbfa1224] >>> > [cgpu11:39927] [ 6] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_00011209_00000000_7_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x11)[0x2aaabbfa49d1] >>> > [cgpu11:39927] [ 7] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f9)[0x2aaaaafad809] >>> > [cgpu11:39927] [ 8] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x68541b)[0x2aaaab35841b] >>> > [cgpu11:39927] [ 9] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x36a224)[0x2aaaab03d224] >>> > [cgpu11:39927] [10] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x35f596)[0x2aaaab032596] >>> > [cgpu11:39927] [11] >>> 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x2f9d94)[0x2aaaaafccd94] >>> > [cgpu11:39927] [12] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0x11f)[0x2aaaab04bc3f] >>> > [cgpu11:39927] [13] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x37f53a)[0x2aaaab05253a] >>> > [cgpu11:39927] [14] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecScatterBegin+0x70)[0x2aaaab057b30] >>> > [cgpu11:39927] [15] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x1f)[0x2aaaabc27a5f] >>> > [cgpu11:39927] [16] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x1dd)[0x2aaaabe61b3d] >>> > [cgpu11:39927] [17] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x144567a)[0x2aaaac11867a] >>> > [cgpu11:39927] [18] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138ef5] >>> > [cgpu11:39927] [19] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ad0de)[0x2aaaac1800de] >>> > [cgpu11:39927] [20] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146ed1] >>> > [cgpu11:39927] [21] >>> /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x401a69] >>> > [cgpu11:39927] [22] >>> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab1295df8a] >>> > [cgpu11:39927] [23] >>> /global/u2/m/madams/petsc/src/snes/tutorials/./ex19[0x4021ba] >>> > [cgpu11:39927] *** End of error message *** >>> > srun: error: cgpu11: task 0: Aborted >>> > srun: Terminating job step 1921749.6 >>> /global/homes/m/madams/petsc/src/snes/tutorials >>> Possible problem with ex19 running with cuda, diffs above >>> ========================================= >>> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) >>> >>> On Sun, May 16, 2021 at 12:31 AM Barry Smith wrote: >>> >>>> >>>> Looks like nvcc_wrapper gets confused when both -arch and -gencode >>>> are used. But handles either of them separately correctly. >>>> >>>> Mark, >>>> >>>> PETSc configure now handles the -arch and --with-kokkos-cuda-arch= >>>> business automatically so you do not, and generally shouldn't pass -arch >>>> sm_70 in --CUDAFLAGS anymore or use the --with-kokkos-cuda-arch option >>>> anymore. >>>> >>>> If you need to set the -arch to something earlier than the system >>>> supports (rare) you can now use -with-cuda-gencodearch to set that instead >>>> of using CUDAFLAGS. 
>>>> >>>> Barry >>>> >>>> >>>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>>> -c -Xcompiler -fPIC -g -gencode arch=compute_70,code=sm_70 >>>> -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets >>>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx >>>> -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>>> nvcc --expt-extended-lambda -gencode arch=compute_70,code=sm_70 >>>> -Wno-deprecated-gpu-targets -ccbin mpicxx -g >>>> -I/home/bsmith/soft/gnu-mpich/include -I/home/bsmith/petsc/include >>>> -I/home/bsmith/petsc/arch-main/include >>>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>>> -Xcompiler -fPIC -x cu >>>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o >>>> arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>>> ~/petsc* (main=)* arch-main >>>> >>>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>>> -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 >>>> -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode >>>> arch=compute_80,code=sm_80 >>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>>> -Wno-deprecated-gpu-targets >>>> nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets >>>> -ccbin mpicxx -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 >>>> -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -O3 >>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Xcompiler >>>> -rdynamic,-fPIC,arch=compute_80,code=sm_80 -c >>>> ~/petsc* (main=)* arch-main >>>> >>>> $ NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx >>>> /home/bsmith/petsc/arch-main/bin/nvcc_wrapper --show --expt-extended-lambda >>>> -c -Xcompiler -fPIC -g -arch=sm_70 -gencode arch=compute_70,code=sm_70 >>>> -I/home/bsmith/soft/gnu-mpich/include -Wno-deprecated-gpu-targets >>>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx >>>> -o arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>>> nvcc --expt-extended-lambda -arch=sm_70 -Wno-deprecated-gpu-targets >>>> -ccbin mpicxx -g -I/home/bsmith/soft/gnu-mpich/include >>>> -I/home/bsmith/petsc/include -I/home/bsmith/petsc/arch-main/include >>>> -I/nfs/apps/spacks/2021-02-09/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.2.0-ikruu5mo6dtt3avvmwsejouqhxu4btdm/include >>>> -Xcompiler -fPIC,arch=compute_70,code=sm_70 -x cu >>>> /home/bsmith/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx -c -o >>>> arch-main/obj/ts/utils/dmplexlandau/kokkos/landau.o >>>> >>>> >>>> On May 15, 2021, at 8:44 PM, Junchao Zhang >>>> wrote: >>>> >>>> Add --show to the failing nvcc_wrapper command and see how >>>> nvcc_wrapper parses the options >>>> >>>> >>>> 
PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o --show
>>>>
>>>> --Junchao Zhang
>>>>
>>>> On Sat, May 15, 2021 at 7:14 PM Mark Adams wrote:
>>>>
>>>>> Thanks,
>>>>> Now I get an error in make all. This was working a few weeks ago.
>>>>> Make.log was empty but here is the output.
>>>>> ....
>>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o >>>>> >>>>> PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ >>>>> 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname >>>>> >>>>> /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` >>>>> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx >>>>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper >>>>> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 >>>>> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 >>>>> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 >>>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>>>> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include >>>>> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include >>>>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>>>> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include >>>>> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o >>>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o >>>>> g++: error: arch=compute_80: No such file or directory >>>>> g++: error: code=sm_80: No such file or directory >>>>> gmake[3]: *** [gmakefile:188: >>>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 >>>>> gmake[3]: *** Waiting for unfinished jobs.... >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o >>>>> CC >>>>> arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o >>>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o >>>>> FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o >>>>> CUDAC >>>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>>>> CUDAC.dep >>>>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>>>> gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: >>>>> libs] Error 2 >>>>> **************************ERROR************************************* >>>>> Error during compile, check >>>>> arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log >>>>> Send it and >>>>> arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log to >>>>> petsc-maint at mcs.anl.gov >>>>> ******************************************************************** >>>>> gmake[1]: *** [makefile:40: all] Error 1 >>>>> make: *** [GNUmakefile:9: all] Error 2 >>>>> >>>>> On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> Bitbucket is off-line. 
Do you need fortran stubs at this moment? If >>>>>> not use --with-fortran-interface=0 >>>>>> >>>>>> >>>>>> On May 15, 2021, at 2:51 PM, Mark Adams wrote: >>>>>> >>>>>> I can build a non-kokkos PETSc here (Cori/GPU) but I get this >>>>>> error with Kokkos. >>>>>> Any suggestions? >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> Trying to >>>>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>>>> >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> Trying to >>>>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>>>> >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> Trying to >>>>>> download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz >>>>>> for SOWING >>>>>> >>>>>> >>>>>> ============================================================================================= >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ******************************************************************************* >>>>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>>>> for details): >>>>>> >>>>>> ------------------------------------------------------------------------------- >>>>>> Error during download/extract/detection of SOWING: >>>>>> Unable to clone sowing >>>>>> Could not execute "['git clone >>>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>>>> Cloning into >>>>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>>>> The requested URL returned error: 503 >>>>>> Unable to download package SOWING from: git:// >>>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>>> * If URL specified manually - perhaps there is a typo? >>>>>> * If your network is disconnected - please reconnect and rerun >>>>>> ./configure >>>>>> * Or perhaps you have a firewall blocking the download >>>>>> * You can run with --with-packages-download-dir=/adirectory and >>>>>> ./configure will instruct you what packages to download manually >>>>>> * or you can download the above URL manually, to /yourselectedlocation >>>>>> and use the configure option: >>>>>> --download-sowing=/yourselectedlocation >>>>>> Unable to clone sowing >>>>>> Could not execute "['git clone >>>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>>>> Cloning into >>>>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>>>> The requested URL returned error: 503 >>>>>> Unable to download package SOWING from: git:// >>>>>> https://bitbucket.org/petsc/pkg-sowing.git >>>>>> * If URL specified manually - perhaps there is a typo? 
>>>>>> * If your network is disconnected - please reconnect and rerun ./configure
>>>>>> * Or perhaps you have a firewall blocking the download
>>>>>> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually
>>>>>> * or you can download the above URL manually, to /yourselectedlocation
>>>>>>   and use the configure option:
>>>>>>   --download-sowing=/yourselectedlocation
>>>>>> file could not be opened successfully
>>>>>> Downloaded package SOWING from: https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a tarball.
>>>>>> [or installed python cannot process compressed files]
>>>>>> * If you are behind a firewall - please fix your proxy and rerun ./configure
>>>>>>   For example at LANL you may need to set the environmental variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov
>>>>>> * You can run with --with-packages-download-dir=/adirectory and ./configure will instruct you what packages to download manually
>>>>>> * or you can download the above URL manually, to /yourselectedlocation/v1.1.26-p1.tar.gz
>>>>>>   and use the configure option:
>>>>>>   --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz
>>>>>> *******************************************************************************

From bsmith at petsc.dev  Sun May 16 08:51:31 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 16 May 2021 08:51:31 -0500
Subject: [petsc-users] configure error
In-Reply-To:
References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev>
Message-ID:

> On May 16, 2021, at 7:37 AM, Junchao Zhang wrote:
>
> Remove --with-kokkos-cuda-arch=VOLTA70 from your configure

   Yes, I think it should be removed.
> Barry: It seems we should either disable this option, or rename it to --with-cuda-arch and get it accepted by both kokkos.py and cuda.py
>
> --Junchao Zhang
>
> [snip: the remainder of the quoted thread duplicates the messages earlier in this digest]
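[For reference, Junchao's suggestion amounts to dropping the Kokkos-specific arch option entirely and letting configure detect the device itself. A minimal sketch of such a configure invocation; the arch name and MPI path are taken from Mark's build earlier in this digest, and the remaining options are illustrative rather than Mark's exact configure line:

    ./configure PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc \
        --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc \
        --with-cuda --download-kokkos --download-kokkos-kernels
    # note: no --with-kokkos-cuda-arch=VOLTA70 and no "-arch sm_70" in CUDAFLAGS;
    # configure now works out the CUDA gencodearch from the attached GPU
]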
From bsmith at petsc.dev  Sun May 16 08:58:56 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 16 May 2021 08:58:56 -0500
Subject: [petsc-users] configure error
In-Reply-To:
References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev>
Message-ID: <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev>

   Mark,

   The kokkos-kernels did not get rebuilt, so they will likely have a different gencodearch. I think you need to rm -rf arch-cori-gpu-opt-kokkos-gcc. But was --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc built with CUDA-aware MPI and a different gencodearch than the hardware supports? If so, I guess you need to build PETSc using -with-cuda-gencodearch=70 if that is what the MPI was built with.

   Barry

> On May 16, 2021, at 6:13 AM, Mark Adams wrote:
>
> progress:
>
> [snip: the quoted make check output and Kokkos architecture-mismatch trace duplicate the messages earlier in this digest]
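[Concretely, Barry's suggestion corresponds to a rebuild along the following lines. This is a sketch rather than Mark's exact commands, and the value 70 assumes the MPI library was built for Volta (sm_70), as discussed:

    # wipe the old arch directory so kokkos and kokkos-kernels are rebuilt
    # with the new gencodearch rather than the stale compute_80 one
    rm -rf arch-cori-gpu-opt-kokkos-gcc
    ./configure PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc \
        --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc \
        --with-cuda --download-kokkos --download-kokkos-kernels \
        -with-cuda-gencodearch=70
    make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc all
    make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-cori-gpu-opt-kokkos-gcc check
]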
From mfadams at lbl.gov  Sun May 16 10:26:10 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Sun, 16 May 2021 11:26:10 -0400
Subject: [petsc-users] configure error
In-Reply-To: <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev>
References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev>
Message-ID:

On Sun, May 16, 2021 at 9:58 AM Barry Smith wrote:

> Mark,
>
> The kokkos-kernels did not get rebuilt, so they will likely have a different gencodearch.
>
> I think you need to rm -rf arch-cori-gpu-opt-kokkos-gcc

Yes, of course, but it did not help. I still get this error.

> but was --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc built with CUDA-aware MPI and a different gencodearch than the hardware supports?

I don't know. I use MPI very, very little (and it is not in timed code), so I don't care about it. This is just what I got to compile and work.

> If so, I guess you need to build PETSc using -with-cuda-gencodearch=70 if that is what the MPI was built with.
OK, that worked, thanks,

> Barry
>
> [snip: the remainder of the quoted thread duplicates the messages earlier in this digest]
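[The failure mode in this digest can be confirmed by comparing the gencodearch baked into the build against what the node's GPU reports. A sketch, assuming a standard PETSc tree; the petscvariables path below is the usual location but may differ per install, and the compute_cap query requires a reasonably recent nvidia-smi:

    # what configure recorded for CUDA code generation
    grep -i gencode arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/petscvariables
    # what the attached GPU actually supports
    nvidia-smi --query-gpu=name,compute_cap --format=csv

If the two disagree, for example kernels compiled only for compute_80 running on an sm_70 (V100) node, Kokkos aborts at initialization with the "likely mismatch of architecture" message shown above.]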
>>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/error/ftn-custom/zerrf.o >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/sys/f90-src/f90_cwrap.o >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/hdf5io.o >>> >>> PATH=/global/common/sw/cray/sles15/x86_64/zlib/1.2.11/gcc/8.2.0/pep2pal/bin:/global/common/sw/cray/cnl7/haswell/cmake/3.18.2/bin:/opt/cray/rca/2.2.20-7.0.1.1_4.62__g8e3fb5b.ari/bin:/opt/cray/alps/6.6.58-7.0.1.1_6.20__g437d88db.ari/sbin:/opt/cray/alps/default/bin:/opt/cray/job/2.2.4-7.0.1.1_3.48__g36b56f4.ari/bin:/opt/cray/pe/craype/2.6.2/bin:/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/bin:/usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/cuda/bin:/usr/common/software/sles15_cgpu/cuda/11.1.1/bin:/usr/common/software/sles15_cgpu/gcc/8.3.0/bin:/opt/esslurm/bin:/opt/cray/pe/modules/ >>> 3.2.11.4/bin:/usr/common/software/bin:/usr/common/nsg/bin:/opt/ovis/bin:/opt/ovis/sbin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/opt/cray/pe/bin:`dirname >>> >>> /usr/common/software/sles15_cgpu/cuda/11.1.1/bin/nvcc` >>> NVCC_WRAPPER_DEFAULT_COMPILER=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/bin/mpicxx >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/bin/nvcc_wrapper >>> --expt-extended-lambda -c -arch=sm_70 -Xcompiler -rdynamic -DLANDAU_DIM=2 >>> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 >>> -Xcompiler -fPIC -O3 -gencode arch=compute_80,code=sm_80 >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>> -Wno-deprecated-gpu-targets -I/global/homes/m/madams/petsc/include >>> -I/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include >>> -I/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc/include >>> -I/usr/common/software/sles15_cgpu/cuda/11.1.1/include >>> /global/u2/m/madams/petsc/src/sys/objects/kokkos/kinit.kokkos.cxx -o >>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o >>> g++: error: arch=compute_80: No such file or directory >>> g++: error: code=sm_80: No such file or directory >>> gmake[3]: *** [gmakefile:188: >>> arch-cori-gpu-opt-kokkos-gcc/obj/sys/objects/kokkos/kinit.o] Error 1 >>> gmake[3]: *** Waiting for unfinished jobs.... >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isio.o >>> CC >>> arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/ftn-auto/pmapf.o >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/pmap.o >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/psort.o >>> CC arch-cori-gpu-opt-kokkos-gcc/obj/vec/is/utils/isltog.o >>> FC arch-cori-gpu-opt-kokkos-gcc/obj/vec/f90-mod/petscvecmod.o >>> CUDAC arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>> CUDAC.dep arch-cori-gpu-opt-kokkos-gcc/obj/sys/memory/cuda/mcudahost.o >>> gmake[2]: *** [/global/homes/m/madams/petsc/lib/petsc/conf/rules:50: >>> libs] Error 2 >>> **************************ERROR************************************* >>> Error during compile, check >>> arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/make.log >>> Send it and arch-cori-gpu-opt-kokkos-gcc/lib/petsc/conf/configure.log >>> to petsc-maint at mcs.anl.gov >>> ******************************************************************** >>> gmake[1]: *** [makefile:40: all] Error 1 >>> make: *** [GNUmakefile:9: all] Error 2 >>> >>> On Sat, May 15, 2021 at 5:30 PM Barry Smith wrote: >>> >>>> >>>> Bitbucket is off-line. Do you need fortran stubs at this moment? 
If >>>> not use --with-fortran-interface=0 >>>> >>>> >>>> On May 15, 2021, at 2:51 PM, Mark Adams wrote: >>>> >>>> I can build a non-kokkos PETSc here (Cori/GPU) but I get this >>>> error with Kokkos. >>>> Any suggestions? >>>> Thanks, >>>> Mark >>>> >>>> ============================================================================================= >>>> >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> Trying to >>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>> >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> Trying to >>>> download git://https://bitbucket.org/petsc/pkg-sowing.git for SOWING >>>> >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> Trying to >>>> download https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz >>>> for SOWING >>>> >>>> >>>> ============================================================================================= >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ******************************************************************************* >>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>> for details): >>>> >>>> ------------------------------------------------------------------------------- >>>> Error during download/extract/detection of SOWING: >>>> Unable to clone sowing >>>> Could not execute "['git clone >>>> https://bitbucket.org/petsc/pkg-sowing.git >>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>> Cloning into >>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>> The requested URL returned error: 503 >>>> Unable to download package SOWING from: git:// >>>> https://bitbucket.org/petsc/pkg-sowing.git >>>> * If URL specified manually - perhaps there is a typo? >>>> * If your network is disconnected - please reconnect and rerun >>>> ./configure >>>> * Or perhaps you have a firewall blocking the download >>>> * You can run with --with-packages-download-dir=/adirectory and >>>> ./configure will instruct you what packages to download manually >>>> * or you can download the above URL manually, to /yourselectedlocation >>>> and use the configure option: >>>> --download-sowing=/yourselectedlocation >>>> Unable to clone sowing >>>> Could not execute "['git clone >>>> https://bitbucket.org/petsc/pkg-sowing.git >>>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing']": >>>> Cloning into >>>> '/global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/externalpackages/git.sowing'... >>>> fatal: unable to access 'https://bitbucket.org/petsc/pkg-sowing.git/': >>>> The requested URL returned error: 503 >>>> Unable to download package SOWING from: git:// >>>> https://bitbucket.org/petsc/pkg-sowing.git >>>> * If URL specified manually - perhaps there is a typo? 
>>>> * If your network is disconnected - please reconnect and rerun >>>> ./configure >>>> * Or perhaps you have a firewall blocking the download >>>> * You can run with --with-packages-download-dir=/adirectory and >>>> ./configure will instruct you what packages to download manually >>>> * or you can download the above URL manually, to /yourselectedlocation >>>> and use the configure option: >>>> --download-sowing=/yourselectedlocation >>>> file could not be opened successfully >>>> Downloaded package SOWING from: >>>> https://bitbucket.org/petsc/pkg-sowing/get/v1.1.26-p1.tar.gz is not a >>>> tarball. >>>> [or installed python cannot process compressed files] >>>> * If you are behind a firewall - please fix your proxy and rerun >>>> ./configure >>>> For example at LANL you may need to set the environmental variable >>>> http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov >>>> * You can run with --with-packages-download-dir=/adirectory and >>>> ./configure will instruct you what packages to download manually >>>> * or you can download the above URL manually, to >>>> /yourselectedlocation/v1.1.26-p1.tar.gz >>>> and use the configure option: >>>> --download-sowing=/yourselectedlocation/v1.1.26-p1.tar.gz >>>> >>>> ******************************************************************************* >>>> >>>> >>>> >>>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3912047 bytes Desc: not available URL: From mfadams at lbl.gov Sun May 16 21:09:57 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 16 May 2021 22:09:57 -0400 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: I now get this error. A blas error from VecAXPBYPCZ ... Any ideas? 
terminate called after throwing an instance of 'std::runtime_error' what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 Traceback functionality not available [cgpu16:55192] *** Process received signal *** [cgpu16:55192] Signal: Aborted (6) [cgpu16:55192] Signal code: (-6) [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] [cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] [cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] [cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] [cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] [cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] [cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] [cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] [cgpu16:55192] [10] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1] [cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781] [cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b] [cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1] [cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e] [cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a] [cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675] [cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e] [cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651] [cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c] [cgpu16:55192] [20] 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05] [cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455] [cgpu16:55192] [22] ../ex2-kok[0x4033eb] [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a] [cgpu16:55192] [24] ../ex2-kok[0x404aaa] [cgpu16:55192] *** End of error message *** /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted "$@" 0 stopping nvidia-cuda-mps-control on cgpu16 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun May 16 22:14:55 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 16 May 2021 22:14:55 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: Could still be a gencode arch issue. Is it possible that Kokkos was built with the 80 arch and when you reran configure with 70 it did not rebuild Kokkos because it didn't know it needed to? Sorry, but this may require another rm -rf arch* and running ./configure again. https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e cudaErrorInvalidDeviceFunction = 98 The requested device function does not exist or is not compiled for the proper device architecture. > On May 16, 2021, at 9:09 PM, Mark Adams wrote: > > I now get this error. A blas error from VecAXPBYPCZ ... > Any ideas? > > > terminate called after throwing an instance of 'std::runtime_error' > what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 > Traceback functionality not available > > [cgpu16:55192] *** Process received signal *** > [cgpu16:55192] Signal: Aborted (6) > [cgpu16:55192] Signal code: (-6) > [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] > [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] > [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] > [cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] > [cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] > [cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] > [cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] > [cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] > [cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] > [cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] > [cgpu16:55192] [10] 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
> [cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
> [cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
> [cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
> [cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
> [cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
> [cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
> [cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
> [cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
> [cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
> [cgpu16:55192] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
> [cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
> [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
> [cgpu16:55192] *** End of error message ***
> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted "$@"
> 0 stopping nvidia-cuda-mps-control on cgpu16

From snailsoar at hotmail.com Mon May 17 04:18:08 2021
From: snailsoar at hotmail.com (feng wang)
Date: Mon, 17 May 2021 09:18:08 +0000
Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers
In-Reply-To: References: Message-ID:

Hi Mat,

I have tried the 1st approach you suggested, which re-configures petsc with complex numbers. For my real-number system, Ax=B, it still works fine. I would like to save a copy of A, because for my complex system (A + i*w) Z = C, the value of "w" could be a list of values, so I only need to do a MatShift for my saved copy of A to build the LHS for each "w".

I am trying to use MatDuplicate to create the copy of "A" and MatCopy to copy the values of "A". The following is what I do. "petsc_A_pre" is "A" and "petsc_A_pre_copy" is its copy.
ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); ierr = MatSetOption(petsc_A_pre, MAT_STRUCTURALLY_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); ierr = MatDuplicate(petsc_A_pre, MAT_SHARE_NONZERO_PATTERN, &petsc_A_pre_copy); CHKERRQ(ierr); //line 69 .....some operations..... ierr = MatAssemblyBegin(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatAssemblyEnd(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatCopy(petsc_A_pre, petsc_A_pre_copy, SAME_NONZERO_PATTERN); ierr = MatAssemblyBegin(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatAssemblyEnd(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); I get the following error for MatDuplicate and MatCopy: [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Corrupt matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown [0]PETSC ERROR: cfd on a arch-linux2-c-debug named snail by snail Mon May 17 09:34:46 2021 [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-scalar-type=complex [0]PETSC ERROR: #1 MatDuplicateNoCreate_SeqBAIJ() line 3103 in /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c [0]PETSC ERROR: #2 MatDuplicate_SeqBAIJ() line 3215 in /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c [0]PETSC ERROR: #3 MatDuplicate() line 4663 in /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c [0]PETSC ERROR: #4 petsc_nk_init() line 69 in domain/cfd/petsc_nk.cpp [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 2 "B" before MatCopy() [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown [0]PETSC ERROR: cfd on a arch-linux2-c-debug named feng by feng Mon May 17 09:34:46 2021 [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-scalar-type=complex [0]PETSC ERROR: #5 MatCopy() line 4087 in /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c Is there a better way to do this? could you give any comments? Thanks, Feng ________________________________ From: Matthew Knepley Sent: 14 May 2021 20:26 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 1:36 PM feng wang > wrote: Yes, you are right. I can do row permutations to make them continuous. I will try this. could I re-use my KSP object from the 1st linear system in my 2nd system by simply changing the operators and setting new parameters? or I need a separate KSP object for the 2nd system? I tink you want 2 KSP objects. You could reuse the settings of the first, but since the system is a different size, all storage would have to be deleted and recreated anyway. Thanks, Matt Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 15:20 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 10:36 AM feng wang > wrote: Thanks for your comments. It is very helpful! I might try the 1st approach first. 
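Concretely, the 1st approach could look like this once the complex build is in place (a minimal sketch, not code from this thread: it assumes an assembled Mat A from a --with-scalar-type=complex build, and ksp, C, Z, the shift list w[] and its length nw are hypothetical names):

#include <petscksp.h>

/* Solve (A + i*w[k]) Z = C for a list of shifts, reusing one Mat and one KSP. */
PetscErrorCode SolveShiftedSystems(Mat A, KSP ksp, Vec C, Vec Z, const PetscReal *w, PetscInt nw)
{
  PetscErrorCode ierr;
  PetscInt       k;

  for (k = 0; k < nw; k++) {
    PetscScalar iw = w[k]*PETSC_i;                    /* purely imaginary diagonal shift */
    ierr = MatShift(A, iw);CHKERRQ(ierr);             /* A <- A + i*w[k]*I */
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);  /* flag the operator as changed */
    ierr = KSPSolve(ksp, C, Z);CHKERRQ(ierr);         /* KSPSolve does not modify A itself */
    /* ... use or store Z for this w[k] before the next iteration ... */
    ierr = MatShift(A, -iw);CHKERRQ(ierr);            /* undo the shift; exact only up to rounding */
  }
  return 0;
}

The undo-by-subtraction only perturbs the diagonal at the level of rounding error; if bitwise restoration of A matters, keeping one pristine copy and shifting a working copy is the safer variant.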
For the 2nd approach which uses an equivalent real-number system, I see potential issues when running in parallel. I have re-ordered my cells to allow each rank hold continuous rows in the first real system Ax=B. For the equivalent real-number system, each rank now holds (or can assign values to) two patches of continuous rows, which are separated by N rows, N is the size of square matrix A. I can't see a straightforward way to allow each rank hold continuous rows in this case. or petsc can handle these two patches of continuous rows with fixed row index difference in this case? I just wrote it that way for ease of typing. You can imagine permuting into 2x2 blocks with /a w\ \-w a/ for each entry. Thanks, Matt By the way, could I still re-use my KSP object in my second system by simply changing the operators and setting new parameters? Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 10:00 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 4:23 AM feng wang > wrote: Dear All, I am solving a coupled system. One system is AX=B. A, X and B are all real numbers and it is solved with GMRES in petsc. Now I need to solve a second linear system, it can be represented as (A+i*w)*Z=C. i is the imaginary unit. Z and C are also complex numbers. So the Jacobian matrix of the second system is just A plus a diagonal contribution i*w. I would like solve the second system with GMRES, could petsc handle this? any comments are welcome. Mixing real and complex numbers in the same code is somewhat difficult now. You have two obvious choices: 1) Configure for complex numbers and solve your first system as complex but with 0 imaginary part. This will work fine, but uses more memory for that system. However, since you will already use that much memory for the second system, it does not seem like a big deal to me. 2) You could solve the second system in its equivalent real form / A w \ /Zr\ = /Cr\ \ -w A / \Zi/ \Ci/ This uses more memory for the second system, but does not require reconfiguring. THanks, Matt Thanks, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 17 06:37:50 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 May 2021 07:37:50 -0400 Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers In-Reply-To: References: Message-ID: On Mon, May 17, 2021 at 5:18 AM feng wang wrote: > Hi Mat, > > I have tried the 1st approach you suggested, which re-configures petsc > with complex numbers. For my real-number system, Ax=B, it still works fine. 
> I would like to save a copy a A, because for my complex system (A + i*w) Z > = C, the value of "w" could be a list of values, so I only need to do a > MatShift for my saved copy of A to build the LHS for each "w". > > I am trying to use MatDuplicate to create the copy of "A" and MatCopy to > copy the values of "A". the following is what I do. "petsc_A_pre" is "A" > and "petsc_A_pre_copy" is its copy. > > ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, > nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, > maxneig, NULL, maxneig, NULL, &petsc_A_pre); > CHKERRQ(ierr); > ierr = MatSetOption(petsc_A_pre, MAT_STRUCTURALLY_SYMMETRIC, > PETSC_TRUE); CHKERRQ(ierr); > ierr = MatDuplicate(petsc_A_pre, MAT_SHARE_NONZERO_PATTERN, > &petsc_A_pre_copy); CHKERRQ(ierr); *//line 69* > > .....some operations..... > > ierr = > MatAssemblyBegin(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = > MatAssemblyEnd(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > > ierr = MatCopy(petsc_A_pre, petsc_A_pre_copy, > SAME_NONZERO_PATTERN); > ierr = > MatAssemblyBegin(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = > MatAssemblyEnd(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > > > I get the following error for MatDuplicate and MatCopy: > > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Corrupt matrix > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown > [0]PETSC ERROR: cfd on a arch-linux2-c-debug named snail by snail Mon May > 17 09:34:46 2021 > [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-scalar-type=complex > [0]PETSC ERROR: #1 MatDuplicateNoCreate_SeqBAIJ() line 3103 in > /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c > [0]PETSC ERROR: #2 MatDuplicate_SeqBAIJ() line 3215 in > /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c > [0]PETSC ERROR: #3 MatDuplicate() line 4663 in > /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c > [0]PETSC ERROR: #4 petsc_nk_init() line 69 in domain/cfd/petsc_nk.cpp > Your matrix is actually corrupt. We have to fix this. It is either a PETSc bug or a bug in your code. Would you be able to send us a stripped down version of your code (say one that puts in all zeros, or similarly simplified), so we can track down where this is? Alternatively, you can run under valgrind, since this looks like a memory overwrite. Thanks, Matt > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on > argument 2 "B" before MatCopy() > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown > [0]PETSC ERROR: cfd on a arch-linux2-c-debug named feng by feng Mon May 17 > 09:34:46 2021 > [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-scalar-type=complex > [0]PETSC ERROR: #5 MatCopy() line 4087 in > /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c > > Is there a better way to do this? could you give any comments? 
> > Thanks, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 14 May 2021 20:26 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] reuse a real matrix for a second linear > system with complex numbers > > On Fri, May 14, 2021 at 1:36 PM feng wang wrote: > > Yes, you are right. I can do row permutations to make them continuous. I > will try this. > > could I re-use my KSP object from the 1st linear system in my 2nd system > by simply changing the operators and setting new parameters? or I need a > separate KSP object for the 2nd system? > > > I tink you want 2 KSP objects. You could reuse the settings of the first, > but since the system is a different size, all storage would have to be > deleted and recreated anyway. > > Thanks, > > Matt > > > Thanks, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 14 May 2021 15:20 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] reuse a real matrix for a second linear > system with complex numbers > > On Fri, May 14, 2021 at 10:36 AM feng wang wrote: > > Thanks for your comments. It is very helpful! > > I might try the 1st approach first. For the 2nd approach which uses an > equivalent real-number system, I see potential issues when running in > parallel. I have re-ordered my cells to allow each rank hold continuous > rows in the first real system Ax=B. For the equivalent real-number system, > each rank now holds (or can assign values to) two patches of continuous > rows, which are separated by N rows, N is the size of square matrix A. I > can't see a straightforward way to allow each rank hold continuous rows in > this case. or petsc can handle these two patches of continuous rows with > fixed row index difference in this case? > > > I just wrote it that way for ease of typing. You can imagine permuting > into 2x2 blocks with > > /a w\ > \-w a/ > > for each entry. > > Thanks, > > Matt > > > By the way, could I still re-use my KSP object in my second system by > simply changing the operators and setting new parameters? > > Thanks, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 14 May 2021 10:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] reuse a real matrix for a second linear > system with complex numbers > > On Fri, May 14, 2021 at 4:23 AM feng wang wrote: > > Dear All, > > I am solving a coupled system. One system is AX=B. A, X and B are all real > numbers and it is solved with GMRES in petsc. Now I need to solve a second > linear system, it can be represented as (A+i*w)*Z=C. i is the imaginary > unit. Z and C are also complex numbers. > > So the Jacobian matrix of the second system is just A plus a diagonal > contribution i*w. I would like solve the second system with GMRES, could > petsc handle this? any comments are welcome. > > > Mixing real and complex numbers in the same code is somewhat difficult > now. You have two obvious choices: > > 1) Configure for complex numbers and solve your first system as complex > but with 0 imaginary part. This will work fine, but uses more memory for > that system. However, since you will already > use that much memory for the second system, it does not seem like a > big deal to me. > > 2) You could solve the second system in its equivalent real form > > / A w \ /Zr\ = /Cr\ > \ -w A / \Zi/ \Ci/ > > This uses more memory for the second system, but does not require > reconfiguring. 
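In code, the 2nd approach might be assembled without copying A at all, e.g. with a MATNEST (a hedged sketch, not from the thread: it assumes a real-scalar build, an assembled Mat A, and PETSc >= 3.13 for MatCreateConstantDiagonal; BuildEquivalentRealForm and its arguments are illustrative names):

#include <petscmat.h>

/* Build M = [ A  wI ; -wI  A ] so that (A + i*w) Z = C can be solved in
   real arithmetic as M [Zr;Zi] = [Cr;Ci]. */
PetscErrorCode BuildEquivalentRealForm(Mat A, PetscReal w, Mat *M)
{
  Mat            blocks[4], D, Dneg;
  PetscInt       m, n;
  PetscErrorCode ierr;

  ierr = MatGetLocalSize(A, &m, &n);CHKERRQ(ierr);
  ierr = MatCreateConstantDiagonal(PetscObjectComm((PetscObject)A), m, n,
                                   PETSC_DECIDE, PETSC_DECIDE, (PetscScalar)w, &D);CHKERRQ(ierr);
  ierr = MatCreateConstantDiagonal(PetscObjectComm((PetscObject)A), m, n,
                                   PETSC_DECIDE, PETSC_DECIDE, (PetscScalar)-w, &Dneg);CHKERRQ(ierr);
  blocks[0] = A;    blocks[1] = D;      /* row 0:  [  A   wI ] */
  blocks[2] = Dneg; blocks[3] = A;      /* row 1:  [ -wI   A ] */
  ierr = MatCreateNest(PetscObjectComm((PetscObject)A), 2, NULL, 2, NULL, blocks, M);CHKERRQ(ierr);
  ierr = MatDestroy(&D);CHKERRQ(ierr);  /* the nest holds its own references */
  ierr = MatDestroy(&Dneg);CHKERRQ(ierr);
  return 0;
}

MatCreateNest() only references the blocks, so A is shared rather than duplicated; the price is that preconditioning such an operator typically goes through PCFIELDSPLIT or a user-assembled AIJ version of M.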
> > THanks, > > Matt > > Thanks, > Feng > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon May 17 07:24:00 2021 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 17 May 2021 08:24:00 -0400 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: I thought I did a clean make but I made a clean one now and it seems to be working now. Also, I am trying to fix this error message that I get on Cori with 'make check'. I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these parameters, but I get error messages on Kokkos: Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html *srun: error: Unable to create step for job 1923618: More processors requested than permitted*C/C++ example src/snes/tutorials/ex19 run successfully with cuda gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) 1,25c1 < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000 < Vec Object: Exact Solution 2 MPI processes < type: mpikokkos < Process [0] < 0. < 0.015625 < 0.125 < Process [1] < 0.421875 < 1. < Vec Object: Forcing function 2 MPI processes < type: mpikokkos < Process [0] < 1e-72 < 1.50024 < 3.01563 < Process [1] < 4.67798 < 7. < 0 SNES Function norm 5.414682427127e+00 < 1 SNES Function norm 2.952582418265e-01 < 2 SNES Function norm 4.502293658739e-04 < 3 SNES Function norm 1.389665806646e-09 < Number of SNES iterations = 3 < Norm of error 1.49752e-10 Iterations 3 --- *> srun: error: Unable to create step for job 1923618: More processors requested than permitted*/global/homes/m/madams/petsc/src/snes/tutorials Possible problem with ex3k running with kokkos-kernels, diffs above ========================================= Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process Completed test examples On Sun, May 16, 2021 at 11:14 PM Barry Smith wrote: > > Could still be a gencode arch issue. Is it possible that Kokkos was built > with the 80 arch and when you reran configure with 70 it did not rebuild > Kokkos because it didn't know it needed to? > > Sorry, but this may require another rm -rf arch* and running ./configure > again. 
> > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e > > > cudaErrorInvalidDeviceFunction = 98The requested device function does not > exist or is not compiled for the proper device architecture. > > > > On May 16, 2021, at 9:09 PM, Mark Adams wrote: > > I now get this error. A blas error from VecAXPBYPCZ ... > Any ideas? > > > terminate called after throwing an instance of 'std::runtime_error' > what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) > error( cudaErrorInvalidDeviceFunction): invalid device function > /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 > Traceback functionality not available > > [cgpu16:55192] *** Process received signal *** > [cgpu16:55192] Signal: Aborted (6) > [cgpu16:55192] Signal code: (-6) > [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] > [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] > [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] > [cgpu16:55192] [ 3] > /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] > [cgpu16:55192] [ 4] > /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] > [cgpu16:55192] [ 5] > /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] > [cgpu16:55192] [ 6] > /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] > [cgpu16:55192] [ 7] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] > [cgpu16:55192] [ 8] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] > [cgpu16:55192] [ 9] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] > [cgpu16:55192] [10] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1] > [cgpu16:55192] [11] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781] > [cgpu16:55192] [12] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b] > [cgpu16:55192] [13] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1] > [cgpu16:55192] [14] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e] > [cgpu16:55192] [15] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a] > [cgpu16:55192] [16] > 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675] > [cgpu16:55192] [17] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e] > [cgpu16:55192] [18] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651] > [cgpu16:55192] [19] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c] > [cgpu16:55192] [20] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05] > [cgpu16:55192] [21] > /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455] > [cgpu16:55192] [22] ../ex2-kok[0x4033eb] > [cgpu16:55192] [23] > /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a] > [cgpu16:55192] [24] ../ex2-kok[0x404aaa] > [cgpu16:55192] *** End of error message *** > /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted > "$@" > 0 stopping nvidia-cuda-mps-control on cgpu16 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Mon May 17 09:30:49 2021 From: snailsoar at hotmail.com (feng wang) Date: Mon, 17 May 2021 14:30:49 +0000 Subject: [petsc-users] reuse a real matrix for a second linear system with complex numbers In-Reply-To: References: , Message-ID: Thanks for your reply. Ax=B (A + i*w) Z = C I have been using the matrix A for my real-number system, it is ok, so I believe the matrix "A" should not be corrupted. It is a bit strange that MatDuplicate and MatCopy complain. I am quite puzzled by this. my code is a big library, it would take some effort to make a stripped-down version, I can have a try. Maybe I don't actually need to make a copy of A. I can directly work on it to add the diagonal contribution i*w and then call KSPSolve. For a different "w", I can undo the previous diagonal contribution and add the new one and so on. so I only need the memory space of "A" in the end for both systems. I've tried this for one "w" and it seems working for me. I am relying on the fact that "A" (its values and structure) is untouched by "KSPSolve" , am I correct on this or I am being lucky? Thanks, feng ________________________________ From: Matthew Knepley Sent: 17 May 2021 11:37 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Mon, May 17, 2021 at 5:18 AM feng wang > wrote: Hi Mat, I have tried the 1st approach you suggested, which re-configures petsc with complex numbers. For my real-number system, Ax=B, it still works fine. I would like to save a copy a A, because for my complex system (A + i*w) Z = C, the value of "w" could be a list of values, so I only need to do a MatShift for my saved copy of A to build the LHS for each "w". I am trying to use MatDuplicate to create the copy of "A" and MatCopy to copy the values of "A". the following is what I do. "petsc_A_pre" is "A" and "petsc_A_pre_copy" is its copy. ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); ierr = MatSetOption(petsc_A_pre, MAT_STRUCTURALLY_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); ierr = MatDuplicate(petsc_A_pre, MAT_SHARE_NONZERO_PATTERN, &petsc_A_pre_copy); CHKERRQ(ierr); //line 69 .....some operations..... 
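/* Hedged aside on the snippet above: at this point petsc_A_pre has only been
   preallocated -- its MatAssemblyBegin/End calls come below. MatDuplicate()
   on a not-yet-assembled BAIJ matrix is a plausible trigger for the
   "Corrupt matrix" error shown further down, so moving the MatDuplicate()
   (and the MatCopy()) after the assembly of petsc_A_pre would be worth
   trying. */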
ierr = MatAssemblyBegin(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatAssemblyEnd(petsc_A_pre,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatCopy(petsc_A_pre, petsc_A_pre_copy, SAME_NONZERO_PATTERN); ierr = MatAssemblyBegin(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); ierr = MatAssemblyEnd(petsc_A_pre_copy,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); I get the following error for MatDuplicate and MatCopy: [0]PETSC ERROR: Petsc has generated inconsistent data [0]PETSC ERROR: Corrupt matrix [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown [0]PETSC ERROR: cfd on a arch-linux2-c-debug named snail by snail Mon May 17 09:34:46 2021 [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-scalar-type=complex [0]PETSC ERROR: #1 MatDuplicateNoCreate_SeqBAIJ() line 3103 in /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c [0]PETSC ERROR: #2 MatDuplicate_SeqBAIJ() line 3215 in /home/feng/cfd/petsc.complex/src/mat/impls/baij/seq/baij.c [0]PETSC ERROR: #3 MatDuplicate() line 4663 in /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c [0]PETSC ERROR: #4 petsc_nk_init() line 69 in domain/cfd/petsc_nk.cpp Your matrix is actually corrupt. We have to fix this. It is either a PETSc bug or a bug in your code. Would you be able to send us a stripped down version of your code (say one that puts in all zeros, or similarly simplified), so we can track down where this is? Alternatively, you can run under valgrind, since this looks like a memory overwrite. Thanks, Matt [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Must call MatXXXSetPreallocation() or MatSetUp() on argument 2 "B" before MatCopy() [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown [0]PETSC ERROR: cfd on a arch-linux2-c-debug named feng by feng Mon May 17 09:34:46 2021 [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-scalar-type=complex [0]PETSC ERROR: #5 MatCopy() line 4087 in /home/feng/cfd/petsc.complex/src/mat/interface/matrix.c Is there a better way to do this? could you give any comments? Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 20:26 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 1:36 PM feng wang > wrote: Yes, you are right. I can do row permutations to make them continuous. I will try this. could I re-use my KSP object from the 1st linear system in my 2nd system by simply changing the operators and setting new parameters? or I need a separate KSP object for the 2nd system? I tink you want 2 KSP objects. You could reuse the settings of the first, but since the system is a different size, all storage would have to be deleted and recreated anyway. Thanks, Matt Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 15:20 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 10:36 AM feng wang > wrote: Thanks for your comments. It is very helpful! I might try the 1st approach first. 
For the 2nd approach which uses an equivalent real-number system, I see potential issues when running in parallel. I have re-ordered my cells to allow each rank hold continuous rows in the first real system Ax=B. For the equivalent real-number system, each rank now holds (or can assign values to) two patches of continuous rows, which are separated by N rows, N is the size of square matrix A. I can't see a straightforward way to allow each rank hold continuous rows in this case. or petsc can handle these two patches of continuous rows with fixed row index difference in this case? I just wrote it that way for ease of typing. You can imagine permuting into 2x2 blocks with /a w\ \-w a/ for each entry. Thanks, Matt By the way, could I still re-use my KSP object in my second system by simply changing the operators and setting new parameters? Thanks, Feng ________________________________ From: Matthew Knepley > Sent: 14 May 2021 10:00 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] reuse a real matrix for a second linear system with complex numbers On Fri, May 14, 2021 at 4:23 AM feng wang > wrote: Dear All, I am solving a coupled system. One system is AX=B. A, X and B are all real numbers and it is solved with GMRES in petsc. Now I need to solve a second linear system, it can be represented as (A+i*w)*Z=C. i is the imaginary unit. Z and C are also complex numbers. So the Jacobian matrix of the second system is just A plus a diagonal contribution i*w. I would like solve the second system with GMRES, could petsc handle this? any comments are welcome. Mixing real and complex numbers in the same code is somewhat difficult now. You have two obvious choices: 1) Configure for complex numbers and solve your first system as complex but with 0 imaginary part. This will work fine, but uses more memory for that system. However, since you will already use that much memory for the second system, it does not seem like a big deal to me. 2) You could solve the second system in its equivalent real form / A w \ /Zr\ = /Cr\ \ -w A / \Zi/ \Ci/ This uses more memory for the second system, but does not require reconfiguring. THanks, Matt Thanks, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
From thibault.bridelbertomeu at gmail.com Tue May 18 04:18:51 2021
From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu)
Date: Tue, 18 May 2021 11:18:51 +0200
Subject: [petsc-users] DMPlex from periodic Gmsh and ghost cells
Message-ID:

Dear all,

I am playing around with creating a DMPlex from a periodic Gmsh mesh (still in the same finite volume code that solves the Euler equations) because I want to run some classical periodic test cases from the literature (like convecting a vortex and so on).
I got from the mailing list and the examples that I gotta use -dm_plex_gmsh_periodic for the reader and the DM creator to do their job properly and read the $Periodic section from the Gmsh mesh. That works fine, I can write the DM in HDF5 format afterwards and the periodicity is clearly there (I can see in Paraview that opposite sides are "attached" to each other).
Now, in the context of the finite volume solver, I wrote my own right-hand-side computing routine, and I rely heavily on routines such as DMPlexGetFaceFields. Those routines, in turn, rely on the ghost cells, and ... I am having an issue with DMPlexConstructGhostCells.
When I use -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine, DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works fine, but when I get to DMPlexConstructGhostCells, I get

"DM has boundary face 2048 with 2 support cells"

I understand that, the DM being periodic, even the "boundary" faces have two neighbors because of the topological folding, so ... yeah, the ghost cells cannot really exist. But then how can I proceed? Because I need that "ghost" label for all the routines I call from the RHS computing routine ...
I attach to this email the periodic Gmsh mesh I am playing with, and the routine from my code where I create the DM and so on:

subroutine initmesh
    PetscErrorCode :: ierr
    DM :: dmDist, dmGhost
    integer :: overlap
    PetscViewer :: vtkViewer

    call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr)

    ! Number of neighbours taken into account in MPI communications (1 - Order 1; 2 - Order 2)
    overlap = 1

    call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ; CHKERRA(ierr)
    ! Force DMPlex to use gmsh marker
    ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS, "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr)
    ! Read mesh from file name 'meshname'
    call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm, ierr); CHKERRA(ierr)
    ! Distribute on processors
    ! Start with connectivity
    call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ; CHKERRA(ierr)
    ! Distribute on processors
    call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ; CHKERRA(ierr)
    ! Security check
    if (dmDist /= PETSC_NULL_DM) then
        ! Destroy previous dm
        call DMDestroy(dm, ierr) ; CHKERRA(ierr)
        ! Replace with dmDist
        dm = dmDist
    end if
    ! Finalize setup of the object
    call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr)
    ! Boundary condition with ghost cells
    call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER, PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr)
    ! Security check
    if (dmGhost /= PETSC_NULL_DM) then
        ! Destroy previous dm
        call DMDestroy(dm, ierr) ; CHKERRA(ierr)
        ! Replace with dmGhost
        dm = dmGhost
    end if

    if (debug) then
        ! Show in terminal
        call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in console ::\n", ierr); CHKERRA(ierr)
        call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr) !
VTK viewer call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; CHKERRA(ierr) call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; CHKERRA(ierr) call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ; CHKERRA(ierr) call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ; CHKERRA(ierr) call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr) call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr) end if call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr) call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr) end subroutine initmesh Thank you very much for your help ! Best regards, Thibault Bridel-Bertomeu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vortex.msh Type: application/octet-stream Size: 69414 bytes Desc: not available URL: From mfadams at lbl.gov Tue May 18 06:27:28 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 18 May 2021 07:27:28 -0400 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: Damn, I am getting this problem on Summit and did a clean configure. I removed the Kokkos arch=70 line and added '--with-cudac-gencodearch=70', Any ideas? < Number of SNES iterations = 2 --- > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture > [h50n11:35759] *** Process received signal *** > [h50n11:35759] Signal: Aborted (6) > [h50n11:35759] Signal code: (-6) > [h50n11:35759] [ 0] [0x2000000504d8] > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094] > [h50n11:35759] [ 2] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558] > [h50n11:35759] [ 3] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210] > [h50n11:35759] [ 4] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0] > [h50n11:35759] [ 5] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314] > [h50n11:35759] [ 6] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0] > [h50n11:35759] [ 7] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac] > [h50n11:35759] [ 8] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c] > [h50n11:35759] [ 9] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c] > [h50n11:35759] [10] 
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424] > [h50n11:35759] [11] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc] > [h50n11:35759] [12] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4] > [h50n11:35759] [13] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790] > [h50n11:35759] [14] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24] > [h50n11:35759] [15] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504] > [h50n11:35759] [16] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c] > [h50n11:35759] [17] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c] > [h50n11:35759] [18] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560] > [h50n11:35759] [19] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0] > [h50n11:35759] [20] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10] > [h50n11:35759] [21] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584] > [h50n11:35759] [22] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4] > [h50n11:35759] [23] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44] > [h50n11:35759] [24] ./ex19[0x10001a70] > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200] > [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4] > [h50n11:35759] *** End of error message *** > ERROR: One or more process (first noticed rank 0) terminated with signal 6 (core dumped) /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials On Mon, May 17, 2021 at 8:24 AM Mark Adams wrote: > I thought I did a clean make but I made a clean one now and it seems to be > working now. > > Also, I am trying to fix this error message that I get on Cori with 'make > check'. > I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these > parameters, but I get error messages on Kokkos: > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > *srun: error: Unable to create step for job 1923618: More processors > requested than permitted*C/C++ example src/snes/tutorials/ex19 run > successfully with cuda > gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) > 1,25c1 > < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000 > < Vec Object: Exact Solution 2 MPI processes > < type: mpikokkos > < Process [0] > < 0. > < 0.015625 > < 0.125 > < Process [1] > < 0.421875 > < 1. 
> < Vec Object: Forcing function 2 MPI processes > < type: mpikokkos > < Process [0] > < 1e-72 > < 1.50024 > < 3.01563 > < Process [1] > < 4.67798 > < 7. > < 0 SNES Function norm 5.414682427127e+00 > < 1 SNES Function norm 2.952582418265e-01 > < 2 SNES Function norm 4.502293658739e-04 > < 3 SNES Function norm 1.389665806646e-09 > < Number of SNES iterations = 3 > < Norm of error 1.49752e-10 Iterations 3 > --- > > *> srun: error: Unable to create step for job 1923618: More processors > requested than permitted*/global/homes/m/madams/petsc/src/snes/tutorials > Possible problem with ex3k running with kokkos-kernels, diffs above > ========================================= > Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process > Completed test examples > > On Sun, May 16, 2021 at 11:14 PM Barry Smith wrote: > >> >> Could still be a gencode arch issue. Is it possible that Kokkos was built >> with the 80 arch and when you reran configure with 70 it did not rebuild >> Kokkos because it didn't know it needed to? >> >> Sorry, but this may require another rm -rf arch* and running ./configure >> again. >> >> >> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e >> >> >> cudaErrorInvalidDeviceFunction = 98The requested device function does >> not exist or is not compiled for the proper device architecture. >> >> >> >> On May 16, 2021, at 9:09 PM, Mark Adams wrote: >> >> I now get this error. A blas error from VecAXPBYPCZ ... >> Any ideas? >> >> >> terminate called after throwing an instance of 'std::runtime_error' >> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) >> error( cudaErrorInvalidDeviceFunction): invalid device function >> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 >> Traceback functionality not available >> >> [cgpu16:55192] *** Process received signal *** >> [cgpu16:55192] Signal: Aborted (6) >> [cgpu16:55192] Signal code: (-6) >> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] >> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] >> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] >> [cgpu16:55192] [ 3] >> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] >> [cgpu16:55192] [ 4] >> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] >> [cgpu16:55192] [ 5] >> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] >> [cgpu16:55192] [ 6] >> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] >> [cgpu16:55192] [ 7] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] >> [cgpu16:55192] [ 8] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] >> [cgpu16:55192] [ 9] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] >> [cgpu16:55192] [10] >> 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1] >> [cgpu16:55192] [11] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781] >> [cgpu16:55192] [12] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b] >> [cgpu16:55192] [13] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1] >> [cgpu16:55192] [14] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e] >> [cgpu16:55192] [15] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a] >> [cgpu16:55192] [16] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675] >> [cgpu16:55192] [17] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e] >> [cgpu16:55192] [18] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651] >> [cgpu16:55192] [19] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c] >> [cgpu16:55192] [20] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05] >> [cgpu16:55192] [21] >> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455] >> [cgpu16:55192] [22] ../ex2-kok[0x4033eb] >> [cgpu16:55192] [23] >> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a] >> [cgpu16:55192] [24] ../ex2-kok[0x404aaa] >> [cgpu16:55192] *** End of error message *** >> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted >> "$@" >> 0 stopping nvidia-cuda-mps-control on cgpu16 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3271339 bytes Desc: not available URL: From niko.karin at gmail.com Tue May 18 07:18:19 2021 From: niko.karin at gmail.com (Karin&NiKo) Date: Tue, 18 May 2021 14:18:19 +0200 Subject: [petsc-users] DMPlex and Boundary facets Message-ID: Dear PETSc team, I have tried to load a test mesh available in Gmsh' s demos directory (share/doc/gmsh/demos/simple_geo/filter.geo, attached to this email) as a DMPlex. 
So I produced a msh4 file by doing :
gmsh -3 filter.geo -o /tmp/test.msh4
Then I used src/dm/impls/plex/tutorials/ex2.c to load the mesh by doing :
./ex2 -filename /tmp/test.msh4

Unfortunately I get the error :

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: No support for this operation for this object type
[0]PETSC ERROR: Could not determine Plex facet for Gmsh element 1268 (Plex cell 12681)

The error seems to come from the fact that the msh file contains tets *and* facets *only on the Physical entities* (aka parts of the mesh boundary where the user will assign Dirichlet or Neumann conditions). If I suppress these facets by commenting out the "Physical Surface" lines in the geo file and regenerating the mesh, everything is fine.

But the use of this "Physical" stuff is very common in lots of finite element codes in order to assign boundary conditions. What should I do to keep these boundary groups of 2D elements (with their corresponding names) ?

Thanks for your help,
Nicolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: filter.geo
Type: application/octet-stream
Size: 11766 bytes
Desc: not available
URL:

From roland.richter at ntnu.no Tue May 18 09:48:31 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Tue, 18 May 2021 16:48:31 +0200
Subject: [petsc-users] Efficient FFTShift-implementation for vectors/matrices
Message-ID: <30147ae8-b91d-9f97-2415-f1ec6170f902@ntnu.no>

Dear all,

I tried to implement the function fftshift from numpy (i.e. swap the half-spaces of all axes) for row vectors in a matrix by using the following code

void fft_shift(Mat &fft_matrix) {
    PetscScalar *mat_ptr;
    MatDenseGetArray (fft_matrix, &mat_ptr);
    PetscInt r_0, r_1;
    MatGetOwnershipRange(fft_matrix, &r_0, &r_1);
    PetscInt local_row_num = r_1 - r_0;
    arma::cx_mat temp_mat(local_row_num, Ntime, arma::fill::zeros);
    for(int i = 0; i < Ntime; ++i) {
        const PetscInt row_shift = i * local_row_num;
        for(int j = 0; j < local_row_num; ++j) {
            const PetscInt cur_pos = j + row_shift;
            if(i < (int)(Ntime / 2))
                temp_mat(j, i + int(Ntime / 2)) = *(mat_ptr + cur_pos);
            else
                temp_mat(j, i - int(Ntime / 2)) = *(mat_ptr + cur_pos);
        }
    }
    for(int i = 0; i < Ntime; ++i) {
        const PetscInt row_shift = i * local_row_num;
        for(int j = 0; j < local_row_num; ++j) {
            const PetscInt cur_pos = j + row_shift;
            *(mat_ptr + cur_pos) = temp_mat(j, i);
        }
    }
    MatDenseRestoreArray (fft_matrix, &mat_ptr);
}

but I do not like the approach of having a second matrix as temporary storage space. Are there more efficient approaches possible using PETSc-functions?

Thanks!
Regards, Roland Richter From junchao.zhang at gmail.com Tue May 18 10:30:05 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 18 May 2021 10:30:05 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: * '--with-cuda-gencodearch=70',* --Junchao Zhang On Tue, May 18, 2021 at 6:29 AM Mark Adams wrote: > Damn, I am getting this problem on Summit and did a clean configure. > I removed the Kokkos arch=70 line and added > '--with-cudac-gencodearch=70', > > Any ideas? > > < Number of SNES iterations = 2 > --- > > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture > > [h50n11:35759] *** Process received signal *** > > [h50n11:35759] Signal: Aborted (6) > > [h50n11:35759] Signal code: (-6) > > [h50n11:35759] [ 0] [0x2000000504d8] > > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094] > > [h50n11:35759] [ 2] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558] > > [h50n11:35759] [ 3] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210] > > [h50n11:35759] [ 4] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0] > > [h50n11:35759] [ 5] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314] > > [h50n11:35759] [ 6] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0] > > [h50n11:35759] [ 7] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac] > > [h50n11:35759] [ 8] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c] > > [h50n11:35759] [ 9] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c] > > [h50n11:35759] [10] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424] > > [h50n11:35759] [11] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc] > > [h50n11:35759] [12] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4] > > [h50n11:35759] [13] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790] > > [h50n11:35759] [14] > 
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24] > > [h50n11:35759] [15] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504] > > [h50n11:35759] [16] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c] > > [h50n11:35759] [17] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c] > > [h50n11:35759] [18] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560] > > [h50n11:35759] [19] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0] > > [h50n11:35759] [20] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10] > > [h50n11:35759] [21] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584] > > [h50n11:35759] [22] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4] > > [h50n11:35759] [23] > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44] > > [h50n11:35759] [24] ./ex19[0x10001a70] > > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200] > > [h50n11:35759] [26] > /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4] > > [h50n11:35759] *** End of error message *** > > ERROR: One or more process (first noticed rank 0) terminated with > signal 6 (core dumped) > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials > > On Mon, May 17, 2021 at 8:24 AM Mark Adams wrote: > >> I thought I did a clean make but I made a clean one now and it seems to >> be working now. >> >> Also, I am trying to fix this error message that I get on Cori with 'make >> check'. >> I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these >> parameters, but I get error messages on Kokkos: >> >> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes >> See http://www.mcs.anl.gov/petsc/documentation/faq.html >> >> *srun: error: Unable to create step for job 1923618: More processors >> requested than permitted*C/C++ example src/snes/tutorials/ex19 run >> successfully with cuda >> gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) >> 1,25c1 >> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000 >> < Vec Object: Exact Solution 2 MPI processes >> < type: mpikokkos >> < Process [0] >> < 0. >> < 0.015625 >> < 0.125 >> < Process [1] >> < 0.421875 >> < 1. >> < Vec Object: Forcing function 2 MPI processes >> < type: mpikokkos >> < Process [0] >> < 1e-72 >> < 1.50024 >> < 3.01563 >> < Process [1] >> < 4.67798 >> < 7. 
>> < 0 SNES Function norm 5.414682427127e+00 >> < 1 SNES Function norm 2.952582418265e-01 >> < 2 SNES Function norm 4.502293658739e-04 >> < 3 SNES Function norm 1.389665806646e-09 >> < Number of SNES iterations = 3 >> < Norm of error 1.49752e-10 Iterations 3 >> --- >> >> *> srun: error: Unable to create step for job 1923618: More processors >> requested than permitted*/global/homes/m/madams/petsc/src/snes/tutorials >> Possible problem with ex3k running with kokkos-kernels, diffs above >> ========================================= >> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI >> process >> Completed test examples >> >> On Sun, May 16, 2021 at 11:14 PM Barry Smith wrote: >> >>> >>> Could still be a gencode arch issue. Is it possible that Kokkos was >>> built with the 80 arch and when you reran configure with 70 it did not >>> rebuild Kokkos because it didn't know it needed to? >>> >>> Sorry, but this may require another rm -rf arch* and running ./configure >>> again. >>> >>> >>> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e >>> >>> >>> cudaErrorInvalidDeviceFunction = 98The requested device function does >>> not exist or is not compiled for the proper device architecture. >>> >>> >>> >>> On May 16, 2021, at 9:09 PM, Mark Adams wrote: >>> >>> I now get this error. A blas error from VecAXPBYPCZ ... >>> Any ideas? >>> >>> >>> terminate called after throwing an instance of 'std::runtime_error' >>> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) >>> error( cudaErrorInvalidDeviceFunction): invalid device function >>> /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 >>> Traceback functionality not available >>> >>> [cgpu16:55192] *** Process received signal *** >>> [cgpu16:55192] Signal: Aborted (6) >>> [cgpu16:55192] Signal code: (-6) >>> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] >>> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] >>> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] >>> [cgpu16:55192] [ 3] >>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] >>> [cgpu16:55192] [ 4] >>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] >>> [cgpu16:55192] [ 5] >>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] >>> [cgpu16:55192] [ 6] >>> /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] >>> [cgpu16:55192] [ 7] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] >>> [cgpu16:55192] [ 8] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] >>> [cgpu16:55192] [ 9] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] >>> [cgpu16:55192] [10] >>> 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1] >>> [cgpu16:55192] [11] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781] >>> [cgpu16:55192] [12] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b] >>> [cgpu16:55192] [13] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1] >>> [cgpu16:55192] [14] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e] >>> [cgpu16:55192] [15] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a] >>> [cgpu16:55192] [16] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675] >>> [cgpu16:55192] [17] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e] >>> [cgpu16:55192] [18] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651] >>> [cgpu16:55192] [19] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c] >>> [cgpu16:55192] [20] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05] >>> [cgpu16:55192] [21] >>> /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455] >>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb] >>> [cgpu16:55192] [23] >>> /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a] >>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa] >>> [cgpu16:55192] *** End of error message *** >>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted >>> "$@" >>> 0 stopping nvidia-cuda-mps-control on cgpu16 >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 18 10:46:01 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 May 2021 11:46:01 -0400 Subject: [petsc-users] DMPlex from periodic Gmsh and ghost cells In-Reply-To: References: Message-ID: On Tue, May 18, 2021 at 5:19 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I am playing around with creating DMPlex from a periodic Gmsh mesh (still > in the same finite volume code that solves the Euler equations) because I > want to run some classical periodic test cases from the literature (like > convecting a vortex and so on). > I got from the mailing list and the examples that I gotta use > -dm_plex_gmsh_periodic for the reader and the DM creator to do their job > properly and read the $Periodic section from the Gmsh mesh. That works > fine, I can write the DM in HDF5 format afterwards and the periodicity is > clearly there (i can see in Paraview that opposite sides are "attached" to > each other). 
> Now, in the context of the finite volume solver, I wrote my own right-hand
> side term computing routine, and I rely heavily on routines such as
> DMPlexGetFaceFields. Those routines, in turn, rely on the ghost cells and
> ... I am having an issue with DMPlexConstructGhostCells. When I use
> -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine,
> DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works
> fine, but when I get to DMPlexConstructGhostCells, I get
>
> "DM has boundary face 2048 with 2 support cells"
>
> I understand that, the DM being periodic, even the "boundary" faces have two
> neighbors because of the topological folding, so ... yeah, the ghost cells
> cannot really exist. But then how can I proceed ? Because I need that
> "ghost" label for all the routines I call from the RHS computing routine ...
>

Thanks for the example. I have run this myself. It looks to me like the mesh you are providing has no boundary. It is doubly periodic (a torus). When I tell ConstructGhostCells() to create a label for the boundary, rather than using "Face Sets", it creates an empty label because no faces have only 1 neighbor.

  Thanks,

    Matt

> I attach to this email the periodic Gmsh mesh I am playing with, and the
> routine from my code where I create the DM and so on:
>
> subroutine initmesh
>
> PetscErrorCode :: ierr
> DM :: dmDist, dmGhost
> integer :: overlap
> PetscViewer :: vtkViewer
>
> call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr)
>
> ! Number of neighbours taken into account in MP communications (1 - Order
> 1; 2 - Order 2)
> overlap = 1
>
> call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ;
> CHKERRA(ierr)
>
> ! Force DMPlex to use gmsh marker
> ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS,
> "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr)
>
> ! Read mesh from file name 'meshname'
> call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm,
> ierr); CHKERRA(ierr)
>
> ! Distribute on processors
> ! Start with connectivity
> call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ;
> CHKERRA(ierr)
>
> ! Distribute on processors
> call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ;
> CHKERRA(ierr)
>
> ! Security check
> if (dmDist /= PETSC_NULL_DM) then
> ! Destroy previous dm
> call DMDestroy(dm, ierr) ; CHKERRA(ierr)
> ! Replace with dmDist
> dm = dmDist
> end if
>
> ! Finalize setup of the object
> call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr)
>
> ! Boundary condition with ghost cells
> call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER,
> PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr)
>
> ! Security check
> if (dmGhost /= PETSC_NULL_DM) then
> ! Destroy previous dm
> call DMDestroy(dm, ierr) ; CHKERRA(ierr)
> ! Replace with dmGhost
> dm = dmGhost
> end if
>
> if (debug) then
> ! Show in terminal
> call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in console
> ::\n", ierr); CHKERRA(ierr)
> call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr)
> ! VTK viewer
> call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; CHKERRA(ierr)
> call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; CHKERRA(ierr)
> call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ;
> CHKERRA(ierr)
> call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ;
> CHKERRA(ierr)
> call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr)
> call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr)
> end if
>
> call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr)
>
> call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr)
>
> end subroutine initmesh
>
> Thank you very much for your help !
>
> Best regards,
>
> Thibault Bridel-Bertomeu
>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
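A small C sketch of the kind of check described above, marking the boundary into a fresh label and counting what it contains before constructing ghost cells, might look as follows. This is illustrative, not code from the thread: the label name "boundary" and the surrounding names are assumptions, error checking is omitted, and the Fortran calls used in the thread are analogous.

#include <petscdmplex.h>

/* Probe whether a (possibly fully periodic) Plex has boundary faces before
   relying on ghost cells. */
PetscErrorCode CheckGhostCells(DM dm, DM *dmGhost)
{
  DMLabel  bdLabel;
  PetscInt nBd = 0, nGhost = 0;

  DMCreateLabel(dm, "boundary");
  DMGetLabel(dm, "boundary", &bdLabel);
  DMPlexMarkBoundaryFaces(dm, 1, bdLabel);  /* marks faces with exactly one support cell */
  DMLabelGetStratumSize(bdLabel, 1, &nBd);  /* 0 on a doubly periodic (torus-like) mesh */
  DMPlexConstructGhostCells(dm, "boundary", &nGhost, dmGhost);
  /* On a fully periodic mesh nBd == 0 and nGhost == 0, so any finite volume
     loop that consults the "ghost" label has to tolerate an empty label. */
  return 0;
}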
From knepley at gmail.com Tue May 18 10:49:37 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 18 May 2021 11:49:37 -0400
Subject: Re: [petsc-users] DMPlex and Boundary facets
In-Reply-To:
References:
Message-ID:

On Tue, May 18, 2021 at 8:18 AM Karin&NiKo wrote:
> Dear PETSc team,
>
> I have tried to load a test mesh available in Gmsh' s demos directory
> (share/doc/gmsh/demos/simple_geo/filter.geo, attached to this email) as a
> DMPlex.
> So I produced a msh4 file by doing :
> gmsh -3 filter.geo -o /tmp/test.msh4
> Then I used src/dm/impls/plex/tutorials/ex2.c to load the mesh by doing :
> ./ex2 -filename /tmp/test.msh4
>
> Unfortunately I get the error :
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Could not determine Plex facet for Gmsh element 1268
> (Plex cell 12681)
>
> The error seems to come from the fact that the msh file contains
> tets *and* facets *only on the Physical entities* (aka parts of the mesh
> boundary where
> the user will assign Dirichlet or Neuman conditions).
> If I suppress these facets by commenting the "Physical Surface" lines in
> the geo file and regenerating the mesh, everything is fine.
>
> But the use of these "Physical" stuff is very common in lots of finite
> element codes in order to assign boundary conditions.
> How should I do to keep these boundary groups of 2D elements (with
> corresponding names) ?

Can you also send the *.msh file? I do not have Gmsh on this machine.

  Thanks,

    Matt

> Thanks for your help,
> Nicolas

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thibault.bridelbertomeu at gmail.com Tue May 18 11:16:12 2021
From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu)
Date: Tue, 18 May 2021 18:16:12 +0200
Subject: Re: [petsc-users] DMPlex from periodic Gmsh and ghost cells
In-Reply-To:
References:
Message-ID:

Hi Matthew,

Thank you very much for your quick answer, as always !

On Tue, May 18, 2021 at 17:46, Matthew Knepley wrote:
> On Tue, May 18, 2021 at 5:19 AM Thibault Bridel-Bertomeu <
> thibault.bridelbertomeu at gmail.com> wrote:
>
>> Dear all,
>>
>> I am playing around with creating DMPlex from a periodic Gmsh mesh (still
>> in the same finite volume code that solves the Euler equations) because I
>> want to run some classical periodic test cases from the literature (like
>> convecting a vortex and so on).
>> I got from the mailing list and the examples that I gotta use
>> -dm_plex_gmsh_periodic for the reader and the DM creator to do their job
>> properly and read the $Periodic section from the Gmsh mesh. That works
>> fine, I can write the DM in HDF5 format afterwards and the periodicity is
>> clearly there (I can see in Paraview that opposite sides are "attached" to
>> each other).
>> Now, in the context of the finite volume solver, I wrote my own
>> right-hand side term computing routine, and I rely heavily on routines such
>> as DMPlexGetFaceFields. Those routines, in turn, rely on the ghost cells and
>> ... I am having an issue with DMPlexConstructGhostCells. When I use
>> -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine,
>> DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works
>> fine, but when I get to DMPlexConstructGhostCells, I get
>>
>> "DM has boundary face 2048 with 2 support cells"
>>
>> I understand that, the DM being periodic, even the "boundary" faces have two
>> neighbors because of the topological folding, so ... yeah, the ghost cells
>> cannot really exist. But then how can I proceed ? Because I need that
>> "ghost" label for all the routines I call from the RHS computing routine ...
>>
>
> Thanks for the example. I have run this myself. It looks to me like the
> mesh you are providing has no boundary. It is doubly periodic (a torus).
> When I tell ConstructGhostCells() to
> create a label for the boundary, rather than using "Face Sets", it creates
> an empty label because no faces have only 1 neighbor.
>

Yes indeed, up is down and right is left for this mesh, the idea being to emulate an infinite medium.
So you manage to get through ConstructGhostCells without a segfault ? What did you specify instead of PETSC_NULL_CHARACTER ? Also, does it still create a "ghost" label even if it's empty ? Do you think that will hold for later uses in the GetFaceFields and similar routines ? I read through the source and it seems it won't like it much if there are no ghosts at all, but since I did not pass the ConstructGhostCell step I'm not sure.

Thanks !

Thibault

> Thanks,
>
> Matt
>
>
>> I attach to this email the periodic Gmsh mesh I am playing with, and the
>> routine from my code where I create the DM and so on:
>>
>> subroutine initmesh
>>
>> PetscErrorCode :: ierr
>> DM :: dmDist, dmGhost
>> integer :: overlap
>> PetscViewer :: vtkViewer
>>
>> call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr)
>>
>> ! Number of neighbours taken into account in MP communications (1 - Order
>> 1; 2 - Order 2)
>> overlap = 1
>>
>> call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ;
>> CHKERRA(ierr)
>>
>> ! Force DMPlex to use gmsh marker
>> ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS,
>> "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr)
>>
>> ! Read mesh from file name 'meshname'
>> call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm,
>> ierr); CHKERRA(ierr)
>>
>> ! Distribute on processors
>> ! Start with connectivity
>> call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ;
>> CHKERRA(ierr)
>>
>> ! Distribute on processors
>> call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ;
>> CHKERRA(ierr)
>>
>> ! Security check
>> if (dmDist /= PETSC_NULL_DM) then
>> ! Destroy previous dm
>> call DMDestroy(dm, ierr) ; CHKERRA(ierr)
>> ! Replace with dmDist
>> dm = dmDist
>> end if
>>
>> ! Finalize setup of the object
>> call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr)
>>
>> ! Boundary condition with ghost cells
>> call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER,
>> PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr)
>>
>> ! Security check
>> if (dmGhost /= PETSC_NULL_DM) then
>> ! Destroy previous dm
>> call DMDestroy(dm, ierr) ; CHKERRA(ierr)
>> ! Replace with dmGhost
>> dm = dmGhost
>> end if
>>
>> if (debug) then
>> ! Show in terminal
>> call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in console
>> ::\n", ierr); CHKERRA(ierr)
>> call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr)
>> ! VTK viewer
>> call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; CHKERRA(ierr)
>> call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; CHKERRA(ierr)
>> call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ;
>> CHKERRA(ierr)
>> call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ;
>> CHKERRA(ierr)
>> call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr)
>> call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr)
>> end if
>>
>> call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr)
>>
>> call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr)
>>
>> end subroutine initmesh
>>
>> Thank you very much for your help !
>>
>> Best regards,
>>
>> Thibault Bridel-Bertomeu
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>

--
Thibault Bridel-Bertomeu - Eng, MSc, PhD
Research Engineer
CEA/CESTA
33114 LE BARP
Tel.: (+33)557046924
Mob.: (+33)611025322
Mail: thibault.bridelbertomeu at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From niko.karin at gmail.com Tue May 18 11:25:02 2021
From: niko.karin at gmail.com (Karin&NiKo)
Date: Tue, 18 May 2021 18:25:02 +0200
Subject: Re: [petsc-users] DMPlex and Boundary facets
In-Reply-To:
References:
Message-ID:

Sure! I send you both, one with the facets and one without.
Thanks,
Nicolas

On Tue, May 18, 2021 at 17:49, Matthew Knepley wrote:
> On Tue, May 18, 2021 at 8:18 AM Karin&NiKo wrote:
>
>> Dear PETSc team,
>>
>> I have tried to load a test mesh available in Gmsh' s demos directory
>> (share/doc/gmsh/demos/simple_geo/filter.geo, attached to this email) as a
>> DMPlex.
>> So I produced a msh4 file by doing :
>> gmsh -3 filter.geo -o /tmp/test.msh4
>> Then I used src/dm/impls/plex/tutorials/ex2.c to load the mesh by doing :
>> ./ex2 -filename /tmp/test.msh4
>>
>> Unfortunately I get the error :
>>
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: No support for this operation for this object type
>> [0]PETSC ERROR: Could not determine Plex facet for Gmsh element 1268
>> (Plex cell 12681)
>>
>> The error seems to come from the fact that the msh file contains
>> tets *and* facets *only on the Physical entities* (aka parts of the mesh
>> boundary where
>> the user will assign Dirichlet or Neuman conditions.
>> If I suppress these facets by commenting the "Physical Surface" lines in >> the geo file and regenerating the mesh, everything is fine. >> >> But the use of these "Physical" stuff is very common in lots of finite >> element codes in order to assign boundary conditions. >> How should I do to keep these boundary groups of 2D elements (with >> corresponding names) ? >> > > Can you also send the *.msh file? I do not have Gmsh on this machine. > > Thanks, > > Matt > > >> Thanks for your help, >> Nicolas >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_without_facets.msh4 Type: application/octet-stream Size: 443883 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_with_facets.msh4 Type: application/octet-stream Size: 474595 bytes Desc: not available URL: From bsmith at petsc.dev Tue May 18 11:44:36 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 18 May 2021 11:44:36 -0500 Subject: [petsc-users] Efficient FFTShift-implementation for vectors/matrices In-Reply-To: <30147ae8-b91d-9f97-2415-f1ec6170f902@ntnu.no> References: <30147ae8-b91d-9f97-2415-f1ec6170f902@ntnu.no> Message-ID: <83012843-A364-4CE1-A92B-1768EB11591E@petsc.dev> I found a variety of things on the web, below. I don't understand this but for the even case it seems one simply modifies the input matrix before the FFT http://www.fftw.org/faq/section3.html#centerorigin https://stackoverflow.com/questions/5915125/fftshift-ifftshift-c-c-source-code https://www.dsprelated.com/showthread/comp.dsp/20790-1.php > On May 18, 2021, at 9:48 AM, Roland Richter wrote: > > Dear all, > > I tried to implement the function fftshift from numpy (i.e. swap the > half-spaces of all axis) for row vectors in a matrix by using the > following code > > void fft_shift(Mat &fft_matrix) { > PetscScalar *mat_ptr; > MatDenseGetArray (fft_matrix, &mat_ptr); > PetscInt r_0, r_1; > MatGetOwnershipRange(fft_matrix, &r_0, &r_1); > PetscInt local_row_num = r_1 - r_0; > arma::cx_mat temp_mat(local_row_num, Ntime, arma::fill::zeros); > for(int i = 0; i < Ntime; ++i) { > const PetscInt row_shift = i * local_row_num; > for(int j = 0; j < local_row_num; ++j) { > const PetscInt cur_pos = j + row_shift; > if(i < (int)(Ntime / 2)) > temp_mat(j, i + int(Ntime / 2)) = *(mat_ptr + cur_pos); > else > temp_mat(j, i - int(Ntime / 2)) = *(mat_ptr + cur_pos); > } > } > for(int i = 0; i < Ntime; ++i) { > const PetscInt row_shift = i * local_row_num; > for(int j = 0; j < local_row_num; ++j) { > const PetscInt cur_pos = j + row_shift; > *(mat_ptr + cur_pos) = temp_mat(j, i); > } > } > MatDenseRestoreArray (fft_matrix, &mat_ptr); > } > > but I do not like the approach of having a second matrix as temporary > storage space. Are there more efficient approaches possible using > PETSc-functions? > > Thanks! > > Regards, > > Roland Richter > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue May 18 11:49:23 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 18 May 2021 11:49:23 -0500 Subject: [petsc-users] configure error In-Reply-To: References: <22289CC9-0240-4046-B42E-8CB80F6DD9BB@petsc.dev> <2ADD8B67-6C33-47A7-824F-5F1C15937DD5@petsc.dev> <16A9657D-0FCE-4BEA-954B-06F4127F4145@petsc.dev> Message-ID: configure prints the information about CUDA at the end of the run, you can check that information to see which was actually used. I have a new MR where PETSc records the gencodearch it was built with and then when your program starts up CUDA it verifies that the hardware supports the gencodearch it was built with. Hopefully this will alleviate difficulties in the future. Of course this won't help when using libraries that use CUDA built externally from PETSc. Barry > On May 18, 2021, at 10:30 AM, Junchao Zhang wrote: > > '--with-cuda-gencodearch=70', > > --Junchao Zhang > > > On Tue, May 18, 2021 at 6:29 AM Mark Adams > wrote: > Damn, I am getting this problem on Summit and did a clean configure. > I removed the Kokkos arch=70 line and added > '--with-cudac-gencodearch=70', > > Any ideas? > > < Number of SNES iterations = 2 > --- > > Kokkos::Cuda::initialize ERROR: likely mismatch of architecture > > [h50n11:35759] *** Process received signal *** > > [h50n11:35759] Signal: Aborted (6) > > [h50n11:35759] Signal code: (-6) > > [h50n11:35759] [ 0] [0x2000000504d8] > > [h50n11:35759] [ 1] /lib64/libc.so.6(abort+0x2b4)[0x200032322094] > > [h50n11:35759] [ 2] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl10host_abortEPKc+0x58)[0x20000f944558] > > [h50n11:35759] [ 3] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl12CudaInternal10initializeEiP11CUstream_st+0xe60)[0x20000f95c210] > > [h50n11:35759] [ 4] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Cuda15impl_initializeENS0_12SelectDeviceEm+0x30)[0x20000f95c2b0] > > [h50n11:35759] [ 5] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl20CudaSpaceInitializer10initializeERKNS_13InitArgumentsE+0x34)[0x20000f95c314] > > [h50n11:35759] [ 6] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl16ExecSpaceManager17initialize_spacesERKNS_13InitArgumentsE+0x60)[0x20000f926aa0] > > [h50n11:35759] [ 7] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_backendsERKNS_13InitArgumentsE+0x2c)[0x20000f926dac] > > [h50n11:35759] [ 8] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl70_GLOBAL__N__46_tmpxft_0001ef6b_00000000_6_Kokkos_Core_cpp1_ii_889c95a619initialize_internalERKNS_13InitArgumentsE+0x2c)[0x20000f92b73c] > > [h50n11:35759] [ 9] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libkokkoscore.so.3.4(_ZN6Kokkos10initializeENS_13InitArgumentsE+0x2c)[0x20000f92d08c] > > [h50n11:35759] [10] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscKokkosInitializeCheck+0x1f4)[0x200000343424] > > [h50n11:35759] [11] 
/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x68b3dc)[0x20000077b3dc] > > [h50n11:35759] [12] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x3119b4)[0x2000004019b4] > > [h50n11:35759] [13] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x306790)[0x2000003f6790] > > [h50n11:35759] [14] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x274d24)[0x200000364d24] > > [h50n11:35759] [15] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(PetscSFBcastWithMemTypeBegin+0xd4)[0x200000412504] > > [h50n11:35759] [16] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x329f9c)[0x200000419f9c] > > [h50n11:35759] [17] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(VecScatterBegin+0x9c)[0x20000041fa8c] > > [h50n11:35759] [18] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin_DA+0x30)[0x2000010ef560] > > [h50n11:35759] [19] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(DMGlobalToLocalBegin+0x290)[0x2000013314b0] > > [h50n11:35759] [20] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x1513c10)[0x200001603c10] > > [h50n11:35759] [21] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESComputeFunction+0x164)[0x200001625584] > > [h50n11:35759] [22] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(+0x15830f4)[0x2000016730f4] > > [h50n11:35759] [23] /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015(SNESSolve+0x814)[0x200001634c44] > > [h50n11:35759] [24] ./ex19[0x10001a70] > > [h50n11:35759] [25] /lib64/libc.so.6(+0x25200)[0x200032305200] > > [h50n11:35759] [26] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000323053f4] > > [h50n11:35759] *** End of error message *** > > ERROR: One or more process (first noticed rank 0) terminated with signal 6 (core dumped) > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials > > On Mon, May 17, 2021 at 8:24 AM Mark Adams > wrote: > I thought I did a clean make but I made a clean one now and it seems to be working now. > > Also, I am trying to fix this error message that I get on Cori with 'make check'. > I set mpiexec='srun -G 2 -c 20' and get an interactive shell with these parameters, but I get error messages on Kokkos: > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > srun: error: Unable to create step for job 1923618: More processors requested than permitted > C/C++ example src/snes/tutorials/ex19 run successfully with cuda > gmake[3]: [makefile:102: runex3k_kokkos] Error 1 (ignored) > 1,25c1 > < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000 > < Vec Object: Exact Solution 2 MPI processes > < type: mpikokkos > < Process [0] > < 0. > < 0.015625 > < 0.125 > < Process [1] > < 0.421875 > < 1. > < Vec Object: Forcing function 2 MPI processes > < type: mpikokkos > < Process [0] > < 1e-72 > < 1.50024 > < 3.01563 > < Process [1] > < 4.67798 > < 7. 
> < 0 SNES Function norm 5.414682427127e+00 > < 1 SNES Function norm 2.952582418265e-01 > < 2 SNES Function norm 4.502293658739e-04 > < 3 SNES Function norm 1.389665806646e-09 > < Number of SNES iterations = 3 > < Norm of error 1.49752e-10 Iterations 3 > --- > > srun: error: Unable to create step for job 1923618: More processors requested than permitted > /global/homes/m/madams/petsc/src/snes/tutorials > Possible problem with ex3k running with kokkos-kernels, diffs above > ========================================= > Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process > Completed test examples > > On Sun, May 16, 2021 at 11:14 PM Barry Smith > wrote: > > Could still be a gencode arch issue. Is it possible that Kokkos was built with the 80 arch and when you reran configure with 70 it did not rebuild Kokkos because it didn't know it needed to? > > Sorry, but this may require another rm -rf arch* and running ./configure again. > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg3f51e3575c2178246db0a94a430e0038b6af535e7e53d3f21e2437e8977b8c2e > > > cudaErrorInvalidDeviceFunction = 98 > The requested device function does not exist or is not compiled for the proper device architecture. > > > >> On May 16, 2021, at 9:09 PM, Mark Adams > wrote: >> >> I now get this error. A blas error from VecAXPBYPCZ ... >> Any ideas? >> >> >> terminate called after throwing an instance of 'std::runtime_error' >> what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /global/u2/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:654 >> Traceback functionality not available >> >> [cgpu16:55192] *** Process received signal *** >> [cgpu16:55192] Signal: Aborted (6) >> [cgpu16:55192] Signal code: (-6) >> [cgpu16:55192] [ 0] /lib64/libpthread.so.0(+0x12360)[0x2aab12445360] >> [cgpu16:55192] [ 1] /lib64/libc.so.6(gsignal+0x110)[0x2aab12687160] >> [cgpu16:55192] [ 2] /lib64/libc.so.6(abort+0x151)[0x2aab12688741] >> [cgpu16:55192] [ 3] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x93e83)[0x2aab10cb0e83] >> [cgpu16:55192] [ 4] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99de6)[0x2aab10cb6de6] >> [cgpu16:55192] [ 5] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x99e21)[0x2aab10cb6e21] >> [cgpu16:55192] [ 6] /usr/common/software/sles15_cgpu/gcc/8.3.0/lib64/libstdc++.so.6(+0x9a053)[0x2aab10cb7053] >> [cgpu16:55192] [ 7] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(+0x26a7f)[0x2aaabbcb3a7f] >> [cgpu16:55192] [ 8] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoscore.so.3.4(_ZN6Kokkos4Impl25cuda_internal_error_throwE9cudaErrorPKcS3_i+0x29d)[0x2aaabbcdab9d] >> [cgpu16:55192] [ 9] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl16V_Update_GenericIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEEiEEvRKNT_20non_const_value_typeERKSG_RKNT0_20non_const_value_typeERKSM_RKNT1_20non_const_value_typeERKSS_iii+0x3357)[0x2aaaae7108a7] >> [cgpu16:55192] [10] 
/global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libkokkoskernels.so(_ZN10KokkosBlas4Impl6UpdateIN6Kokkos4ViewIPKdJNS2_10LayoutLeftENS2_6DeviceINS2_4CudaENS2_9CudaSpaceEEENS2_12MemoryTraitsILj1EEEEEESD_NS3_IPdJS6_SA_SC_EEELi1ELb0ELb1EE6updateERS4_RKSD_SH_SJ_SH_RKSF_+0xc1)[0x2aaaae7171a1]
>> [cgpu16:55192] [11] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(_ZN10KokkosBlas6updateIN6Kokkos4ViewIPKdJNS1_9CudaSpaceEEEES6_NS2_IPdJS5_EEEEEvRKNT_20non_const_value_typeERKS9_RKNT0_20non_const_value_typeERKSF_RKNT1_20non_const_value_typeERKSL_+0x271)[0x2aaaab76d781]
>> [cgpu16:55192] [12] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0xa9333b)[0x2aaaab76633b]
>> [cgpu16:55192] [13] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(VecAXPBYPCZ+0x261)[0x2aaaab0b03c1]
>> [cgpu16:55192] [14] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155144e)[0x2aaaac22444e]
>> [cgpu16:55192] [15] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESTSFormFunction+0xa)[0x2aaaac1c9c1a]
>> [cgpu16:55192] [16] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESComputeFunction+0xf5)[0x2aaaac138675]
>> [cgpu16:55192] [17] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x14ac85e)[0x2aaaac17f85e]
>> [cgpu16:55192] [18] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(SNESSolve+0x821)[0x2aaaac146651]
>> [cgpu16:55192] [19] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(+0x155526c)[0x2aaaac22826c]
>> [cgpu16:55192] [20] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSStep+0x1f5)[0x2aaaac1d6a05]
>> [cgpu16:55192] [21] /global/homes/m/madams/petsc/arch-cori-gpu-opt-kokkos-gcc/lib/libpetsc.so.3.015(TSSolve+0x6a5)[0x2aaaac1dc455]
>> [cgpu16:55192] [22] ../ex2-kok[0x4033eb]
>> [cgpu16:55192] [23] /lib64/libc.so.6(__libc_start_main+0xea)[0x2aab12671f8a]
>> [cgpu16:55192] [24] ../ex2-kok[0x404aaa]
>> [cgpu16:55192] *** End of error message ***
>> /global/homes/m/madams/mps-wrapper.sh: line 30: 55192 Aborted "$@"
>> 0 stopping nvidia-cuda-mps-control on cgpu16
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sajidsyed2021 at u.northwestern.edu Tue May 18 11:59:09 2021
From: sajidsyed2021 at u.northwestern.edu (Sajid Ali)
Date: Tue, 18 May 2021 11:59:09 -0500
Subject: [petsc-users] petsc-users Digest, Vol 149, Issue 47
In-Reply-To: References: Message-ID:
You could use VecPermute and MatPermute from the PETSc API to permute the vectors/matrices. However, MatPermute creates a new matrix, and even though VecPermute permutes the vector (locally) in place, it allocates a temporary array and frees the original array. Since you are working on dense matrices and vectors and want to avoid temporary allocations, you could use the BLAS level 1 swap function (used by VecSwap to swap two vectors), which will probably be the most optimized version for the hardware you're using (since it's implemented by platform-specific intrinsics and assembly).
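A minimal C sketch of that swap-based fftshift, assuming an even Ntime (the shift is then a pure exchange of the two half-spaces; an odd length needs a cyclic rotation instead) and the column-major dense layout used in the code quoted below; FFTShiftColumns is an illustrative helper name, not an existing PETSc routine:

#include <petscmat.h>

/* In-place fftshift along the column (time) axis of a dense PETSc matrix.
   Assumes Ntime is even, so columns i and i + Ntime/2 simply trade places. */
static PetscErrorCode FFTShiftColumns(Mat A, PetscInt Ntime)
{
  PetscScalar   *a;
  PetscInt       nloc, lda, half = Ntime / 2, i, j;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetLocalSize(A, &nloc, NULL);CHKERRQ(ierr); /* locally owned rows */
  ierr = MatDenseGetLDA(A, &lda);CHKERRQ(ierr);         /* leading dimension  */
  ierr = MatDenseGetArray(A, &a);CHKERRQ(ierr);         /* column-major array */
  for (i = 0; i < half; ++i) {
    PetscScalar *c0 = a + i * lda;          /* column i           */
    PetscScalar *c1 = a + (i + half) * lda; /* column i + Ntime/2 */
    for (j = 0; j < nloc; ++j) {            /* entry-wise swap; a BLAS swap
                                               could replace this inner loop */
      const PetscScalar t = c0[j];
      c0[j] = c1[j];
      c1[j] = t;
    }
  }
  ierr = MatDenseRestoreArray(A, &a);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}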
----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 18 May 2021 11:44:36 -0500
> From: Barry Smith
> To: Roland Richter
> Cc: PETSc
> Subject: Re: [petsc-users] Efficient FFTShift-implementation for vectors/matrices
> Message-ID: <83012843-A364-4CE1-A92B-1768EB11591E at petsc.dev>
> Content-Type: text/plain; charset="us-ascii"
>
> I found a variety of things on the web, below. I don't understand this, but for the even case it seems one simply modifies the input matrix before the FFT: http://www.fftw.org/faq/section3.html#centerorigin
>
> https://stackoverflow.com/questions/5915125/fftshift-ifftshift-c-c-source-code
> https://www.dsprelated.com/showthread/comp.dsp/20790-1.php
>
> > On May 18, 2021, at 9:48 AM, Roland Richter wrote:
> >
> > Dear all,
> >
> > I tried to implement the function fftshift from numpy (i.e. swap the half-spaces of all axes) for row vectors in a matrix by using the following code:
> >
> > void fft_shift(Mat &fft_matrix) {
> >     PetscScalar *mat_ptr;
> >     MatDenseGetArray (fft_matrix, &mat_ptr);
> >     PetscInt r_0, r_1;
> >     MatGetOwnershipRange(fft_matrix, &r_0, &r_1);
> >     PetscInt local_row_num = r_1 - r_0;
> >     arma::cx_mat temp_mat(local_row_num, Ntime, arma::fill::zeros);
> >     for(int i = 0; i < Ntime; ++i) {
> >         const PetscInt row_shift = i * local_row_num;
> >         for(int j = 0; j < local_row_num; ++j) {
> >             const PetscInt cur_pos = j + row_shift;
> >             if(i < (int)(Ntime / 2))
> >                 temp_mat(j, i + int(Ntime / 2)) = *(mat_ptr + cur_pos);
> >             else
> >                 temp_mat(j, i - int(Ntime / 2)) = *(mat_ptr + cur_pos);
> >         }
> >     }
> >     for(int i = 0; i < Ntime; ++i) {
> >         const PetscInt row_shift = i * local_row_num;
> >         for(int j = 0; j < local_row_num; ++j) {
> >             const PetscInt cur_pos = j + row_shift;
> >             *(mat_ptr + cur_pos) = temp_mat(j, i);
> >         }
> >     }
> >     MatDenseRestoreArray (fft_matrix, &mat_ptr);
> > }
> >
> > but I do not like the approach of having a second matrix as temporary storage space. Are there more efficient approaches possible using PETSc functions?
> >
> > Thanks!
> >
> > Regards,
> >
> > Roland Richter
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210518/b8710455/attachment-0001.html>
>
--
Sajid Ali (he/him) | PhD Candidate
Applied Physics
Northwestern University
s-sajid-ali.github.io
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From knepley at gmail.com Tue May 18 13:30:58 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 18 May 2021 14:30:58 -0400
Subject: [petsc-users] DMPlex from periodic Gmsh and ghost cells
In-Reply-To: References: Message-ID:
On Tue, May 18, 2021 at 12:16 PM Thibault Bridel-Bertomeu <thibault.bridelbertomeu at gmail.com> wrote:
> Hi Matthew,
>
> Thank you very much for your quick answer, as always !
>
Cool.

> Le mar. 18 mai 2021 ?
17:46, Matthew Knepley a ?crit : > >> On Tue, May 18, 2021 at 5:19 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Dear all, >>> >>> I am playing around with creating DMPlex from a periodic Gmsh mesh >>> (still in the same finite volume code that solves the Euler equations) >>> because I want to run some classical periodic test cases from the >>> literature (like convecting a vortex and so on). >>> I got from the mailing list and the examples that I gotta use >>> -dm_plex_gmsh_periodic for the reader and the DM creator to do their job >>> properly and read the $Periodic section from the Gmsh mesh. That works >>> fine, I can write the DM in HDF5 format afterwards and the periodicity is >>> clearly there (i can see in Paraview that opposite sides are "attached" to >>> each other). >>> Now, in the context of the finite volume solver, I wrote my one >>> right-hand side term computing routine, and I rely heavily on routines such >>> as DMPlexGetFaceFields. Those routines, in turn rely on the ghost cells and >>> ... I am having an issue with DMPlexConstructGhostCells. When I use >>> -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine, >>> DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works >>> fine, but when I get to DMPlexConstructGhostCells, I get >>> >>> "DM has boundary face 2048 with 2 support cells" >>> >>> I understand the DM begin periodic, even the "boundary" faces have two >>> neighbors because of the topological folding, so ... yeah, the ghost cells >>> cannot really exist. But then how can I proceed ? Because I need that >>> "ghost" label for all the routines I call from the RHS computing routine ... >>> >> >> Thanks for the example. I have run this myself. It looks to me like the >> mesh you are providing. Has no boundary. It is doubly periodic (a torus). >> When I tell ConstructGhostCells() to >> create a label for the boundary, rather than using "Face Sets", it >> creates an empty label because no faces have only 1 neighbor. >> > > Yes indeed, up is down and right is left for this mesh, the idea being of > emulating an infinite medium. > So you manage to get through ConstructGhostCells without a segfault ? What > did you specify instead of PETSC_NULL_CHARACTER ? > The NULL tells it to use the label "Face Sets", but that label coming from Gmsh marks edges that are not actually boundaries. I asked PETSc to create a boundary label named "marker" using https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMarkBoundaryFaces.html It was empty, but everything goes through fine. > Also, does it still create a ? ghost ? label even if it?s empty ? > Yes. > Do you think that will hold for later uses in the GetFaceFields and > similar routines ? > Yes. > I read through the source and it seems it won?t like it much if there are > no ghost at all, but since I did not pass the ConstructGhostCell step I?m > not sure. > It should work fine, you just will not need boundary conditions. Thanks, Matt > Thanks ! > > Thibault > > >> Thanks, >> >> Matt >> >> >>> I attach to this email the periodic Gmsh mesh I am playing with, and the >>> routine from my code where I create the DM and so on: >>> >>> subroutine initmesh >>> >>> PetscErrorCode :: ierr >>> DM :: dmDist, dmGhost >>> integer :: overlap >>> PetscViewer :: vtkViewer >>> >>> call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr) >>> >>> ! 
Number of neighbours taken into account in MP communications(1 - Order >>> 1; 2 - Order 2) >>> overlap = 1 >>> >>> call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ; >>> CHKERRA(ierr) >>> >>> ! Force DMPlex to use gmsh marker >>> ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS, >>> "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr) >>> >>> ! Read mesh from file name 'meshname' >>> call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm, >>> ierr); CHKERRA(ierr) >>> >>> ! Distribute on processors >>> ! Start with connectivity >>> call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ; >>> CHKERRA(ierr) >>> >>> ! Distribute on processors >>> call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ; >>> CHKERRA(ierr) >>> >>> ! Security check >>> if (dmDist /= PETSC_NULL_DM) then >>> ! Destroy previous dm >>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>> ! Replace with dmDist >>> dm = dmDist >>> end if >>> >>> ! Finalize setup of the object >>> call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr) >>> >>> ! Boundary condition with ghost cells >>> call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER, >>> PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr) >>> >>> ! Security check >>> if (dmGhost /= PETSC_NULL_DM) then >>> ! Destroy previous dm >>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>> ! Replace with dmGhost >>> dm = dmGhost >>> end if >>> >>> if (debug) then >>> ! Show in terminal >>> call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in >>> console ::\n", ierr); CHKERRA(ierr) >>> call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr) >>> ! VTK viewer >>> call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; >>> CHKERRA(ierr) >>> call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; >>> CHKERRA(ierr) >>> call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ; >>> CHKERRA(ierr) >>> call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ; >>> CHKERRA(ierr) >>> call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr) >>> call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr) >>> end if >>> >>> call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr) >>> >>> call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr) >>> >>> end subroutine initmesh >>> >>> Thank you very much for your help ! >>> >>> Best regards, >>> >>> Thibault Bridel-Bertomeu >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Wed May 19 03:36:26 2021 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Wed, 19 May 2021 10:36:26 +0200 Subject: [petsc-users] DMPlex from periodic Gmsh and ghost cells In-Reply-To: References: Message-ID: Hello Matt, Le mar. 18 mai 2021 ? 
20:31, Matthew Knepley a ?crit : > On Tue, May 18, 2021 at 12:16 PM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Hi Matthew, >> >> Thank you very much for your quick answer, as always ! >> > > Cool. > > >> Le mar. 18 mai 2021 ? 17:46, Matthew Knepley a >> ?crit : >> >>> On Tue, May 18, 2021 at 5:19 AM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> Dear all, >>>> >>>> I am playing around with creating DMPlex from a periodic Gmsh mesh >>>> (still in the same finite volume code that solves the Euler equations) >>>> because I want to run some classical periodic test cases from the >>>> literature (like convecting a vortex and so on). >>>> I got from the mailing list and the examples that I gotta use >>>> -dm_plex_gmsh_periodic for the reader and the DM creator to do their job >>>> properly and read the $Periodic section from the Gmsh mesh. That works >>>> fine, I can write the DM in HDF5 format afterwards and the periodicity is >>>> clearly there (i can see in Paraview that opposite sides are "attached" to >>>> each other). >>>> Now, in the context of the finite volume solver, I wrote my one >>>> right-hand side term computing routine, and I rely heavily on routines such >>>> as DMPlexGetFaceFields. Those routines, in turn rely on the ghost cells and >>>> ... I am having an issue with DMPlexConstructGhostCells. When I use >>>> -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine, >>>> DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works >>>> fine, but when I get to DMPlexConstructGhostCells, I get >>>> >>>> "DM has boundary face 2048 with 2 support cells" >>>> >>>> I understand the DM begin periodic, even the "boundary" faces have two >>>> neighbors because of the topological folding, so ... yeah, the ghost cells >>>> cannot really exist. But then how can I proceed ? Because I need that >>>> "ghost" label for all the routines I call from the RHS computing routine ... >>>> >>> >>> Thanks for the example. I have run this myself. It looks to me like the >>> mesh you are providing. Has no boundary. It is doubly periodic (a torus). >>> When I tell ConstructGhostCells() to >>> create a label for the boundary, rather than using "Face Sets", it >>> creates an empty label because no faces have only 1 neighbor. >>> >> >> Yes indeed, up is down and right is left for this mesh, the idea being of >> emulating an infinite medium. >> So you manage to get through ConstructGhostCells without a segfault ? >> What did you specify instead of PETSC_NULL_CHARACTER ? >> > > The NULL tells it to use the label "Face Sets", but that label coming from > Gmsh marks edges that are not actually boundaries. I asked PETSc > to create a boundary label named "marker" using > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMarkBoundaryFaces.html > It was empty, but everything goes through fine. > OK thanks ! I had to write the F90 wrapper for DMLabelCreate but aside from that, everything indeed goes through fine with the ConstructGhostCells. > > >> Also, does it still create a ? ghost ? label even if it?s empty ? >> > > Yes. > > >> Do you think that will hold for later uses in the GetFaceFields and >> similar routines ? >> > > Yes. > > >> I read through the source and it seems it won?t like it much if there are >> no ghost at all, but since I did not pass the ConstructGhostCell step I?m >> not sure. >> > > It should work fine, you just will not need boundary conditions. 
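A short C sketch of the sequence described above: create the label, mark the boundary faces (DMPlexMarkBoundaryFaces marks faces with a single support cell, so the label stays empty on a fully periodic mesh, and that is fine), then hand the label name to DMPlexConstructGhostCells. This is a fragment, assuming dm is the distributed DM and PetscErrorCode ierr is in scope, as in the thread's other snippets:

  DMLabel  label;
  PetscInt nghost;
  DM       gdm;

  ierr = DMCreateLabel(dm, "marker");CHKERRQ(ierr);
  ierr = DMGetLabel(dm, "marker", &label);CHKERRQ(ierr);
  ierr = DMPlexMarkBoundaryFaces(dm, 1, label);CHKERRQ(ierr); /* empty on a torus */
  ierr = DMPlexConstructGhostCells(dm, "marker", &nghost, &gdm);CHKERRQ(ierr);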
> Actually, there is an issue with the corner cells. My mesh is periodic in both directions and it turns out the corner cells accumulate way too much numerical flux and the computation crashes after 4 iterations or so - it happens both with quads and tris. Whether I use my version of the RHS computing routine or the DMPlexTSComputeRHSFunctionFVM leads to the same crash. I'll keep investigating, but if you have any idea what could go wrong ... Thanks !! Thibault > > Thanks, > > Matt > > >> Thanks ! >> >> Thibault >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> I attach to this email the periodic Gmsh mesh I am playing with, and >>>> the routine from my code where I create the DM and so on: >>>> >>>> subroutine initmesh >>>> >>>> PetscErrorCode :: ierr >>>> DM :: dmDist, dmGhost >>>> integer :: overlap >>>> PetscViewer :: vtkViewer >>>> >>>> call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr) >>>> >>>> ! Number of neighbours taken into account in MP communications(1 - >>>> Order 1; 2 - Order 2) >>>> overlap = 1 >>>> >>>> call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ; >>>> CHKERRA(ierr) >>>> >>>> ! Force DMPlex to use gmsh marker >>>> ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS, >>>> "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr) >>>> >>>> ! Read mesh from file name 'meshname' >>>> call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm, >>>> ierr); CHKERRA(ierr) >>>> >>>> ! Distribute on processors >>>> ! Start with connectivity >>>> call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ; >>>> CHKERRA(ierr) >>>> >>>> ! Distribute on processors >>>> call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ; >>>> CHKERRA(ierr) >>>> >>>> ! Security check >>>> if (dmDist /= PETSC_NULL_DM) then >>>> ! Destroy previous dm >>>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>>> ! Replace with dmDist >>>> dm = dmDist >>>> end if >>>> >>>> ! Finalize setup of the object >>>> call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr) >>>> >>>> ! Boundary condition with ghost cells >>>> call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER, >>>> PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr) >>>> >>>> ! Security check >>>> if (dmGhost /= PETSC_NULL_DM) then >>>> ! Destroy previous dm >>>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>>> ! Replace with dmGhost >>>> dm = dmGhost >>>> end if >>>> >>>> if (debug) then >>>> ! Show in terminal >>>> call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in >>>> console ::\n", ierr); CHKERRA(ierr) >>>> call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr) >>>> ! VTK viewer >>>> call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; >>>> CHKERRA(ierr) >>>> call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; >>>> CHKERRA(ierr) >>>> call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ; >>>> CHKERRA(ierr) >>>> call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ; >>>> CHKERRA(ierr) >>>> call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr) >>>> call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr) >>>> end if >>>> >>>> call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr) >>>> >>>> call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr) >>>> >>>> end subroutine initmesh >>>> >>>> Thank you very much for your help ! 
>>>> >>>> Best regards, >>>> >>>> Thibault Bridel-Bertomeu >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 19 06:32:50 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 May 2021 07:32:50 -0400 Subject: [petsc-users] DMPlex from periodic Gmsh and ghost cells In-Reply-To: References: Message-ID: On Wed, May 19, 2021 at 4:36 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hello Matt, > > Le mar. 18 mai 2021 ? 20:31, Matthew Knepley a ?crit : > >> On Tue, May 18, 2021 at 12:16 PM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Hi Matthew, >>> >>> Thank you very much for your quick answer, as always ! >>> >> >> Cool. >> >> >>> Le mar. 18 mai 2021 ? 17:46, Matthew Knepley a >>> ?crit : >>> >>>> On Tue, May 18, 2021 at 5:19 AM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> Dear all, >>>>> >>>>> I am playing around with creating DMPlex from a periodic Gmsh mesh >>>>> (still in the same finite volume code that solves the Euler equations) >>>>> because I want to run some classical periodic test cases from the >>>>> literature (like convecting a vortex and so on). >>>>> I got from the mailing list and the examples that I gotta use >>>>> -dm_plex_gmsh_periodic for the reader and the DM creator to do their job >>>>> properly and read the $Periodic section from the Gmsh mesh. That works >>>>> fine, I can write the DM in HDF5 format afterwards and the periodicity is >>>>> clearly there (i can see in Paraview that opposite sides are "attached" to >>>>> each other). >>>>> Now, in the context of the finite volume solver, I wrote my one >>>>> right-hand side term computing routine, and I rely heavily on routines such >>>>> as DMPlexGetFaceFields. Those routines, in turn rely on the ghost cells and >>>>> ... I am having an issue with DMPlexConstructGhostCells. When I use >>>>> -dm_plex_gmsh_periodic, DMPlexCreateFromFile works fine, >>>>> DMSetBasicAdjacency(dm, true, false) works fine, DMPlexDistribute works >>>>> fine, but when I get to DMPlexConstructGhostCells, I get >>>>> >>>>> "DM has boundary face 2048 with 2 support cells" >>>>> >>>>> I understand the DM begin periodic, even the "boundary" faces have two >>>>> neighbors because of the topological folding, so ... yeah, the ghost cells >>>>> cannot really exist. But then how can I proceed ? Because I need that >>>>> "ghost" label for all the routines I call from the RHS computing routine ... >>>>> >>>> >>>> Thanks for the example. I have run this myself. It looks to me like >>>> the mesh you are providing. Has no boundary. It is doubly periodic (a >>>> torus). 
When I tell ConstructGhostCells() to >>>> create a label for the boundary, rather than using "Face Sets", it >>>> creates an empty label because no faces have only 1 neighbor. >>>> >>> >>> Yes indeed, up is down and right is left for this mesh, the idea being >>> of emulating an infinite medium. >>> So you manage to get through ConstructGhostCells without a segfault ? >>> What did you specify instead of PETSC_NULL_CHARACTER ? >>> >> >> The NULL tells it to use the label "Face Sets", but that label coming >> from Gmsh marks edges that are not actually boundaries. I asked PETSc >> to create a boundary label named "marker" using >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMarkBoundaryFaces.html >> It was empty, but everything goes through fine. >> > > OK thanks ! I had to write the F90 wrapper for DMLabelCreate but aside > from that, everything indeed goes through fine with the ConstructGhostCells. > > >> >> >>> Also, does it still create a ? ghost ? label even if it?s empty ? >>> >> >> Yes. >> >> >>> Do you think that will hold for later uses in the GetFaceFields and >>> similar routines ? >>> >> >> Yes. >> >> >>> I read through the source and it seems it won?t like it much if there >>> are no ghost at all, but since I did not pass the ConstructGhostCell step >>> I?m not sure. >>> >> >> It should work fine, you just will not need boundary conditions. >> > > Actually, there is an issue with the corner cells. My mesh is periodic in > both directions and it turns out the corner cells accumulate way too much > numerical flux and the computation crashes after 4 iterations or so - it > happens both with quads and tris. Whether I use my version of the RHS > computing routine or the DMPlexTSComputeRHSFunctionFVM leads to the same > crash. > I'll keep investigating, but if you have any idea what could go wrong ... > Hmm, I have not run a fully periodic case. I guess I could set one up. However, it must be the case that we are conservative, since we first find a flux on every edge, and then update cells. So the corner cells should not be able to accumulate more than the total mass/4. I would first check at each step that total mass is conserved. If not, we can track that down. Thanks, Matt > Thanks !! > > Thibault > > >> >> Thanks, >> >> Matt >> >> >>> Thanks ! >>> >>> Thibault >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> I attach to this email the periodic Gmsh mesh I am playing with, and >>>>> the routine from my code where I create the DM and so on: >>>>> >>>>> subroutine initmesh >>>>> >>>>> PetscErrorCode :: ierr >>>>> DM :: dmDist, dmGhost >>>>> integer :: overlap >>>>> PetscViewer :: vtkViewer >>>>> >>>>> call PetscLogEventBegin(timer_initmesh, ierr); CHKERRA(ierr) >>>>> >>>>> ! Number of neighbours taken into account in MP communications(1 - >>>>> Order 1; 2 - Order 2) >>>>> overlap = 1 >>>>> >>>>> call PetscPrintf(PETSC_COMM_WORLD, "Initializing mesh...\n", ierr) ; >>>>> CHKERRA(ierr) >>>>> >>>>> ! Force DMPlex to use gmsh marker >>>>> ! call PetscOptionsSetValue(PETSC_NULL_OPTIONS, >>>>> "-dm_plex_gmsh_use_marker", "true", ierr); CHKERRA(ierr) >>>>> >>>>> ! Read mesh from file name 'meshname' >>>>> call DMPlexCreateFromFile(PETSC_COMM_WORLD, meshname, PETSC_TRUE, dm, >>>>> ierr); CHKERRA(ierr) >>>>> >>>>> ! Distribute on processors >>>>> ! Start with connectivity >>>>> call DMSetBasicAdjacency(dm, PETSC_TRUE, PETSC_FALSE, ierr) ; >>>>> CHKERRA(ierr) >>>>> >>>>> ! 
Distribute on processors >>>>> call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr) ; >>>>> CHKERRA(ierr) >>>>> >>>>> ! Security check >>>>> if (dmDist /= PETSC_NULL_DM) then >>>>> ! Destroy previous dm >>>>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>>>> ! Replace with dmDist >>>>> dm = dmDist >>>>> end if >>>>> >>>>> ! Finalize setup of the object >>>>> call DMSetFromOptions(dm, ierr) ; CHKERRA(ierr) >>>>> >>>>> ! Boundary condition with ghost cells >>>>> call DMPlexConstructGhostCells(dm, PETSC_NULL_CHARACTER, >>>>> PETSC_NULL_INTEGER, dmGhost, ierr); CHKERRA(ierr) >>>>> >>>>> ! Security check >>>>> if (dmGhost /= PETSC_NULL_DM) then >>>>> ! Destroy previous dm >>>>> call DMDestroy(dm, ierr) ; CHKERRA(ierr) >>>>> ! Replace with dmGhost >>>>> dm = dmGhost >>>>> end if >>>>> >>>>> if (debug) then >>>>> ! Show in terminal >>>>> call PetscPrintf(PETSC_COMM_WORLD, ":: [DEBUG] Visualizing DM in >>>>> console ::\n", ierr); CHKERRA(ierr) >>>>> call DMView(dm, PETSC_VIEWER_STDOUT_WORLD, ierr) ; CHKERRA(ierr) >>>>> ! VTK viewer >>>>> call PetscViewerCreate(PETSC_COMM_WORLD, vtkViewer, ierr) ; >>>>> CHKERRA(ierr) >>>>> call PetscViewerSetType(vtkViewer, PETSCVIEWERHDF5, ierr) ; >>>>> CHKERRA(ierr) >>>>> call PetscViewerFileSetMode(vtkViewer, FILE_MODE_WRITE, ierr) ; >>>>> CHKERRA(ierr) >>>>> call PetscViewerFileSetName(vtkViewer, "debug_initmesh.h5", ierr) ; >>>>> CHKERRA(ierr) >>>>> call DMView(dm, vtkViewer, ierr) ; CHKERRA(ierr) >>>>> call PetscViewerDestroy(vtkViewer, ierr) ; CHKERRA(ierr) >>>>> end if >>>>> >>>>> call PetscPrintf(PETSC_COMM_WORLD, "Done !\n", ierr) ; CHKERRA(ierr) >>>>> >>>>> call PetscLogEventEnd(timer_initmesh, ierr); CHKERRA(ierr) >>>>> >>>>> end subroutine initmesh >>>>> >>>>> Thank you very much for your help ! >>>>> >>>>> Best regards, >>>>> >>>>> Thibault Bridel-Bertomeu >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangnian at hnu.edu.cn Wed May 19 04:14:14 2021 From: yangnian at hnu.edu.cn (=?utf-8?B?5p2o5b+1?=) Date: Wed, 19 May 2021 17:14:14 +0800 Subject: [petsc-users] The method of print the preconditioned matrix Message-ID: Dear Sir or Madam:   Hello, I am a new PETSc learner. I'm having some problem with it and I'm using version 3.6.3 of PETSc. I can only print out the Jacobi matrix (J) with -mat_view from the command line, but I also need to print out the preconditioning operator (M^(-1)) or the preconditioned matrix (JM^(-1)). Is there a function like that in PETSC? Could you please give me some suggestions? Thank you so much.   
Best regards,   Nian Yang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 19 08:12:20 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 May 2021 09:12:20 -0400 Subject: [petsc-users] DMPlex and Boundary facets In-Reply-To: References: Message-ID: Okay, what Plex is saying is that Gmsh gave it a facet (a set of vertices), but those are not the vertices of any tetrahedral face in the mesh. It is hard to verify such a thing with 12K tets. I have definitely read in GMsh files before with facets, so I don't think the code is completely wrong. Would it be possible to make a smaller mesh (10-20 cells) that shows the same bug? Thanks, Matt On Tue, May 18, 2021 at 12:25 PM Karin&NiKo wrote: > Sure! I send you both, one with the facets and one without. > > Thanks, > Nicolas > > Le mar. 18 mai 2021 ? 17:49, Matthew Knepley a ?crit : > >> On Tue, May 18, 2021 at 8:18 AM Karin&NiKo wrote: >> >>> Dear PETSc team, >>> >>> I have tried to load a test mesh available in Gmsh' s demos directory >>> (share/doc/gmsh/demos/simple_geo/filter.geo, attached to this email) as a >>> DMPlex. >>> So I produced a msh4 file by doing : >>> gmsh -3 filter.geo -o /tmp/test.msh4 >>> Then I used src/dm/impls/plex/tutorials/ex2.c to load the mesh by doing : >>> ./ex2 -filename /tmp/test.msh4 >>> >>> Unfortunately I get the error : >>> >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: No support for this operation for this object type >>> [0]PETSC ERROR: Could not determine Plex facet for Gmsh element 1268 >>> (Plex cell 12681) >>> >>> The error seems to come from the fact that the msh file contains >>> tets *and* facets *only on the Physical entities* (aka parts of the mesh >>> boundary where >>> the user will assign Dirichlet or Neuman conditions). >>> If I suppress these facets by commenting the "Physical Surface" lines >>> in the geo file and regenerating the mesh, everything is fine. >>> >>> But the use of these "Physical" stuff is very common in lots of finite >>> element codes in order to assign boundary conditions. >>> How should I do to keep these boundary groups of 2D elements (with >>> corresponding names) ? >>> >> >> Can you also send the *.msh file? I do not have Gmsh on this machine. >> >> Thanks, >> >> Matt >> >> >>> Thanks for your help, >>> Nicolas >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 19 08:15:50 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 May 2021 09:15:50 -0400 Subject: [petsc-users] The method of print the preconditioned matrix In-Reply-To: References: Message-ID: On Wed, May 19, 2021 at 8:36 AM ?? wrote: > Dear Sir or Madam: > > > > Hello, I am a new PETSc learner. I'm having some problem with it and I'm > using version 3.6.3 of PETSc. 
I can only print out the Jacobian matrix (J) > with -mat_view from the command line, but I also need to print out the > preconditioning operator (M^(-1)) or the preconditioned matrix (JM^(-1)). > Is there a function like that in PETSc? Could you please give me some > suggestions? Thank you so much. > I think you want https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCComputeOperator.html Thanks, Matt > > Best regards, > > Nian Yang > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Wed May 19 08:46:01 2021 From: niko.karin at gmail.com (Karin&NiKo) Date: Wed, 19 May 2021 15:46:01 +0200 Subject: Re: [petsc-users] DMPlex and Boundary facets In-Reply-To: References: Message-ID: Dear Matthew, You are (again) right. This Gmsh test file is "dirty" : some triangles do not belong to tets. Sorry for that. I have tried with another geo file (which is clean in that sense) and PETSc reads it with no error. I take this opportunity to ask my initial question : are the labels of the Physical Entities saved somewhere ? A gmsh test file with several Physical entities is attached to this email. Thank you again, Nicolas Le mer. 19 mai 2021 à 15:12, Matthew Knepley a écrit : > Okay, what Plex is saying is that Gmsh gave it a facet (a set of > vertices), but those are not the vertices > of any tetrahedral face in the mesh. It is hard to verify such a thing > with 12K tets. I have definitely read in > GMsh files before with facets, so I don't think the code is completely > wrong. > > Would it be possible to make a smaller mesh (10-20 cells) that shows the > same bug? > > Thanks, > > Matt > > On Tue, May 18, 2021 at 12:25 PM Karin&NiKo wrote: > >> Sure! I send you both, one with the facets and one without. >> >> Thanks, >> Nicolas >> >> Le mar. 18 mai 2021 à 17:49, Matthew Knepley a >> écrit : >> >>> On Tue, May 18, 2021 at 8:18 AM Karin&NiKo wrote: >>> >>>> Dear PETSc team, >>>> >>>> I have tried to load a test mesh available in Gmsh's demos directory >>>> (share/doc/gmsh/demos/simple_geo/filter.geo, attached to this email) as a >>>> DMPlex. >>>> So I produced a msh4 file by doing : >>>> gmsh -3 filter.geo -o /tmp/test.msh4 >>>> Then I used src/dm/impls/plex/tutorials/ex2.c to load the mesh by doing : >>>> ./ex2 -filename /tmp/test.msh4 >>>> >>>> Unfortunately I get the error : >>>> >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: No support for this operation for this object type >>>> [0]PETSC ERROR: Could not determine Plex facet for Gmsh element 1268 >>>> (Plex cell 12681) >>>> >>>> The error seems to come from the fact that the msh file contains >>>> tets *and* facets *only on the Physical entities* (aka parts of the mesh >>>> boundary where >>>> the user will assign Dirichlet or Neumann conditions). >>>> If I suppress these facets by commenting the "Physical Surface" lines >>>> in the geo file and regenerating the mesh, everything is fine. >>>> >>>> But the use of this "Physical" stuff is very common in lots of finite >>>> element codes in order to assign boundary conditions. >>>> What should I do to keep these boundary groups of 2D elements (with >>>> corresponding names) ? >>>> >>> >>> Can you also send the *.msh file?
I do not have Gmsh on this machine. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks for your help, >>>> Nicolas >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Cube_with_facets.msh4 Type: application/octet-stream Size: 7692 bytes Desc: not available URL: From wence at gmx.li Wed May 19 08:49:20 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Wed, 19 May 2021 14:49:20 +0100 Subject: [petsc-users] DMPlex and Boundary facets In-Reply-To: References: Message-ID: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> > On 19 May 2021, at 14:46, Karin&NiKo wrote: > > Dear Matthew, > > You are (again) right. This Gmsh test file is "dirty" : some triangles do not belong to tests. Sorry for that. > I have tried with another geo file (which is clean in that sense) and PETSc reads with no error. > > I take this opportunity to ask my initial question : are the labels of the Physical Entities saved somewhere ? > A gmsh test file with several Physical entities is attached to this email. Yes, plex represents Physical entities as "labels". In particular, for facets, they are loaded as the "Face Sets" label, plex also loads markers on cells in the "Cell Sets" label. e.g. here's the DMView output of your mesh. DM Object: DM_0x84000000_0 1 MPI processes type: plex DM_0x84000000_0 in 3 dimensions: 0-cells: 64 1-cells: 279 2-cells: 378 3-cells: 162 Labels: celltype: 4 strata with value/size (0 (64), 6 (162), 3 (378), 1 (279)) depth: 4 strata with value/size (0 (64), 1 (279), 2 (378), 3 (162)) Cell Sets: 1 strata with value/size (7 (162)) Face Sets: 6 strata with value/size (3 (18), 6 (18), 2 (18), 5 (18), 1 (18), 4 (18)) Lawrence From niko.karin at gmail.com Wed May 19 08:57:03 2021 From: niko.karin at gmail.com (Karin&NiKo) Date: Wed, 19 May 2021 15:57:03 +0200 Subject: [petsc-users] DMPlex and Boundary facets In-Reply-To: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID: Thank you very much Lawrence. Are the names lost or are they saved somewhere ? If I do : "./ex2 -filename /tmp/test/Cube_with_facets.msh4 -dm_view vtk:/tmp/foo.vtk" , I only get the tets of the initial mesh. Thanks, Nicolas Le mer. 19 mai 2021 ? 15:49, Lawrence Mitchell a ?crit : > > > > On 19 May 2021, at 14:46, Karin&NiKo wrote: > > > > Dear Matthew, > > > > You are (again) right. This Gmsh test file is "dirty" : some triangles > do not belong to tests. Sorry for that. > > I have tried with another geo file (which is clean in that sense) and > PETSc reads with no error. > > > > I take this opportunity to ask my initial question : are the labels of > the Physical Entities saved somewhere ? > > A gmsh test file with several Physical entities is attached to this > email. > > Yes, plex represents Physical entities as "labels". In particular, for > facets, they are loaded as the "Face Sets" label, plex also loads markers > on cells in the "Cell Sets" label. > > e.g. 
here's the DMView output of your mesh. > > DM Object: DM_0x84000000_0 1 MPI processes > type: plex > DM_0x84000000_0 in 3 dimensions: > 0-cells: 64 > 1-cells: 279 > 2-cells: 378 > 3-cells: 162 > Labels: > celltype: 4 strata with value/size (0 (64), 6 (162), 3 (378), 1 (279)) > depth: 4 strata with value/size (0 (64), 1 (279), 2 (378), 3 (162)) > Cell Sets: 1 strata with value/size (7 (162)) > Face Sets: 6 strata with value/size (3 (18), 6 (18), 2 (18), 5 (18), 1 > (18), 4 (18)) > > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Wed May 19 09:00:22 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Wed, 19 May 2021 15:00:22 +0100 Subject: [petsc-users] DMPlex and Boundary facets In-Reply-To: References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID: <962F6715-AD86-4419-ACF5-514EC5FA00CD@gmx.li> > On 19 May 2021, at 14:57, Karin&NiKo wrote: > > Thank you very much Lawrence. > Are the names lost or are they saved somewhere ? > If I do : "./ex2 -filename /tmp/test/Cube_with_facets.msh4 -dm_view vtk:/tmp/foo.vtk" , I only get the tets of the initial mesh. I believe that the mapping of names to integer labels is not maintained in the loading process (you mean this bit, right?): $PhysicalNames 7 2 1 "Surf1" 2 2 "Surf2" 2 3 "Encast" 2 4 "Press" 2 5 "Surf5" 2 6 "Surf6" 3 7 "Vol" $EndPhysicalNames I don't know enough (anything) about the VTK output to know what is output for viz. Lawrence From knepley at gmail.com Wed May 19 09:02:04 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 May 2021 10:02:04 -0400 Subject: [petsc-users] DMPlex and Boundary facets In-Reply-To: References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID: On Wed, May 19, 2021 at 9:57 AM Karin&NiKo wrote: > Thank you very much Lawrence. > Are the names lost or are they saved somewhere ? > If I do : "./ex2 -filename /tmp/test/Cube_with_facets.msh4 -dm_view > vtk:/tmp/foo.vtk" , I only get the tets of the initial mesh. > Yes, the names are lost. Jed, Lisandro and Stefano have also asked me about this. I have not done since everything is currently done by number in Plex. It would involve either a) making a separate table in the DM which translates names to numbers or b) putting such a translation table in the DMLabel itself I am putting it off because doing some short is not hard, but getting it to work with everything else would be somewhat of a pain. It would need to distribute properly, propagate to refined/coarsened/subset meshes, etc. which labels currently do. It would not be properly passed on when you just set an int on a point. I think it will happen eventually, but I don't think I have time in the next few weeks. Thanks, Matt > Thanks, > Nicolas > > Le mer. 19 mai 2021 ? 15:49, Lawrence Mitchell a ?crit : > >> >> >> > On 19 May 2021, at 14:46, Karin&NiKo wrote: >> > >> > Dear Matthew, >> > >> > You are (again) right. This Gmsh test file is "dirty" : some triangles >> do not belong to tests. Sorry for that. >> > I have tried with another geo file (which is clean in that sense) and >> PETSc reads with no error. >> > >> > I take this opportunity to ask my initial question : are the labels of >> the Physical Entities saved somewhere ? >> > A gmsh test file with several Physical entities is attached to this >> email. >> >> Yes, plex represents Physical entities as "labels". In particular, for >> facets, they are loaded as the "Face Sets" label, plex also loads markers >> on cells in the "Cell Sets" label. >> >> e.g. 
here's the DMView output of your mesh. >> >> DM Object: DM_0x84000000_0 1 MPI processes >> type: plex >> DM_0x84000000_0 in 3 dimensions: >> 0-cells: 64 >> 1-cells: 279 >> 2-cells: 378 >> 3-cells: 162 >> Labels: >> celltype: 4 strata with value/size (0 (64), 6 (162), 3 (378), 1 (279)) >> depth: 4 strata with value/size (0 (64), 1 (279), 2 (378), 3 (162)) >> Cell Sets: 1 strata with value/size (7 (162)) >> Face Sets: 6 strata with value/size (3 (18), 6 (18), 2 (18), 5 (18), 1 >> (18), 4 (18)) >> >> >> Lawrence > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikv318 at gmail.com Wed May 19 10:00:33 2021 From: kaushikv318 at gmail.com (Kaushik Vijaykumar) Date: Wed, 19 May 2021 11:00:33 -0400 Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function Message-ID: Hi everyone, We are in the process of integrating nonlinear solution to our FE code through SNES. The first steps that I understood that need to be done is to be able to pass the assembled stiffness matrix K and force vector F to the " *formfunction*" to calculate the residual and "*formjacobian*" to calculate the tangent stiffness matrix. To do so, I have defined a struct: *extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*);extern PetscErrorCode FormFunction(SNES,Vec,Vec,void*);* // define a struct type to pass K and f as "user context" *typedef struct { Mat K; Vec f; } K_and_f;* In the main program, the struct is declared *int main()* *{* *K_and_f main_K_and_f;* // declare the struct // SNES - Populate K and f into the struct *main_K_and_f.K = K;* // K matrix *main_K_and_f.f = f; *// f vector .... .... *}* In form function *PetscErrorCode FormFunction(SNES snes, Vec x, Vec F, void* ctx) { PetscErrorCode ierr; PetscReal *ax,*c; PetscReal *aF; PetscScalar *Kc,v; PetscInt nlocal,m,n,i,j,index[100000]; * // Create local Vec and Mat * Mat Kloc;* * Vec Floc, Uloc, KUloc, resloc;* * K_and_f* ptr = (K_and_f*) ctx; *// cast the pointer to void into pointer to struct // Get local F array, FLOC * ierr = VecGhostGetLocalForm(ptr->f,&Floc);CHKERRQ(ierr);* I am able to get the f array from the main program using " VecGhostGetLocalForm" and the vec is correct. However, I am having trouble with reading the K matrix in formfunction. The following trial gives me incorrect K values in Kloc: *ierr = MatSeqAIJGetArray(ptr->K,&Kc);CHKERRQ(ierr); * *ierr = MatGetLocalSize(ptr->K,&m,&n); CHKERRQ(ierr);* *for(i=0;i From bsmith at petsc.dev Wed May 19 10:25:01 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 19 May 2021 10:25:01 -0500 Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function In-Reply-To: References: Message-ID: > ierr = MatSeqAIJGetArray(ptr->K,&Kc);CHKERRQ(ierr); > ierr = MatGetLocalSize(ptr->K,&m,&n); CHKERRQ(ierr); > for(i=0;i { > for(j=0;j { > v = *(Kc + (i*n+j)); > ierr = MatSetValues(Kloc,1,&i,1,&j,&v,INSERT_VALUES);CHKERRQ(ierr); > } > } > I don't understand the purpose of this code fragment. Are you wanting to copy your K into Kloc? What is the purpose of Kloc? With standard usage of SNES one only needs to provide PETSc a global matrix, one does not need to work with "ghosted" matrices at all. 
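A hedged sketch of two simpler alternatives in C, assuming the copy loop above is meant to fill Kloc with the local m x n block of K: MatDuplicate performs the deep copy directly, and for a residual of the form K x - f no copy is needed at all. The fragment drops into the FormFunction above, where Kloc, x, F, and ptr are already declared:

  /* deep copy, instead of the element-by-element loop */
  ierr = MatDuplicate(ptr->K, MAT_COPY_VALUES, &Kloc);CHKERRQ(ierr);

  /* or skip the copy entirely: one MatMult and one VecAXPY */
  ierr = MatMult(ptr->K, x, F);CHKERRQ(ierr);    /* F = K x     */
  ierr = VecAXPY(F, -1.0, ptr->f);CHKERRQ(ierr); /* F = K x - f */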
Also normally one fills the f vector in FormFunction and the Jacobian in FormJacobian and there is no need to access the Jacobians at all in SNES https://www.mcs.anl.gov/petsc/documentation/faq.html#functionjacobian nor do you need to pass into FormFunction in the context the F or Jacobian matrix. Maybe if you explained your use case a bit more we could make suggestions on how to accomplish it. Barry > On May 19, 2021, at 10:00 AM, Kaushik Vijaykumar wrote: > > Hi everyone, > > We are in the process of integrating nonlinear solution to our FE code through SNES. The first steps that I understood that need to be done is to be able to pass the assembled stiffness matrix K and force vector F to the "formfunction" to calculate the residual and "formjacobian" to calculate the tangent stiffness matrix. To do so, I have defined a struct: > > extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*); > extern PetscErrorCode FormFunction(SNES,Vec,Vec,void*); > > // define a struct type to pass K and f as "user context" > typedef struct > { > Mat K; > Vec f; > } K_and_f; > > In the main program, the struct is declared > int main() > { > K_and_f main_K_and_f; // declare the struct > > // SNES - Populate K and f into the struct > main_K_and_f.K = K; // K matrix > main_K_and_f.f = f; // f vector > .... > .... > } > In form function > > PetscErrorCode FormFunction(SNES snes, Vec x, Vec F, void* ctx) { > PetscErrorCode ierr; > PetscReal *ax,*c; > PetscReal *aF; > PetscScalar *Kc,v; > PetscInt nlocal,m,n,i,j,index[100000]; > > // Create local Vec and Mat > Mat Kloc; > Vec Floc, Uloc, KUloc, resloc; > > K_and_f* ptr = (K_and_f*) ctx; // cast the pointer to void into pointer to struct > // Get local F array, FLOC > ierr = VecGhostGetLocalForm(ptr->f,&Floc);CHKERRQ(ierr); > > > I am able to get the f array from the main program using " VecGhostGetLocalForm" and the vec is correct. However, I am having trouble with reading the K matrix in formfunction. > > The following trial gives me incorrect K values in Kloc: > ierr = MatSeqAIJGetArray(ptr->K,&Kc);CHKERRQ(ierr); > ierr = MatGetLocalSize(ptr->K,&m,&n); CHKERRQ(ierr); > for(i=0;i { > for(j=0;j { > v = *(Kc + (i*n+j)); > ierr = MatSetValues(Kloc,1,&i,1,&j,&v,INSERT_VALUES);CHKERRQ(ierr); > } > } > > When I compare K and Kloc, they are not identical > > > Please let me know if there is an equivalent function like VecGhostGetLocalForm() for Matrices. If not, is there a better way to do this. > > Any guidance/help is greatly appreciated. > > Thanks > Kaushik > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikv318 at gmail.com Wed May 19 10:48:23 2021 From: kaushikv318 at gmail.com (Kaushik Vijaykumar) Date: Wed, 19 May 2021 11:48:23 -0400 Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function In-Reply-To: References: Message-ID: Barry, Thanks for your response. I am trying to copy K into Kloc, the purpose of doing that is to calculate the residual in formfunction - Residual = [Kloc]{X} - {Floc}, where Kloc and Floc are the copies of assembled global K matrix and F vector, and X is the current iterated solution. This is the reason that I need to access the global stiffness matrix and force vector in formfunction. I agree that formjacobian only needs access to global stiffness matrix K to form the jacobian and does not need to access the force vector. 
Therefore, to be able to pass both K and F to formfunction, I defined a struct that contains a vector F and matrix K and populated them in the main(). Please let me know, if this unclear and I can clarify with more details. Thanks Kaushik On Wed, May 19, 2021 at 11:25 AM Barry Smith wrote: > *ierr = MatSeqAIJGetArray(ptr->K,&Kc);CHKERRQ(ierr); * > > *ierr = MatGetLocalSize(ptr->K,&m,&n); CHKERRQ(ierr);* > > > > > > > > *for(i=0;i (i*n+j)); ierr = > MatSetValues(Kloc,1,&i,1,&j,&v,INSERT_VALUES);CHKERRQ(ierr); }* > * } * > > > > I don't understand the purpose of this code fragment. Are you wanting to > copy your K into Kloc? What is the purpose of Kloc? With standard usage of > SNES one only needs to provide PETSc a global matrix, one does not need to > work with "ghosted" matrices at all. > > Also normally one fills the f vector in FormFunction and the Jacobian in > FormJacobian and there is no need to access the Jacobians at all in SNES > https://www.mcs.anl.gov/petsc/documentation/faq.html#functionjacobian > nor do you need to pass into FormFunction in the context the F or Jacobian > matrix. > > Maybe if you explained your use case a bit more we could make suggestions > on how to accomplish it. > > Barry > > > > On May 19, 2021, at 10:00 AM, Kaushik Vijaykumar > wrote: > > Hi everyone, > > We are in the process of integrating nonlinear solution to our FE code > through SNES. The first steps that I understood that need to be done is to > be able to pass the assembled stiffness matrix K and force vector F to the " > *formfunction*" to calculate the residual and "*formjacobian*" to > calculate the tangent stiffness matrix. To do so, I have defined a struct: > > > *extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*);extern > PetscErrorCode FormFunction(SNES,Vec,Vec,void*);* > > // define a struct type to pass K and f as "user context" > > > > > *typedef struct { Mat K; Vec f; } K_and_f;* > > In the main program, the struct is declared > *int main()* > *{* > *K_and_f main_K_and_f;* // declare the struct > > // SNES - Populate K and f into the struct > *main_K_and_f.K = K;* // K matrix > *main_K_and_f.f = f; *// f vector > .... > .... > *}* > In form function > > > > > > > > > *PetscErrorCode FormFunction(SNES snes, Vec x, Vec F, void* ctx) { > PetscErrorCode ierr; PetscReal *ax,*c; PetscReal *aF; PetscScalar > *Kc,v; PetscInt nlocal,m,n,i,j,index[100000]; * // Create local Vec > and Mat > * Mat Kloc;* > * Vec Floc, Uloc, KUloc, resloc;* > > * K_and_f* ptr = (K_and_f*) ctx; *// cast the pointer to void into > pointer to struct > // Get local F array, FLOC > * ierr = VecGhostGetLocalForm(ptr->f,&Floc);CHKERRQ(ierr);* > > > I am able to get the f array from the main program using " > VecGhostGetLocalForm" and the vec is correct. However, I am having trouble > with reading the K matrix in formfunction. > > The following trial gives me incorrect K values in Kloc: > *ierr = MatSeqAIJGetArray(ptr->K,&Kc);CHKERRQ(ierr); * > > *ierr = MatGetLocalSize(ptr->K,&m,&n); CHKERRQ(ierr);* > > > > > > > > *for(i=0;i (i*n+j)); ierr = > MatSetValues(Kloc,1,&i,1,&j,&v,INSERT_VALUES);CHKERRQ(ierr); }* > * } * > > When I compare K and Kloc, they are not identical > > > Please let me know if there is an equivalent function like > *VecGhostGetLocalForm()* for Matrices. If not, is there a better way to > do this. > > Any guidance/help is greatly appreciated. > > Thanks > Kaushik > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
From knepley at gmail.com Wed May 19 10:50:49 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 19 May 2021 11:50:49 -0400
Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function
In-Reply-To: References: Message-ID:

On Wed, May 19, 2021 at 11:48 AM Kaushik Vijaykumar wrote:

> Barry,
>
> Thanks for your response. I am trying to copy K into Kloc; the purpose of doing that is to calculate the residual in formfunction - Residual = [Kloc]{X} - {Floc}, where Kloc and Floc are the copies of the assembled global K matrix and F vector, and X is the current iterated solution. This is the reason that I need to access the global stiffness matrix and force vector in formfunction.
>
> I agree that formjacobian only needs access to the global stiffness matrix K to form the Jacobian and does not need to access the force vector.
>
> Therefore, to be able to pass both K and F to formfunction, I defined a struct that contains the vector F and the matrix K and populated them in main().
>
> Please let me know if this is unclear and I can clarify with more details.
> [...]

This still does not make sense. If you have a nonlinear problem, how can you know the Jacobian up front? It changes with the input point.

  Thanks,

    Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
From niko.karin at gmail.com Wed May 19 11:46:11 2021
From: niko.karin at gmail.com (Karin&NiKo)
Date: Wed, 19 May 2021 18:46:11 +0200
Subject: [petsc-users] DMPlex and Boundary facets
In-Reply-To: References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID:

Thanks, I think I understand. So at the moment, the user has to manage the link between the labels of the different strata and the names. I am mainly willing to use DMPlex for AMR, and names of groups are essential in our code - reading what Matt says gives me the impression that this is doable.

Thanks again,
Nicolas

On Wed, 19 May 2021 at 16:02, Matthew Knepley wrote:

> On Wed, May 19, 2021 at 9:57 AM Karin&NiKo wrote:
>
>> Thank you very much Lawrence.
>> Are the names lost or are they saved somewhere?
>> If I do: "./ex2 -filename /tmp/test/Cube_with_facets.msh4 -dm_view vtk:/tmp/foo.vtk", I only get the tets of the initial mesh.
>
> Yes, the names are lost. Jed, Lisandro and Stefano have also asked me about this. I have not done it, since everything is currently done by number in Plex. It would involve either
>
> a) making a separate table in the DM which translates names to numbers
>
> or
>
> b) putting such a translation table in the DMLabel itself
>
> I am putting it off because doing something short is not hard, but getting it to work with everything else would be somewhat of a pain. It would need to distribute properly, propagate to refined/coarsened/subset meshes, etc., which labels currently do. It would not be properly passed on when you just set an int on a point. I think it will happen eventually, but I don't think I have time in the next few weeks.
>
>   Thanks,
>
>     Matt
>
>> On Wed, 19 May 2021 at 15:49, Lawrence Mitchell wrote:
>>
>>> On 19 May 2021, at 14:46, Karin&NiKo wrote:
>>>
>>> Dear Matthew,
>>>
>>> You are (again) right. This Gmsh test file is "dirty": some triangles do not belong to tets. Sorry for that. I have tried with another geo file (which is clean in that sense) and PETSc reads it with no error.
>>>
>>> I take this opportunity to ask my initial question: are the labels of the Physical Entities saved somewhere? A gmsh test file with several Physical entities is attached to this email.
>>
>> Yes, plex represents Physical entities as "labels". In particular, for facets, they are loaded as the "Face Sets" label; plex also loads markers on cells in the "Cell Sets" label.
>>
>> e.g. here's the DMView output of your mesh.
>>
>> DM Object: DM_0x84000000_0 1 MPI processes
>>   type: plex
>> DM_0x84000000_0 in 3 dimensions:
>>   0-cells: 64
>>   1-cells: 279
>>   2-cells: 378
>>   3-cells: 162
>> Labels:
>>   celltype: 4 strata with value/size (0 (64), 6 (162), 3 (378), 1 (279))
>>   depth: 4 strata with value/size (0 (64), 1 (279), 2 (378), 3 (162))
>>   Cell Sets: 1 strata with value/size (7 (162))
>>   Face Sets: 6 strata with value/size (3 (18), 6 (18), 2 (18), 5 (18), 1 (18), 4 (18))
>>
>> Lawrence
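For reference, a stratum of the "Face Sets" label shown in the DMView output above can be queried directly; a minimal sketch (assuming the mesh has been loaded into a DM named dm, and using the physical-surface value 3 from the output):

    IS             faceIS;
    PetscInt       nfaces;
    PetscErrorCode ierr;

    /* mesh points tagged with value 3 in the "Face Sets" label */
    ierr = DMGetStratumIS(dm, "Face Sets", 3, &faceIS);CHKERRQ(ierr);
    ierr = ISGetLocalSize(faceIS, &nfaces);CHKERRQ(ierr);
    /* ... iterate over the points in faceIS ... */
    ierr = ISDestroy(&faceIS);CHKERRQ(ierr);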
From kaushikv318 at gmail.com Wed May 19 12:30:08 2021
From: kaushikv318 at gmail.com (Kaushik Vijaykumar)
Date: Wed, 19 May 2021 13:30:08 -0400
Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function
In-Reply-To: References: Message-ID:

Matt,

Thanks for the comment, I see your point now. I was using the linear elastic [K] from the previous load step as the guess for the Jacobian. The stiffness matrix was being built outside formfunction using displacements from the previous load step. What is the best approach: build [K] and {F} inside the form function and compute the residual [K]_{i}{U}_{i} - {F}_{i}, where i designates the iterated matrices and vector for a given load step?

Thanks
Kaushik

On Wed, May 19, 2021 at 11:51 AM Matthew Knepley wrote:

> This still does not make sense. If you have a nonlinear problem, how can you know the Jacobian up front? It changes with the input point.
> [...]

From knepley at gmail.com Wed May 19 12:42:28 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 19 May 2021 13:42:28 -0400
Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function
In-Reply-To: References: Message-ID:

On Wed, May 19, 2021 at 1:30 PM Kaushik Vijaykumar wrote:

> Thanks for the comment, I see your point now. [...] What is the best approach: build [K] and {F} inside the form function and compute the residual [K]_{i}{U}_{i} - {F}_{i}, where i designates the iterated matrices and vector for a given load step?

Yes, exactly. You want the residual

  F(U) = 0

which for you sounds like

  K(U) U - F = 0

and then the Jacobian callback builds

  K(U)

from the input U.

  Thanks,

    Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
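A minimal sketch of the residual/Jacobian callback structure described above (the context struct, names, and the assembly placeholder are illustrative assumptions, not code from this thread):

    typedef struct { Mat K; Vec f; } AppCtx; /* K is reassembled from U on each call */

    PetscErrorCode FormFunction(SNES snes, Vec U, Vec R, void *ctx)
    {
      AppCtx        *user = (AppCtx *)ctx;
      PetscErrorCode ierr;

      /* (re)assemble the stiffness matrix K(U) from the current iterate U
         here -- placeholder for the application's FE assembly */
      ierr = MatMult(user->K, U, R);CHKERRQ(ierr);    /* R = K(U) U     */
      ierr = VecAXPY(R, -1.0, user->f);CHKERRQ(ierr); /* R = K(U) U - f */
      return 0;
    }

    PetscErrorCode FormJacobian(SNES snes, Vec U, Mat J, Mat P, void *ctx)
    {
      /* assemble the tangent matrix from the input U into P (and J if
         different), then MatAssemblyBegin/End(P, MAT_FINAL_ASSEMBLY) */
      return 0;
    }

These would be registered with SNESSetFunction() and SNESSetJacobian(), passing the context struct as the final argument.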
From kaushikv318 at gmail.com Wed May 19 12:46:15 2021
From: kaushikv318 at gmail.com (Kaushik Vijaykumar)
Date: Wed, 19 May 2021 13:46:15 -0400
Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function
In-Reply-To: References: Message-ID:

Thanks Matt, for pointing me in the right direction.
On Wed, May 19, 2021 at 1:42 PM Matthew Knepley wrote:

> Yes, exactly. You want the residual
>
>   F(U) = 0
>
> which for you sounds like
>
>   K(U) U - F = 0
>
> and then the Jacobian callback builds K(U) from the input U.
> [...]

From knepley at gmail.com Wed May 19 12:47:35 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 19 May 2021 13:47:35 -0400
Subject: [petsc-users] DMPlex and Boundary facets
In-Reply-To: References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID:

On Wed, May 19, 2021 at 12:46 PM Karin&NiKo wrote:

> Thanks, I think I understand.
> So at the moment, the user has to manage the link between the labels of the different strata and the names.

Yes.

> I am mainly willing to use DMPlex for AMR, and names of groups are essential in our code - reading what Matt says gives me the impression that this is doable.

I think it is. I will be able to do it in the middle of June if you do not have something by then. Can you make an issue?

  Thanks,

    Matt

> Thanks again,
> Nicolas
> [...]

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
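Until such a name table exists in Plex, a user-side version of option (a) quoted earlier in this thread can be a small lookup kept alongside the DM; a purely illustrative sketch (names and values are assumptions):

    #include <string.h>

    typedef struct { const char *name; PetscInt value; } GroupAlias;

    /* user-managed translation of Gmsh physical-group names to label values */
    static const GroupAlias faceGroups[] = {{"inlet", 1}, {"wall", 3}};

    static PetscInt GroupValue(const char *name)
    {
      for (size_t g = 0; g < sizeof(faceGroups)/sizeof(faceGroups[0]); g++)
        if (!strcmp(faceGroups[g].name, name)) return faceGroups[g].value;
      return -1; /* not found */
    }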
From jed at jedbrown.org Wed May 19 12:50:21 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 19 May 2021 11:50:21 -0600
Subject: [petsc-users] Using SNES to solve Non linear elasticity, passing matrix to form function
In-Reply-To: References: Message-ID: <87pmxmboki.fsf@jedbrown.org>

You may be interested in the discussion of computing and representing Newton linearizations for hyperelasticity starting here: https://libceed.readthedocs.io/en/latest/examples/solids/#id4

Kaushik Vijaykumar writes:

> Thanks Matt, for pointing me in the right direction.
> [...]

From niko.karin at gmail.com Wed May 19 15:00:58 2021
From: niko.karin at gmail.com (Karin&NiKo)
Date: Wed, 19 May 2021 22:00:58 +0200
Subject: [petsc-users] DMPlex and Boundary facets
In-Reply-To: References: <888D7CE9-CCEC-4309-9EE7-FE8D0BEA0367@gmx.li> Message-ID:

Sure, no problem.
Thanks,
Nicolas

On Wed, 19 May 2021 at 19:47, Matthew Knepley wrote:

> I think it is. I will be able to do it in the middle of June if you do not have something by then. Can you make an issue?
> [...]
From sayosale at hotmail.com Thu May 20 03:25:41 2021
From: sayosale at hotmail.com (dazza simplythebest)
Date: Thu, 20 May 2021 08:25:41 +0000
Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
Message-ID:

Dear All,

As part of preparing a code to call the SLEPc eigenvalue solving library, I am constructing a matrix in sparse CSR format row-by-row. Just for debugging purposes I write out the column values for a given row, which are stored in a PetscInt allocatable vector, using PetscIntView.

Everything works fine when the number of MPI processes exactly divides the number of rows of the matrix, so that each process owns the same number of rows. However, when the number of MPI processes does not exactly divide the number of rows of the matrix, so that each process owns a different number of rows, the code hangs when it reaches the line that calls PetscIntView. To be precise, the code hangs on the final row that a process other than root owns. If I comment out the call to PetscIntView, the code completes without error and produces the correct eigenvalues (hence we are not missing a row / miswriting a row). Note also that a simple direct write-out of this same array using a plain Fortran command will write out the array without problem.

I have attached below a small code that reproduces the problem. For this code we have nominally assigned 200 rows to our matrix. The code runs without problem using 1, 2, 4, 5, 8 or 10 MPI processes, all of which precisely divide 200, but will hang for 3 MPI processes for example. For the case of 3 MPI processes the subroutine WHOSE_ROW_IS_IT allocates the rows to each process as:

  process no    first row    last row    no. of rows
       0             1           66          66
       1            67          133          67
       2           134          200          67

The code will hang when process 1 calls PetscIntView for its last row, row 133 for example.

One piece of additional information that may be relevant is that the code does run to completion without hanging if I comment out the final SLEPc/MPI finalisation command

  CALL SlepcFinalize(ierr_pets)

(Of course I then get 'bad termination' errors, but otherwise the run is successful.)

I would appreciate it if anyone has any ideas on what is going wrong!

Many thanks,
Dan.

code:

MODULE ALL_STAB_ROUTINES
  IMPLICIT NONE
CONTAINS

  SUBROUTINE WHOSE_ROW_IS_IT(ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES, OWNER)
    ! THIS ROUTINE ALLOCATES ROWS EVENLY BETWEEN mpi PROCESSES
#include <slepc/finclude/slepceps.h>
    use slepceps
    IMPLICIT NONE
    PetscInt, INTENT(IN)  :: ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES
    PetscInt, INTENT(OUT) :: OWNER
    PetscInt :: P, REM

    P = TOTAL_NO_ROWS / NO_PROCESSES   ! NOTE INTEGER DIVISION
    REM = TOTAL_NO_ROWS - P*NO_PROCESSES
    IF (ROW_NO < (NO_PROCESSES - REM)*P + 1 ) THEN
      OWNER = (ROW_NO - 1)/P                              ! NOTE INTEGER DIVISION
    ELSE
      OWNER = ( ROW_NO + NO_PROCESSES - REM - 1 )/(P+1)   ! NOTE INTEGER DIVISION
    ENDIF
  END SUBROUTINE WHOSE_ROW_IS_IT
END MODULE ALL_STAB_ROUTINES

PROGRAM trialer
  USE MPI
#include <slepc/finclude/slepceps.h>
  use slepceps
  USE ALL_STAB_ROUTINES
  IMPLICIT NONE
  PetscMPIInt rank3, total_mpi_size
  PetscInt nl3, code, PROC_ROW, ISTATUS, jm, N_rows, NO_A_ENTRIES
  PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
  PetscInt, PARAMETER :: ZERO = 0, ONE = 1, TWO = 2, THREE = 3
  PetscErrorCode ierr_pets

  ! Initialise slepc/mpi
  call SlepcInitialize(PETSC_NULL_CHARACTER,ierr_pets)          ! note that this initialises MPI
  call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets) !! find total no of MPI processes
  nl3 = total_mpi_size
  call MPI_COMM_RANK(MPI_COMM_WORLD,rank3,ierr_pets)            !! find my overall rank -> rank3
  write(*,*)'Welcome: PROCESS NO , TOTAL NO. OF PROCESSES = ',rank3, nl3

  N_rows = 200      ! NUMBER OF ROWS OF A NOTIONAL MATRIX
  NO_A_ENTRIES = 12 ! NUMBER OF ENTRIES FOR JALOC

  ! LOOP OVER ROWS
  do jm = 1, N_rows
    CALL whose_row_is_it(JM, N_rows, NL3, PROC_ROW) ! FIND OUT WHICH PROCESS OWNS ROW
    if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN ..
      ! ALLOCATE jaloc ARRAY AND INITIALISE
      allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS)
      jaloc = three

      WRITE(*,*)'JALOC',JALOC ! THIS SIMPLE PRINT ALWAYS WORKS
      write(*,*)'calling PetscIntView: PROCESS NO. ROW NO.',rank3, jm
      ! THIS CALL TO PetscIntView CAUSES CODE TO HANG WHEN E.G. total_mpi_size=3, JM=133
      call PetscIntView(NO_A_ENTRIES, JALOC(1:NO_A_ENTRIES), &
     &                  PETSC_VIEWER_STDOUT_WORLD, ierr_pets)
      CHKERRA(ierr_pets)
      deallocate(jaloc)
    endif
  enddo

  CALL SlepcFinalize(ierr_pets)
end program trialer
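The even split that WHOSE_ROW_IS_IT computes is close to what PETSc's own helper provides; for reference, a minimal C sketch (the code above is Fortran, and note PetscSplitOwnership gives any remainder to the leading ranks rather than the trailing ones):

    PetscInt       n = PETSC_DECIDE, N = 200; /* N = total number of rows */
    PetscErrorCode ierr;

    /* sets n to this rank's share of the N rows */
    ierr = PetscSplitOwnership(PETSC_COMM_WORLD, &n, &N);CHKERRQ(ierr);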
From jroman at dsic.upv.es Thu May 20 04:06:31 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Thu, 20 May 2021 11:06:31 +0200
Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
In-Reply-To: References: Message-ID:

If you look at the manpage https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscIntView.html you will see that PetscIntView() is collective. This means that all MPI processes must call this function, so it is forbidden to call it within an IF rank==...

Jose

> On 20 May 2021, at 10:25, dazza simplythebest wrote:
>
> Dear All,
> As part of preparing a code to call the SLEPc eigenvalue solving library, I am constructing a matrix in sparse CSR format row-by-row. Just for debugging purposes I write out the column values for a given row, which are stored in a PetscInt allocatable vector, using PetscIntView.
> [...]
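Since the call is collective, every rank must take part in every call; a minimal C sketch of one workable pattern (names are assumed; the same idea for the Fortran code appears in Barry's reply further down):

    PetscInt       jaloc[12];
    PetscInt       n = (rank == proc_row) ? 12 : 0; /* non-owners pass length 0 */
    PetscErrorCode ierr;

    /* all ranks make the call together; only the owning rank contributes values */
    ierr = PetscIntView(n, jaloc, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);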
From sayosale at hotmail.com Thu May 20 04:32:26 2021
From: sayosale at hotmail.com (dazza simplythebest)
Date: Thu, 20 May 2021 09:32:26 +0000
Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
In-Reply-To: References: Message-ID:

Dear Jose,

Many thanks for the prompt explanation - that would definitely explain what is going on. I will adjust my code accordingly.

Thanks again,
Dan.

________________________________
From: Jose E. Roman
Sent: Thursday, May 20, 2021 9:06 AM
To: dazza simplythebest
Cc: PETSc users list
Subject: Re: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)

If you look at the manpage https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscIntView.html you will see that PetscIntView() is collective. This means that all MPI processes must call this function, so it is forbidden to call it within an IF rank==...

Jose

[...]

From knepley at gmail.com Thu May 20 05:31:28 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 20 May 2021 06:31:28 -0400
Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
In-Reply-To: References: Message-ID:

On Thu, May 20, 2021 at 5:32 AM dazza simplythebest wrote:

> Dear Jose,
> Many thanks for the prompt explanation - that would definitely explain what is going on. I will adjust my code accordingly.

If you want to print different things from each process in parallel, I suggest

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscSynchronizedPrintf.html

  Thanks,

    Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
note that > this initialises MPI > > call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets) !! > find total no of MPI processes > > nL3= total_mpi_size > > call MPI_COMM_RANK(MPI_COMM_WORLD,rank3,ierr_pets) !! find my > overall rank -> rank3 > > write(*,*)'Welcome: PROCESS NO , TOTAL NO. OF PROCESSES = > ',rank3, nl3 > > > > N_rows = 200 ! NUMBER OF ROWS OF A NOTIONAL MATRIX > > NO_A_ENTRIES = 12 ! NUMBER OF ENTRIES FOR JALOC > > > > ! LOOP OVER ROWS > > do jm = 1, N_rows > > > > CALL whose_row_is_it(JM, N_rows , NL3, PROC_ROW) ! FIND OUT WHICH > PROCESS OWNS ROW > > if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN .. > > ! ALLOCATE jaloc ARRAY AND INITIALISE > > > > allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS ) > > jaloc = three > > > > > > WRITE(*,*)'JALOC',JALOC ! THIS SIMPLE PLOT ALWAYS WORKS > > write(*,*)'calling PetscIntView: PROCESS NO. ROW NO.',rank3, jm > > ! THIS CALL TO PetscIntView CAUSES CODE TO HANG WHEN E.G. > total_mpi_size=3, JM=133 > > call > PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES), & > > & PETSC_VIEWER_STDOUT_WORLD, ierr_pets) > > CHKERRA(ierr_pets) > > deallocate(jaloc) > > endif > > enddo > > > > CALL SlepcFinalize(ierr_pets) > > end program trialer > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From vikram.bhamidipati at swri.org Thu May 20 11:25:38 2021 From: vikram.bhamidipati at swri.org (Bhamidipati, Vikram) Date: Thu, 20 May 2021 16:25:38 +0000 Subject: [petsc-users] nghosts in 'DMMoabLoadFromFile' Message-ID: Hello, I am in the process of updating PetSc versions and I see that 'DMMoabLoadFromFile' function (in dmmutil.cxx) has a new argument 'nghosts'. For those of us who don't use ghost cells should we set it to 0? Thanks, Vikram --------------------------------------------------- Vikram Bhamidipati Senior Research Engineer Computational Material Integrity Section Materials Engineering Department Mechanical Engineering Division Ph: (210) 522-2576 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu May 20 12:00:04 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 20 May 2021 12:00:04 -0500 Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran) In-Reply-To: References: Message-ID: You can also have the processes with no values print an array of length zero. Like if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN .. .. else NO_A_ENTRIES = 0 call PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES), & & PETSC_VIEWER_STDOUT_WORLD, ierr_pets) > On May 20, 2021, at 5:31 AM, Matthew Knepley wrote: > > On Thu, May 20, 2021 at 5:32 AM dazza simplythebest > wrote: > Dear Jose, > Many thanks for the prompt explanation - that would definitely explain what is going on, > I will adjust my code accordingly . > > If you want to print different things from each process in parallel, I suggest > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscSynchronizedPrintf.html > > Thanks, > > Matt > > Thanks again, > Dan. > > From: Jose E. 
Roman > > Sent: Thursday, May 20, 2021 9:06 AM > To: dazza simplythebest > > Cc: PETSc users list > > Subject: Re: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran) > > If you look at the manpage https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscIntView.html you will see that PetscIntView() is collective. This means that all MPI processes must call this function, so it is forbidden to call it within an IF rank==... > > Jose > > > El 20 may 2021, a las 10:25, dazza simplythebest > escribi?: > > > > Dear All, > > As part of preparing a code to call the SLEPC eigenvalue solving library, > > I am constructing a matrix in sparse CSR format row-by-row. Just for debugging > > purposes I write out the column values for a given row, which are stored in a > > PetscInt allocatable vector, using PetscIntView. > > > > Everything works fine when the number of MPI processes exactly divide the > > number of rows of the matrix, and so each process owns the same number of rows. > > However, when the number of MPI processes does not exactly divide the > > number of rows of the matrix, and so each process owns a different number of rows, > > the code hangs when it reaches the line that calls PetscIntView. > > To be precise the code hangs on the final row that a process, other than root, owns. > > If I however comment out the call to PetscIntView the code completes without error, > > and produces the correct eigenvalues (hence we are not missing a row / miswriting a row). > > Note also that a simple direct writeout of this same array using a plain fortran command > > will write out the array without problem. > > > > I have attached below a small code that reproduces the problem. > > For this code we have nominally assigned 200 rows to our matrix. The code runs without > > problem using 1,2,4,5,8 or 10 MPI processes, all of which precisely divide 200, > > but will hang for 3 MPI processes for example. > > For the case of 3 MPI processes the subroutine WHOSE_ROW_IS_IT allocates the rows > > to each process as : > > process no first row last row no. of rows > > 0 1 66 66 > > 1 67 133 67 > > 2 134 200 67 > > > > The code will hang when process 1 calls PetscIntView for its last row, row 133 for example. > > > > One piece of additional information that may be relevant is that the code does run to completion > > without hanging if I comment out the final slepc/MPI finalisation command > > CALL SlepcFinalize(ierr_pets) > > (I of course I get ' bad termination' errors, but the otherwise the run is successful.) > > > > I would appreciate it if anyone has any ideas on what is going wrong! > > Many thanks, > > Dan. > > > > > > code: > > > > MODULE ALL_STAB_ROUTINES > > IMPLICIT NONE > > CONTAINS > > > > SUBROUTINE WHOSE_ROW_IS_IT(ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES, & > > & OWNER) > > ! THIS ROUTINE ALLOCATES ROWS EVENLY BETWEEN mpi PROCESSES > > #include > > use slepceps > > IMPLICIT NONE > > PetscInt, INTENT(IN) :: ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES > > PetscInt, INTENT(OUT) :: OWNER > > PetscInt :: P, REM > > > > P = TOTAL_NO_ROWS / NO_PROCESSES ! NOTE INTEGER DIVISION > > REM = TOTAL_NO_ROWS - P*NO_PROCESSES > > IF (ROW_NO < (NO_PROCESSES - REM)*P + 1 ) THEN > > OWNER = (ROW_NO - 1)/P ! NOTE INTEGER DIVISION > > ELSE > > OWNER = ( ROW_NO + NO_PROCESSES - REM -1 )/(P+1) ! 
NOTE INTEGER DIVISION > > ENDIF > > END SUBROUTINE WHOSE_ROW_IS_IT > > END MODULE ALL_STAB_ROUTINES > > > > > > PROGRAM trialer > > USE MPI > > #include > > use slepceps > > USE ALL_STAB_ROUTINES > > IMPLICIT NONE > > PetscMPIInt rank3, total_mpi_size > > PetscInt nl3, code, PROC_ROW, ISTATUS, jm, N_rows,NO_A_ENTRIES > > PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC > > PetscInt, PARAMETER :: ZERO = 0 , ONE = 1, TWO = 2, THREE = 3 > > PetscErrorCode ierr_pets > > > > ! Initialise sleps/mpi > > call SlepcInitialize(PETSC_NULL_CHARACTER,ierr_pets) ! note that this initialises MPI > > call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets) !! find total no of MPI processes > > nL3= total_mpi_size > > call MPI_COMM_RANK(MPI_COMM_WORLD,rank3,ierr_pets) !! find my overall rank -> rank3 > > write(*,*)'Welcome: PROCESS NO , TOTAL NO. OF PROCESSES = ',rank3, nl3 > > > > N_rows = 200 ! NUMBER OF ROWS OF A NOTIONAL MATRIX > > NO_A_ENTRIES = 12 ! NUMBER OF ENTRIES FOR JALOC > > > > ! LOOP OVER ROWS > > do jm = 1, N_rows > > > > CALL whose_row_is_it(JM, N_rows , NL3, PROC_ROW) ! FIND OUT WHICH PROCESS OWNS ROW > > if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN .. > > ! ALLOCATE jaloc ARRAY AND INITIALISE > > > > allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS ) > > jaloc = three > > > > > > WRITE(*,*)'JALOC',JALOC ! THIS SIMPLE PLOT ALWAYS WORKS > > write(*,*)'calling PetscIntView: PROCESS NO. ROW NO.',rank3, jm > > ! THIS CALL TO PetscIntView CAUSES CODE TO HANG WHEN E.G. total_mpi_size=3, JM=133 > > call PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES), & > > & PETSC_VIEWER_STDOUT_WORLD, ierr_pets) > > CHKERRA(ierr_pets) > > deallocate(jaloc) > > endif > > enddo > > > > CALL SlepcFinalize(ierr_pets) > > end program trialer > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu May 20 12:03:10 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 20 May 2021 12:03:10 -0500 Subject: [petsc-users] nghosts in 'DMMoabLoadFromFile' In-Reply-To: References: Message-ID: <97E8448F-9714-4500-87F8-943D1820C914@petsc.dev> Probably. It is not documented unfortunately, but does seem to related to the how many ghost layers are needed. Barry > On May 20, 2021, at 11:25 AM, Bhamidipati, Vikram wrote: > > Hello, > > I am in the process of updating PetSc versions and I see that ?DMMoabLoadFromFile? function (in dmmutil.cxx) has a new argument ?nghosts?. For those of us who don?t use ghost cells should we set it to 0? > > Thanks, > Vikram > > --------------------------------------------------- > Vikram Bhamidipati > Senior Research Engineer > Computational Material Integrity Section > Materials Engineering Department > Mechanical Engineering Division > Ph: (210) 522-2576 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijay.m at gmail.com Thu May 20 13:19:40 2021 From: vijay.m at gmail.com (Vijay S. Mahadevan) Date: Thu, 20 May 2021 14:19:40 -0400 Subject: [petsc-users] nghosts in 'DMMoabLoadFromFile' In-Reply-To: <97E8448F-9714-4500-87F8-943D1820C914@petsc.dev> References: <97E8448F-9714-4500-87F8-943D1820C914@petsc.dev> Message-ID: Dear vikram, Yes if you are running in serial or if you do not require any ghost layers, initialize the parameter input to zero. 
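For example, something like this (a sketch from memory - please double-check the argument order against the declaration in your PETSc version; the mesh file name here is just a placeholder):

   DM dm;
   /* comm, spatial dimension, nghost (0 = no ghost layers), mesh file, MOAB read options */
   ierr = DMMoabLoadFromFile(PETSC_COMM_WORLD, 3, 0, "mesh.h5m", "", &dm);CHKERRQ(ierr);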
Otherwise, it should specify the number of ghost layers of elements you need when running in parallel.

I'm not well at the moment but will update the documentation for DMMoab when I get back.

Best,
Vijay

On Thu., May 20, 2021, 13:03 Barry Smith, wrote:
> [...]

From vikram.bhamidipati at swri.org Thu May 20 13:25:32 2021
From: vikram.bhamidipati at swri.org (Bhamidipati, Vikram)
Date: Thu, 20 May 2021 18:25:32 +0000
Subject: [petsc-users] nghosts in 'DMMoabLoadFromFile'
In-Reply-To: References: <97E8448F-9714-4500-87F8-943D1820C914@petsc.dev>
Message-ID:

Thank you!

From: Vijay S. Mahadevan
Sent: Thursday, May 20, 2021 1:20 PM
To: Barry Smith
Cc: Bhamidipati, Vikram ; petsc-users ; moab-dev at mcs.anl.gov
Subject: Re: [petsc-users] nghosts in 'DMMoabLoadFromFile'
[...]

From sayosale at hotmail.com Thu May 20 21:07:38 2021
From: sayosale at hotmail.com (dazza simplythebest)
Date: Fri, 21 May 2021 02:07:38 +0000
Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
In-Reply-To: References: Message-ID:

Hi Matthew,
Many thanks for the tip re: the synchronized print, I wasn't aware of that routine. It is great how many useful utility routines PETSc seems to have - it's a big timesaver!
Thanks,
Dan

________________________________
From: Matthew Knepley
Sent: Thursday, May 20, 2021 10:31 AM
To: dazza simplythebest
Cc: Jose E. Roman ; PETSc users list
Subject: Re: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
[...]

From pierrebernigaudl at gmail.com Fri May 21 08:24:44 2021
From: pierrebernigaudl at gmail.com (Pierre Bernigaud)
Date: Fri, 21 May 2021 15:24:44 +0200
Subject: [petsc-users] Nested SNES in FormFunction
Message-ID: <374B3B91-E37C-425D-956A-A2710DBA8180@gmail.com>

Greetings,

I am currently working on a CFD solver using PETSc. I have a nonlinear system which is solved using 2D_DMDA/SNES, subject to boundary conditions that are treated implicitly and updated in the FormFunction. The calculation of one of these boundary conditions requires the solution of another nonlinear system.

I am hence using a nested 1D_DMDA/SNES system within the FormFunction of my main SNES solver to solve for this boundary condition. This works, but a scalability study showed that it causes the code to have sub-par parallel speedup.

Have you ever encountered this kind of nested SNES application, and are there critical points to be aware of in order to avoid a loss of performance? For instance, the sub 1D_DMDA/SNES objects are created and destroyed at each update of the boundary, hence at each call to FormFunction, which results in a large number of object creations and destructions. Could this be a problem?

Furthermore, the use of a sub 1D_DMDA/SNES allows multiple processors to be used to solve for the boundary condition, composed of say N cells.
When running the code with M > N processors, everything is working great, but I am curious about the state of the (M-N) processors which aren't working on the boundary condition problem. Do they just stay idle?

Thank you for your help.
Respectfully,
Pierre Bernigaud

From knepley at gmail.com Fri May 21 09:01:19 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 21 May 2021 10:01:19 -0400
Subject: [petsc-users] Nested SNES in FormFunction
In-Reply-To: <374B3B91-E37C-425D-956A-A2710DBA8180@gmail.com>
References: <374B3B91-E37C-425D-956A-A2710DBA8180@gmail.com>
Message-ID:

On Fri, May 21, 2021 at 9:34 AM Pierre Bernigaud wrote:

> Greetings,
>
> I am currently working on a CFD solver using PETSc. I have a nonlinear
> system which is solved using 2D_DMDA/SNES, subject to boundary
> conditions that are treated implicitly and updated in the FormFunction. The
> calculation of one of these boundary conditions requires the solution of
> another nonlinear system.
>
> I am hence using a nested 1D_DMDA/SNES system within the FormFunction of
> my main SNES solver to solve for this boundary condition. This works, but
> a scalability study showed that it causes the code to have sub-par
> parallel speedup.
>
> Have you ever encountered this kind of nested SNES application, and are
> there critical points to be aware of in order to avoid a loss of
> performance?
> For instance, the sub 1D_DMDA/SNES objects are created and destroyed at
> each update of the boundary, hence at each call to FormFunction, which
> results in a large number of object creations and destructions. Could
> this be a problem?

Yes. You should keep this subsolver around for as long as the outer solver lives.

> Furthermore, the use of a sub 1D_DMDA/SNES allows multiple processors to
> be used to solve for the boundary condition, composed of say N cells.
> When running the code with M > N processors, everything is working great,
> but I am curious about the state of the (M-N) processors which aren't
> working on the boundary condition problem. Do they just stay idle?

Yes, they are idle. You could try to fold the boundary condition into the same system as the bulk, but this is not a clear win since you will still have load imbalance. What percentage of time does the boundary solve take?

  Thanks,

    Matt

> Thank you for your help.
> Respectfully,
> Pierre Bernigaud

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From juaneah at gmail.com Fri May 21 10:49:14 2021
From: juaneah at gmail.com (Emmanuel Ayala)
Date: Fri, 21 May 2021 10:49:14 -0500
Subject: [petsc-users] MatChop
Message-ID:

Hi everybody,

I just updated PETSc from version 3.13 to 3.15. Before the update everything worked well; now my code gives me an error:

[9]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[9]PETSC ERROR: Invalid argument
[9]PETSC ERROR: Setting off process row 53484 even though MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set
[9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
[9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named eayala by ayala Fri May 21 10:40:36 2021
[9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-hypre --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11

The error appears after a matrix assembly; the matrix was created with DMCreateMatrix and updated with MatSetValuesLocal.

As a little workaround I found that avoiding MatChop on this matrix solves it, but I still need to use MatChop. Is there any reason for this problem?

Thanks in advance.

From pierre at joliv.et Fri May 21 11:02:09 2021
From: pierre at joliv.et (Pierre Jolivet)
Date: Fri, 21 May 2021 18:02:09 +0200
Subject: [petsc-users] MatChop
In-Reply-To: References: Message-ID: <5D52E520-4449-4542-8738-C061E0C1B9DD@joliv.et>

Hello Emmanuel,
I broke that, so I'll fix it.
I think you can bypass the error by explicitly resetting the appropriate option after your call to MatChop():

ierr = MatChop(A, tol);CHKERRQ(ierr);
/* the 3.15 MatChop() appears to leave MAT_NO_OFF_PROC_ENTRIES set to PETSC_TRUE, so reset it afterwards */
ierr = MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_FALSE);CHKERRQ(ierr);

Thanks,
Pierre

> On 21 May 2021, at 5:49 PM, Emmanuel Ayala wrote:
> [...]

From stefano.zampini at gmail.com Fri May 21 11:03:07 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 21 May 2021 19:03:07 +0300
Subject: [petsc-users] MatChop
In-Reply-To: References: Message-ID:

Emmanuel

thanks for reporting this.
I believe we have a regression in MatChop from https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c (cc'ing Pierre)
We call MatAssemblyBegin/End within the row loop. Also,
I don't understand why we need to check for r < rend here (https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513), nor why we need to allocate newCols (we can use cols).

Pierre, can you take a look?

On Fri 21 May 2021 at 18:49, Emmanuel Ayala wrote:
> [...]

-- 
Stefano

From juaneah at gmail.com Fri May 21 11:15:50 2021
From: juaneah at gmail.com (Emmanuel Ayala)
Date: Fri, 21 May 2021 11:15:50 -0500
Subject: [petsc-users] MatChop
In-Reply-To: References: Message-ID:

Hi Pierre,

Your suggestion works fine for me.

Thanks.

On Fri, 21 May 2021 at 11:03, Stefano Zampini (stefano.zampini at gmail.com) wrote:
> [...]

From pierre at joliv.et Fri May 21 11:17:51 2021
From: pierre at joliv.et (Pierre Jolivet)
Date: Fri, 21 May 2021 18:17:51 +0200
Subject: [petsc-users] MatChop
In-Reply-To: References: Message-ID:

> On 21 May 2021, at 6:03 PM, Stefano Zampini wrote:
>
> Emmanuel
>
> thanks for reporting this.
> > Thanks in advance. > > > > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 21 11:33:46 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 21 May 2021 19:33:46 +0300 Subject: [petsc-users] MatChop In-Reply-To: References: Message-ID: > On 21 May 2021, at 7:17 PM, Pierre Jolivet wrote: > > > >> On 21 May 2021, at 6:03 PM, Stefano Zampini > wrote: >> >> Emmanuel >> >> thanks for reporting this. >> I believe we have a regression in MatChop from https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c (cc'ing Pierre) >> We call MatAssemblyBegin/End within the row loop. Also. I don't understand why we need to check for r < rend herre https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513 . nor why we need to allocate newCols (can use cols) > > That part is from the initial 8-year old implementation from Matt (https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677 ). > You need the check otherwise this error is raised: https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566 . I see, anyway you do not need the check if the loop range [rStart,rEnd). So now I don?t understand why the loop must be [rStart,rStart+maxRows], Matt? > > Thanks, > Pierre > >> Pierre, can you take a look? >> >> Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala > ha scritto: >> Hi everybody, >> >> I just updated petsc from version 13 to 15. Before the update everything works well, then my code give me an error: >> >> [9]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [9]PETSC ERROR: Invalid argument >> [9]PETSC ERROR: Setting off process row 53484 even though MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set >> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 >> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named eayala by ayala Fri May 21 10:40:36 2021 >> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-hypre --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 >> >> The error appears after a matrix assembly, the matrix was created with DMCreateMatrix and updated with MatSetValuesLocal. >> >> A Little work around I found the solution, avoid using MatChop on this matrix, but I still need to use MatChop. There is any reason to have this problem? >> >> Thanks in advance. >> >> >> >> >> >> >> -- >> Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 21 11:49:54 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 May 2021 12:49:54 -0400 Subject: [petsc-users] MatChop In-Reply-To: References: Message-ID: On Fri, May 21, 2021 at 12:33 PM Stefano Zampini wrote: > > > On 21 May 2021, at 7:17 PM, Pierre Jolivet wrote: > > > > On 21 May 2021, at 6:03 PM, Stefano Zampini > wrote: > > Emmanuel > > thanks for reporting this. 
> I believe we have a regression in MatChop from > https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c > (cc'ing Pierre) > We call MatAssemblyBegin/End within the row loop. Also. I don't understand > why we need to check for r < rend herre > https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513. > nor why we need to allocate newCols (can use cols) > > > That part is from the initial 8-year old implementation from Matt ( > https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677 > ). > You need the check otherwise this error is raised: > https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566 > . > > > I see, anyway you do not need the check if the loop range [rStart,rEnd). > So now I don?t understand why the loop must be [rStart,rStart+maxRows], > Matt? > It is terrible, but I could not see a way around it. We want to use MatGetRow() for each row, but that requires an assembled matrix. We want to use MatSetValues() to changes things, but that unassembles the matrix, so we need an assembly at each iteration, but assembly is collective, so everyone has to take the same number of iterations. Thus, maxRows. Matt > Thanks, > Pierre > > Pierre, can you take a look? > > Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala > ha scritto: > >> Hi everybody, >> >> I just updated petsc from version 13 to 15. Before the update everything >> works well, then my code give me an error: >> >> [9]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [9]PETSC ERROR: Invalid argument >> [9]PETSC ERROR: Setting off process row 53484 even though >> MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set >> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 >> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named >> eayala by ayala Fri May 21 10:40:36 2021 >> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 >> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" >> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich >> --download-hypre --download-mumps --download-scalapack --download-parmetis >> --download-metis --download-superlu_dist --download-cmake >> --download-fblaslapack=1 --with-cxx-dialect=C++11 >> >> The error appears after a matrix assembly, the matrix was created with >> DMCreateMatrix and updated with MatSetValuesLocal. >> >> A Little work around I found the solution, avoid using MatChop on this >> matrix, but I still need to use MatChop. There is any reason to have this >> problem? >> >> Thanks in advance. >> >> >> >> >> > > -- > Stefano > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefano.zampini at gmail.com Fri May 21 11:53:32 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 21 May 2021 19:53:32 +0300 Subject: [petsc-users] MatChop In-Reply-To: References: Message-ID: > On 21 May 2021, at 7:49 PM, Matthew Knepley wrote: > > On Fri, May 21, 2021 at 12:33 PM Stefano Zampini > wrote: > > >> On 21 May 2021, at 7:17 PM, Pierre Jolivet > wrote: >> >> >> >>> On 21 May 2021, at 6:03 PM, Stefano Zampini > wrote: >>> >>> Emmanuel >>> >>> thanks for reporting this. >>> I believe we have a regression in MatChop from https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c (cc'ing Pierre) >>> We call MatAssemblyBegin/End within the row loop. Also. I don't understand why we need to check for r < rend herre https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513 . nor why we need to allocate newCols (can use cols) >> >> That part is from the initial 8-year old implementation from Matt (https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677 ). >> You need the check otherwise this error is raised: https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566 . > > I see, anyway you do not need the check if the loop range [rStart,rEnd). So now I don?t understand why the loop must be [rStart,rStart+maxRows], Matt? > > It is terrible, but I could not see a way around it. We want to use MatGetRow() for each row, but that requires an assembled matrix. What is the use case for calling MatChop on an unassembled matrix ? > We want to use > MatSetValues() to changes things, but that unassembles the matrix, so we need an assembly at each iteration, but assembly is collective, so everyone has > to take the same number of iterations. Thus, maxRows. > > Matt >> Thanks, >> Pierre >> >>> Pierre, can you take a look? >>> >>> Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala > ha scritto: >>> Hi everybody, >>> >>> I just updated petsc from version 13 to 15. Before the update everything works well, then my code give me an error: >>> >>> [9]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [9]PETSC ERROR: Invalid argument >>> [9]PETSC ERROR: Setting off process row 53484 even though MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set >>> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 >>> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named eayala by ayala Fri May 21 10:40:36 2021 >>> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich --download-hypre --download-mumps --download-scalapack --download-parmetis --download-metis --download-superlu_dist --download-cmake --download-fblaslapack=1 --with-cxx-dialect=C++11 >>> >>> The error appears after a matrix assembly, the matrix was created with DMCreateMatrix and updated with MatSetValuesLocal. >>> >>> A Little work around I found the solution, avoid using MatChop on this matrix, but I still need to use MatChop. There is any reason to have this problem? >>> >>> Thanks in advance. 
>>> >>> >>> >>> >>> >>> >>> -- >>> Stefano >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri May 21 12:10:58 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 21 May 2021 18:10:58 +0100 Subject: [petsc-users] MatChop In-Reply-To: References: Message-ID: > On 21 May 2021, at 17:53, Stefano Zampini wrote: > >> I see, anyway you do not need the check if the loop range [rStart,rEnd). So now I don?t understand why the loop must be [rStart,rStart+maxRows], Matt? >> >> It is terrible, but I could not see a way around it. We want to use MatGetRow() for each row, but that requires an assembled matrix. > > What is the use case for calling MatChop on an unassembled matrix ? It's rather that the matrix is modified "through the front door" by just calling MatSetValues repeatedly to replace the small values with zero. This is because an in-place modification of the matrix would require a method for each matrix type. So the process is: for each row: rowvals = MatGetRow(row) rowvals[abs(rowvals) < tol] = 0 MatSetValues(rowvals, ..., INSERT) MatRestoreRow(row) MatAssemblyBegin/End <- so that the next MatGetRow does not error. Now, one "knows" that this assembly will not need to communicate (because you only set local values), but the automatic state tracking can't know this. A disgusting hack that is tremendously fragile would be to do: for each row: ... mat->assembled = PETSC_TRUE mat->assembled = PETSC_FALSE MatAssemblyBegin/End But I would probably refuse to accept that could :) Lawrence From stefano.zampini at gmail.com Fri May 21 13:12:19 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 21 May 2021 21:12:19 +0300 Subject: [petsc-users] MatChop In-Reply-To: References: Message-ID: <17996832-1943-41E7-81D3-1972CD3722FB@gmail.com> > On 21 May 2021, at 8:10 PM, Lawrence Mitchell wrote: > > > >> On 21 May 2021, at 17:53, Stefano Zampini wrote: >> >>> I see, anyway you do not need the check if the loop range [rStart,rEnd). So now I don?t understand why the loop must be [rStart,rStart+maxRows], Matt? >>> >>> It is terrible, but I could not see a way around it. We want to use MatGetRow() for each row, but that requires an assembled matrix. >> >> What is the use case for calling MatChop on an unassembled matrix ? > > It's rather that the matrix is modified "through the front door" by just calling MatSetValues repeatedly to replace the small values with zero. This is because an in-place modification of the matrix would require a method for each matrix type. > If the matrix is assembled, this procedure will not insert new values, nor replace old ones if we set MatSetOption(MAT_IGNORE_ZEROENTRIES,PETSC_FALSE). This should never fail, or am I wrong? > So the process is: > > for each row: > rowvals = MatGetRow(row) > rowvals[abs(rowvals) < tol] = 0 > MatSetValues(rowvals, ..., INSERT) > MatRestoreRow(row) > MatAssemblyBegin/End <- so that the next MatGetRow does not error. > > Now, one "knows" that this assembly will not need to communicate (because you only set local values), but the automatic state tracking can't know this. > > A disgusting hack that is tremendously fragile would be to do: > > for each row: > ... 
> mat->assembled = PETSC_TRUE > mat->assembled = PETSC_FALSE > MatAssemblyBegin/End > > But I would probably refuse to accept that could :) > > Lawrence From y.juntao at hotmail.com Sun May 23 10:16:51 2021 From: y.juntao at hotmail.com (Karl Yang) Date: Sun, 23 May 2021 23:16:51 +0800 Subject: [petsc-users] Help needed with MUMPS solver Message-ID: Hello, I am using MUMPS direct solver for my project. I used the following options for solving my problem and it works in most cases. But for some cases I encounter a divergence error. But I think it is actually error due to MUMPS? I'm not sure how to debug the error. It is appreciated if anyone familiar with MUMPS solver to offer me some guidance. regards Juntao MUMPS options: PetscOptionsSetValue(NULL, "-ksp_type", "preonly"); PetscOptionsSetValue(NULL, "-pc_type", "cholesky"); PetscOptionsSetValue(NULL, "-pc_factor_mat_solver_type", "mumps"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_1", "1"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_2", "1"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_3", "1"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_4", "3"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_28", "1"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_7", "2"); PetscOptionsSetValue(NULL, "-mat_mumps_icntl_24", "1"); Log output from MUMPS and error message from PETSC at the bottom Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 240 2448 executing #MPI = 1, without OMP ================================================= MUMPS compiled with option -Dmetis MUMPS compiled with option -Dptscotch MUMPS compiled with option -Dscotch This MUMPS version includes code for SAVE_RESTORE ================================================= L D L^T Solver for general symmetric matrices Type of parallelism: Working host ****** ANALYSIS STEP ******** Scaling will be computed during analysis Compute maximum matching (Maximum Transversal): 5 ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE Entering analysis phase with ... N NNZ LIW INFO(1) 240 2448 5137 0 Matrix entries: IRN() ICN() 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1 10 Average density of rows/columns = 18 Average density of rows/columns = 18 Ordering based on AMF Constrained Ordering based on AMF Average density of rows/columns = 18 Average density of rows/columns = 18 NFSIZ(.) = 0 38 14 0 33 33 0 0 0 0 FILS (.) = 0 148 4 -96 224 163 20 -43 8 1 FRERE(.) = 241 -5 -6 241 0 -2 241 241 241 241 Leaving analysis phase with ... INFOG(1) = 0 INFOG(2) = 0 -- (20) Number of entries in factors (estim.) = 3750 -- (3) Real space for factors (estimated) = 4641 -- (4) Integer space for factors (estimated) = 2816 -- (5) Maximum frontal size (estimated) = 38 -- (6) Number of nodes in the tree = 56 -- (32) Type of analysis effectively used = 1 -- (7) Ordering option effectively used = 2 ICNTL(6) Maximum transversal option = 0 ICNTL(7) Pivot order option = 2 ICNTL(14) Percentage of memory relaxation = 20 Number of level 2 nodes = 0 Number of split nodes = 0 RINFOG(1) Operations during elimination (estim)= 7.137D+04 Ordering compressed/constrained (ICNTL(12)) = 3 MEMORY ESTIMATIONS ... Estimations with standard Full-Rank (FR) factorization: Total space in MBytes, IC factorization (INFOG(17)): 0 Total space in MBytes, OOC factorization (INFOG(27)): 0 Elapsed time in analysis driver= 0.0016 Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 240 2448 executing #MPI = 1, without OMP ****** FACTORIZATION STEP ******** GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... 
Number of working processes = 1 ICNTL(22) Out-of-core option = 0 ICNTL(35) BLR activation (eff. choice) = 0 ICNTL(14) Memory relaxation = 20 INFOG(3) Real space for factors (estimated)= 4641 INFOG(4) Integer space for factors (estim.)= 2816 Maximum frontal size (estimated) = 38 Number of nodes in the tree = 56 Memory allowed (MB -- 0: N/A ) = 0 Memory provided by user, sum of LWK_USER = 0 Relative threshold for pivoting, CNTL(1) = 0.1000D-01 ZERO PIVOT DETECTION ON, THRESHOLD = 2.8931920285365730E-020 INFINITE FIXATION Effective size of S (based on INFO(39))= 7981 Elapsed time to reformat/distribute matrix = 0.0001 ** Memory allocated, total in Mbytes (INFOG(19)): 0 ** Memory effectively used, total in Mbytes (INFOG(22)): 0 ** Memory dynamically allocated for CB, total in Mbytes : 0 Elapsed time for factorization = 0.0006 Leaving factorization with ... RINFOG(2) Operations in node assembly = 5.976D+03 ------(3) Operations in node elimination = 1.197D+05 INFOG (9) Real space for factors = 6193 INFOG(10) Integer space for factors = 3036 INFOG(11) Maximum front size = 42 INFOG(29) Number of entries in factors = 4896 INFOG(12) Number of negative pivots = 79 INFOG(13) Number of delayed pivots = 110 Number of 2x2 pivots in type 1 nodes = 1 Number of 2X2 pivots in type 2 nodes = 0 Nb of null pivots detected by ICNTL(24) = 0 INFOG(28) Estimated deficiency = 0 INFOG(14) Number of memory compress = 0 Elapsed time in factorization driver= 0.0009 Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 3 240 2448 executing #MPI = 1, without OMP ****** SOLVE & CHECK STEP ******** GLOBAL STATISTICS PRIOR SOLVE PHASE ........... Number of right-hand-sides = 1 Blocking factor for multiple rhs = 1 ICNTL (9) = 1 --- (10) = 0 --- (11) = 0 --- (20) = 0 --- (21) = 0 --- (30) = 0 --- (35) = 0 Vector solution for column 1 RHS -7.828363D-02 -3.255337D+00 1.054729D+00 1.379822D-01 -3.892113D-01 1.433990D-01 1.089250D+00 2.252611D+00 3.215399D+00 -6.788806D-02 ** Space in MBYTES used for solve : 0 Leaving solve with ... Time to build/scatter RHS = 0.000003 Time in solution step (fwd/bwd) = 0.000167 .. Time in forward (fwd) step = 0.000053 .. Time in backward (bwd) step = 0.000093 Time to gather solution(cent.sol)= 0.000000 Time to copy/scale dist. solution= 0.000000 Elapsed time in solve driver= 0.0004 *** Warning: Verbose output for PETScKrylovSolver not implemented, calling PETSc KSPView directly. KSP Object: 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: 1 MPI processes type: cholesky out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0., needed 0. 
      Factored matrix follows:
        Mat Object: 1 MPI processes
          type: mumps
          rows=240, cols=240
          package used to perform factorization: mumps
          total: nonzeros=3750, allocated nonzeros=3750
          total number of mallocs used during MatSetValues calls=0
          MUMPS run parameters:
            SYM (matrix type): 2
            PAR (host participation): 1
            ICNTL(1) (output for error): 1
            ICNTL(2) (output of diagnostic msg): 1
            ICNTL(3) (output for global info): 6
            ICNTL(4) (level of printing): 3
            ICNTL(5) (input mat struct): 0
            ICNTL(6) (matrix prescaling): 7
            ICNTL(7) (sequential matrix ordering): 2
            ICNTL(8) (scaling strategy): 77
            ICNTL(10) (max num of refinements): 0
            ICNTL(11) (error analysis): 0
            ICNTL(12) (efficiency control): 0
            ICNTL(13) (efficiency control): 1
            ICNTL(14) (percentage of estimated workspace increase): 20
            ICNTL(18) (input mat struct): 0
            ICNTL(19) (Schur complement info): 0
            ICNTL(20) (rhs sparse pattern): 0
            ICNTL(21) (solution struct): 0
            ICNTL(22) (in-core/out-of-core facility): 0
            ICNTL(23) (max size of memory can be allocated locally): 0
            ICNTL(24) (detection of null pivot rows): 1
            ICNTL(25) (computation of a null space basis): 0
            ICNTL(26) (Schur options for rhs or solution): 0
            ICNTL(27) (experimental parameter): -32
            ICNTL(28) (use parallel or sequential ordering): 1
            ICNTL(29) (parallel ordering): 0
            ICNTL(30) (user-specified set of entries in inv(A)): 0
            ICNTL(31) (factors is discarded in the solve phase): 0
            ICNTL(33) (compute determinant): 0
            ICNTL(35) (activate BLR based factorization): 0
            ICNTL(36) (choice of BLR factorization variant): 0
            ICNTL(38) (estimated compression rate of LU factors): 333
            CNTL(1) (relative pivoting threshold): 0.01
            CNTL(2) (stopping criterion of refinement): 1.49012e-08
            CNTL(3) (absolute pivoting threshold): 0.
            CNTL(4) (value of static pivoting): -1.
            CNTL(5) (fixation for null pivots): 0.
            CNTL(7) (dropping parameter for BLR): 0.
            RINFO(1) (local estimated flops for the elimination after analysis):
              [0] 71368.
            RINFO(2) (local estimated flops for the assembly after factorization):
              [0] 5976.
            RINFO(3) (local estimated flops for the elimination after factorization):
              [0] 119716.
            INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
              [0] 0
            INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
              [0] 0
            INFO(23) (num of pivots eliminated on this processor after factorization):
              [0] 240
            RINFOG(1) (global estimated flops for the elimination after analysis): 71368.
            RINFOG(2) (global estimated flops for the assembly after factorization): 5976.
            RINFOG(3) (global estimated flops for the elimination after factorization): 119716.
            (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
            INFOG(3) (estimated real workspace for factors on all processors after analysis): 4641
            INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2816
            INFOG(5) (estimated maximum front size in the complete tree): 38
            INFOG(6) (number of nodes in the complete tree): 56
            INFOG(7) (ordering option effectively used after analysis): 2
            INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
            INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 6193
            INFOG(10) (total integer space to store the matrix factors after factorization): 3036
            INFOG(11) (order of largest frontal matrix after factorization): 42
            INFOG(12) (number of off-diagonal pivots): 79
            INFOG(13) (number of delayed pivots after factorization): 110
            INFOG(14) (number of memory compress after factorization): 0
            INFOG(15) (number of steps of iterative refinement after solution): 0
            INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 0
            INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 0
            INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 0
            INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 0
            INFOG(20) (estimated number of entries in the factors): 3750
            INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 0
            INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 0
            INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
            INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3
            INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
            INFOG(28) (after factorization: number of null pivots encountered): 0
            INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 4896
            INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
            INFOG(32) (after analysis: type of analysis done): 1
            INFOG(33) (value used for ICNTL(8)): -2
            INFOG(34) (exponent of the determinant if determinant is requested): 0
            INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 4896
            INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
            INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
            INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
            INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaij
    rows=240, cols=240
    total: nonzeros=4656, allocated nonzeros=4656
    total number of mallocs used during MatSetValues calls=0
      using I-node routines: found 167 nodes, limit used is 5

Entering DMUMPS 5.2.1 from C interface with JOB = -2
      executing #MPI = 1, without OMP
rank: 0 coefficient: 0.132368

Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 960 9792
      executing #MPI = 1, without OMP

 =================================================
 MUMPS compiled with option -Dmetis
 MUMPS compiled with option -Dptscotch
 MUMPS compiled with option -Dscotch
 This MUMPS version includes code for SAVE_RESTORE
 =================================================
 L D L^T Solver for general symmetric matrices
 Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 Scaling will be computed during analysis
 Compute maximum matching (Maximum Transversal): 5
 ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE

 Entering analysis phase with ...
                N         NNZ        LIW       INFO(1)
              960        9792      20545           0
 Matrix entries:    IRN()   ICN()
          1        1          1        2          1        3
          1        4          1        5          1        6
          1        7          1        8          1        9
          1       10
 Average density of rows/columns = 18
 Average density of rows/columns = 18
 Ordering based on AMF
 Constrained Ordering based on AMF
 Average density of rows/columns = 18
 Average density of rows/columns = 18
 NFSIZ(.)  =     0     0     0    58     0     0     0    73    14     0
 FILS (.)  =     0  -747   -80   922   146     5     6   669     3     1
 FRERE(.)  =   961   961   961     0   961   961   961    -4   -69   961

 Leaving analysis phase with ...
 INFOG(1)  = 0
 INFOG(2)  = 0
 -- (20) Number of entries in factors (estim.) = 20336
 -- (3) Real space for factors (estimated)     = 24094
 -- (4) Integer space for factors (estimated)  = 12143
 -- (5) Maximum frontal size (estimated)       = 80
 -- (6) Number of nodes in the tree            = 227
 -- (32) Type of analysis effectively used     = 1
 -- (7) Ordering option effectively used       = 2
 ICNTL(6) Maximum transversal option           = 0
 ICNTL(7) Pivot order option                   = 2
 ICNTL(14) Percentage of memory relaxation     = 20
 Number of level 2 nodes                       = 0
 Number of split nodes                         = 0
 RINFOG(1) Operations during elimination (estim) = 6.966D+05
 Ordering compressed/constrained (ICNTL(12))   = 3

 MEMORY ESTIMATIONS ...
 Estimations with standard Full-Rank (FR) factorization:
 Total space in MBytes, IC factorization (INFOG(17)):  1
 Total space in MBytes, OOC factorization (INFOG(27)): 1

 Elapsed time in analysis driver = 0.0066

Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 960 9792
      executing #MPI = 1, without OMP

 ****** FACTORIZATION STEP ********

 GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
 Number of working processes                =  1
 ICNTL(22) Out-of-core option               =  0
 ICNTL(35) BLR activation (eff. choice)     =  0
 ICNTL(14) Memory relaxation                =  20
 INFOG(3) Real space for factors (estimated) = 24094
 INFOG(4) Integer space for factors (estim.) = 12143
 Maximum frontal size (estimated)           =  80
 Number of nodes in the tree                =  227
 Memory allowed (MB -- 0: N/A )             =  0
 Memory provided by user, sum of LWK_USER   =  0
 Relative threshold for pivoting, CNTL(1)   =  0.1000D-01
 ZERO PIVOT DETECTION ON, THRESHOLD         =  2.9434468577175697E-020
 INFINITE FIXATION
 Effective size of S (based on INFO(39))    =  31314
 Elapsed time to reformat/distribute matrix =  0.0006
 ** Memory allocated, total in Mbytes (INFOG(19)):          1
 ** Memory effectively used, total in Mbytes (INFOG(22)):   1
 ** Memory dynamically allocated for CB, total in Mbytes:   0

 Elapsed time for (failed) factorization = 0.0032

 Leaving factorization with ...
 RINFOG(2) Operations in node assembly     = 3.366D+04
 ------(3) Operations in node elimination  = 9.346D+05
 INFOG (9) Real space for factors          = 26980
 INFOG(10) Integer space for factors       = 13047
 INFOG(11) Maximum front size              = 84
 INFOG(29) Number of entries in factors    = 24047
 INFOG(12) Number of negative pivots       = 294
 INFOG(13) Number of delayed pivots        = 452
 Number of 2x2 pivots in type 1 nodes      = 0
 Number of 2x2 pivots in type 2 nodes      = 0
 Nb of null pivots detected by ICNTL(24)   = 0
 INFOG(28) Estimated deficiency            = 0
 INFOG(14) Number of memory compress       = 1

 Elapsed time in factorization driver = 0.0042
 On return from DMUMPS, INFOG(1) = -9
 On return from DMUMPS, INFOG(2) = 22
terminate called after throwing an instance of 'std::runtime_error'
  what():

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics-support at googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to solve linear system using PETSc Krylov solver.
*** Reason:  Solution failed to converge in 0 iterations (PETSc reason
***          DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00).
*** Where:   This error was encountered inside PETScKrylovSolver.cpp.
*** Process: 0
***
*** DOLFIN version: 2019.1.0
*** Git changeset:  74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f
*** -------------------------------------------------------------------------

Aborted (core dumped)

From knepley at gmail.com  Sun May 23 11:17:12 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Sun, 23 May 2021 12:17:12 -0400
Subject: [petsc-users] Help needed with MUMPS solver
In-Reply-To:
References:
Message-ID:

On Sun, May 23, 2021 at 11:17 AM Karl Yang wrote:

> Hello,
>
> I am using the MUMPS direct solver for my project. I used the following
> options for solving my problem, and it works in most cases. But for some
> cases I encounter a divergence error, which I think is actually an error
> due to MUMPS.
>
> I'm not sure how to debug the error. I would appreciate it if anyone
> familiar with the MUMPS solver could offer me some guidance.

This says

  On return from DMUMPS, INFOG(1)= -9
  On return from DMUMPS, INFOG(2)= 22

that the internal work array for MUMPS is too small. I am not sure which
option controls that.
  Thanks,

     Matt

> regards
> Juntao
>
> MUMPS options:
>
> PetscOptionsSetValue(NULL, "-ksp_type", "preonly");
> PetscOptionsSetValue(NULL, "-pc_type", "cholesky");
> PetscOptionsSetValue(NULL, "-pc_factor_mat_solver_type", "mumps");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_1", "1");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_2", "1");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_3", "1");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_4", "3");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_28", "1");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_7", "2");
> PetscOptionsSetValue(NULL, "-mat_mumps_icntl_24", "1");
>
> [quoted MUMPS log and KSPView output snipped; the failing run is
> reproduced in full above, ending with INFOG(1) = -9, INFOG(2) = 22]
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From bsmith at petsc.dev  Sun May 23 11:17:58 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 23 May 2021 11:17:58 -0500
Subject: [petsc-users] Help needed with MUMPS solver
In-Reply-To:
References:
Message-ID: <3D0CA2AC-CEC3-4B50-8692-32987039568C@petsc.dev>

   Please run with -ksp_error_if_not_converged and send all the output.

   Barry

> On May 23, 2021, at 10:16 AM, Karl Yang wrote:
>
> Hello,
>
> I am using the MUMPS direct solver for my project. I used the following
> options for solving my problem, and it works in most cases. But for some
> cases I encounter a divergence error, which I think is actually an error
> due to MUMPS.
>
> I'm not sure how to debug the error. I would appreciate it if anyone
> familiar with the MUMPS solver could offer me some guidance.
> [Juntao's MUMPS options and the full MUMPS/KSPView log snipped; they are
> identical to the message quoted in full above]
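Since Juntao sets his options programmatically, Barry's suggestion can be wired in the same way; a minimal sketch, not taken from the thread, using the same PetscOptionsSetValue() style as the snippet above:

    /* Sketch: turn a diverged solve into a hard, informative PETSc error
       instead of a silent DIVERGED_PC_FAILED return, and print why the
       solve stopped. Set these before KSPSetFromOptions() is called. */
    PetscOptionsSetValue(NULL, "-ksp_error_if_not_converged", "1");
    PetscOptionsSetValue(NULL, "-ksp_converged_reason", NULL);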
From pierre at joliv.et  Sun May 23 12:04:11 2021
From: pierre at joliv.et (Pierre Jolivet)
Date: Sun, 23 May 2021 19:04:11 +0200
Subject: [petsc-users] Help needed with MUMPS solver
In-Reply-To:
References:
Message-ID:

> On 23 May 2021, at 6:17 PM, Matthew Knepley wrote:
>
> On Sun, May 23, 2021 at 11:17 AM Karl Yang wrote:
>
> Hello,
>
> I am using the MUMPS direct solver for my project. I used the following
> options for solving my problem, and it works in most cases. But for some
> cases I encounter a divergence error, which I think is actually an error
> due to MUMPS.
>
> I'm not sure how to debug the error. I would appreciate it if anyone
> familiar with the MUMPS solver could offer me some guidance.
>
> This says
>
>   On return from DMUMPS, INFOG(1)= -9
>   On return from DMUMPS, INFOG(2)= 22
>
> that the internal work array for MUMPS is too small. I am not sure which
> option controls that.
-mat_mumps_icntl_14

Juntao, you can usually troubleshoot MUMPS error codes by looking at
http://mumps.enseeiht.fr/doc/userguide_5.4.0.pdf#page=92

Thanks,
Pierre

> Thanks,
>
>    Matt
>
> [re-quoted MUMPS options, log, and KSPView output snipped; they are
> identical to the messages above]
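ICNTL(14) is the workspace relaxation that the failed log above reports as 20 (per cent); INFOG(1) = -9 says the real workspace was too small, and INFOG(2) = 22 gives the number of missing entries. A minimal sketch of raising it, in the same PetscOptionsSetValue() style Juntao already uses; the value 50 is an arbitrary illustration, not a recommendation from this thread:

    /* Sketch: give MUMPS 50% (instead of the default 20%) extra room on top
       of its estimated workspace. Equivalent to passing
       -mat_mumps_icntl_14 50 on the command line; must be set before the
       preconditioner is set up. The value 50 is only an illustrative guess. */
    PetscOptionsSetValue(NULL, "-mat_mumps_icntl_14", "50");

Alternatively, MatMumpsSetIcntl() can be called directly on the factored matrix obtained from PCFactorGetMatrix() once the solver package has been set up.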
> RINFOG(2) Operations in node assembly = 3.366D+04 > ------(3) Operations in node elimination = 9.346D+05 > INFOG (9) Real space for factors = 26980 > INFOG(10) Integer space for factors = 13047 > INFOG(11) Maximum front size = 84 > INFOG(29) Number of entries in factors = 24047 > INFOG(12) Number of negative pivots = 294 > INFOG(13) Number of delayed pivots = 452 > Number of 2x2 pivots in type 1 nodes = 0 > Number of 2X2 pivots in type 2 nodes = 0 > Nb of null pivots detected by ICNTL(24) = 0 > INFOG(28) Estimated deficiency = 0 > INFOG(14) Number of memory compress = 1 > > Elapsed time in factorization driver= 0.0042 > On return from DMUMPS, INFOG(1)= -9 > On return from DMUMPS, INFOG(2)= 22 > terminate called after throwing an instance of 'std::runtime_error' > what(): > > *** ------------------------------------------------------------------------- > *** DOLFIN encountered an error. If you are not able to resolve this issue > *** using the information listed below, you can ask for help at > *** > *** fenics-support at googlegroups.com > *** > *** Remember to include the error message listed below and, if possible, > *** include a *minimal* running example to reproduce the error. > *** > *** ------------------------------------------------------------------------- > *** Error: Unable to solve linear system using PETSc Krylov solver. > *** Reason: Solution failed to converge in 0 iterations (PETSc reason DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00). > *** Where: This error was encountered inside PETScKrylovSolver.cpp. > *** Process: 0 > *** > *** DOLFIN version: 2019.1.0 > *** Git changeset: 74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f > *** ------------------------------------------------------------------------- > > Aborted (core dumped) > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Mon May 24 20:08:17 2021 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Tue, 25 May 2021 01:08:17 +0000 Subject: [petsc-users] Sum of the absolute values of each row's components Message-ID: <572172F1-F6A2-46BF-B7B3-EF12858058F9@llnl.gov> Hello, I am simply interested in obtaining a vector with x_i = \sum_j | A_{i, j} | for each row ?i? in the matrix A. I found MatGetColumnNorms(), but no row version. I am wondering if it is more efficient to calculate the transpose A^T and then MatGetColumnNorms() or maybe iterate through each row with MatGetRow() and calculate \sum_j | A_{i, j} | by hand by myself. Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 24 20:19:08 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 May 2021 21:19:08 -0400 Subject: [petsc-users] Sum of the absolute values of each row's components In-Reply-To: <572172F1-F6A2-46BF-B7B3-EF12858058F9@llnl.gov> References: <572172F1-F6A2-46BF-B7B3-EF12858058F9@llnl.gov> Message-ID: On Mon, May 24, 2021 at 9:08 PM Salazar De Troya, Miguel via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > > > I am simply interested in obtaining a vector with x_i = \sum_j | A_{i, j} > | for each row ?i? in the matrix A. 
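A minimal sketch of the MatGetRow() loop mentioned in the question (assuming a MATAIJ matrix and the standard C API; CHKERRQ error checking omitted):

  PetscInt          rstart, rend, row, ncols, j;
  const PetscScalar *vals;
  PetscScalar       *s;
  Vec               sums;

  MatCreateVecs(A, NULL, &sums);             /* sums lives in the row (range) space of A */
  MatGetOwnershipRange(A, &rstart, &rend);
  VecGetArray(sums, &s);
  for (row = rstart; row < rend; row++) {
    MatGetRow(A, row, &ncols, NULL, &vals);  /* read-only view of one local row */
    s[row - rstart] = 0.0;
    for (j = 0; j < ncols; j++) s[row - rstart] += PetscAbsScalar(vals[j]);
    MatRestoreRow(A, row, &ncols, NULL, &vals);
  }
  VecRestoreArray(sums, &s);

This gives x_i = \sum_j | A_{i, j} | directly; for a plain row sum without absolute values, the ones-vector MatMult() in the reply below does the same job in three calls.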
I found MatGetColumnNorms(), but no row > version. I am wondering if it is more efficient to calculate the transpose > A^T and then MatGetColumnNorms() or maybe iterate through each row with > MatGetRow() and calculate \sum_j | A_{i, j} | by hand by myself. > Vec ones, sums; MatCreateVecs(A, &ones, &sums); VecSet(ones, 1.0); MatMult(A, ones, sums); VecDestroy(&ones); VecDestroy(&sums); Thanks, Matt > > > Thanks > > Miguel > > > > Miguel A. Salazar de Troya > > Postdoctoral Researcher, Lawrence Livermore National Laboratory > > B141 > > Rm: 1085-5 > > Ph: 1(925) 422-6411 > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 24 20:21:04 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 May 2021 21:21:04 -0400 Subject: [petsc-users] Sum of the absolute values of each row's components In-Reply-To: References: <572172F1-F6A2-46BF-B7B3-EF12858058F9@llnl.gov> Message-ID: On Mon, May 24, 2021 at 9:19 PM Matthew Knepley wrote: > On Mon, May 24, 2021 at 9:08 PM Salazar De Troya, Miguel via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hello, >> >> >> >> I am simply interested in obtaining a vector with x_i = \sum_j | A_{i, j} >> | for each row ?i? in the matrix A. I found MatGetColumnNorms(), but no row >> version. I am wondering if it is more efficient to calculate the transpose >> A^T and then MatGetColumnNorms() or maybe iterate through each row with >> MatGetRow() and calculate \sum_j | A_{i, j} | by hand by myself. >> > > Vec ones, sums; > > MatCreateVecs(A, &ones, &sums); > VecSet(ones, 1.0); > MatMult(A, ones, sums); > VecDestroy(&ones); > VecDestroy(&sums); > This does the row sum, not the absolute value. If you want the absolute value, you can just call MatGetRow() and sum the values. Thanks, Matt > Thanks, > > Matt > > >> >> >> Thanks >> >> Miguel >> >> >> >> Miguel A. Salazar de Troya >> >> Postdoctoral Researcher, Lawrence Livermore National Laboratory >> >> B141 >> >> Rm: 1085-5 >> >> Ph: 1(925) 422-6411 >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From y.juntao at hotmail.com Mon May 24 21:06:11 2021 From: y.juntao at hotmail.com (Karl Yang) Date: Tue, 25 May 2021 10:06:11 +0800 Subject: [petsc-users] Help needed with MUMPS solver In-Reply-To: <3D0CA2AC-CEC3-4B50-8692-32987039568C@petsc.dev> References: <3D0CA2AC-CEC3-4B50-8692-32987039568C@petsc.dev> Message-ID: Hi, Barry I got the following error with -ksp_error_if_not_converged. I'm making use of the PETSc from dolfin package. So it could also be bugs in dolfin. But still thank you for providing more information. Regards Juntao terminate called after throwing an instance of 'std::runtime_error' what(): *** ------------------------------------------------------------------------- *** DOLFIN encountered an error. 
If you are not able to resolve this issue *** using the information listed below, you can ask for help at *** *** fenics-support at googlegroups.com *** *** Remember to include the error message listed below and, if possible, *** include a *minimal* running example to reproduce the error. *** *** ------------------------------------------------------------------------- *** Error: Unable to successfully call PETSc function 'KSPSolve'. *** Reason: PETSc error code is: 76 (Error in external library). *** Where: This error was encountered inside /tmp/dolfin/dolfin/la/PETScKrylovSolver.cpp. *** Process: 0 *** *** DOLFIN version: 2019.1.0 *** Git changeset: 74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f *** ------------------------------------------------------------------------- =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 290 RUNNING AT c10236694934 = EXIT CODE: 134 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) This typically refers to a problem with your application. Please see the FAQ page for debugging suggestions Regards Juntao On May 24 2021, at 12:17 am, Barry Smith wrote: > > Please run with -ksp_error_if_not_converged and send all the output > > Barry > > > > On May 23, 2021, at 10:16 AM, Karl Yang wrote: > > Hello, > > I am using MUMPS direct solver for my project. I used the following options for solving my problem and it works in most cases. But for some cases I encounter a divergence error. But I think it is actually error due to MUMPS? > > I'm not sure how to debug the error. It is appreciated if anyone familiar with MUMPS solver to offer me some guidance. > > regards > > Juntao > > > > MUMPS options: > > > > PetscOptionsSetValue(NULL, "-ksp_type", "preonly"); > > PetscOptionsSetValue(NULL, "-pc_type", "cholesky"); > > PetscOptionsSetValue(NULL, "-pc_factor_mat_solver_type", "mumps"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_1", "1"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_2", "1"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_3", "1"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_4", "3"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_28", "1"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_7", "2"); > > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_24", "1"); > > > > > > Log output from MUMPS and error message from PETSC at the bottom > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 240 2448 > > executing #MPI = 1, without OMP > > > > ================================================= > > MUMPS compiled with option -Dmetis > > MUMPS compiled with option -Dptscotch > > MUMPS compiled with option -Dscotch > > This MUMPS version includes code for SAVE_RESTORE > > ================================================= > > L D L^T Solver for general symmetric matrices > > Type of parallelism: Working host > > > > ****** ANALYSIS STEP ******** > > Scaling will be computed during analysis > > Compute maximum matching (Maximum Transversal): 5 > > ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE > > > > Entering analysis phase with ... 
> > N NNZ LIW INFO(1) > > 240 2448 5137 0 > > Matrix entries: IRN() ICN() > > 1 1 1 2 1 3 > > 1 4 1 5 1 6 > > 1 7 1 8 1 9 > > 1 10 > > Average density of rows/columns = 18 > > Average density of rows/columns = 18 > > Ordering based on AMF > > Constrained Ordering based on AMF > > Average density of rows/columns = 18 > > Average density of rows/columns = 18 > > NFSIZ(.) = 0 38 14 0 33 33 0 0 0 0 > > > > FILS (.) = 0 148 4 -96 224 163 20 -43 8 1 > > FRERE(.) = 241 -5 -6 241 0 -2 241 241 241 241 > > > > Leaving analysis phase with ... > > INFOG(1) = 0 > > INFOG(2) = 0 > > -- (20) Number of entries in factors (estim.) = 3750 > > -- (3) Real space for factors (estimated) = 4641 > > -- (4) Integer space for factors (estimated) = 2816 > > -- (5) Maximum frontal size (estimated) = 38 > > -- (6) Number of nodes in the tree = 56 > > -- (32) Type of analysis effectively used = 1 > > -- (7) Ordering option effectively used = 2 > > ICNTL(6) Maximum transversal option = 0 > > ICNTL(7) Pivot order option = 2 > > ICNTL(14) Percentage of memory relaxation = 20 > > Number of level 2 nodes = 0 > > Number of split nodes = 0 > > RINFOG(1) Operations during elimination (estim)= 7.137D+04 > > Ordering compressed/constrained (ICNTL(12)) = 3 > > > > MEMORY ESTIMATIONS ... > > Estimations with standard Full-Rank (FR) factorization: > > Total space in MBytes, IC factorization (INFOG(17)): 0 > > Total space in MBytes, OOC factorization (INFOG(27)): 0 > > > > Elapsed time in analysis driver= 0.0016 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 240 2448 > > executing #MPI = 1, without OMP > > > > > > > > ****** FACTORIZATION STEP ******** > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > > Number of working processes = 1 > > ICNTL(22) Out-of-core option = 0 > > ICNTL(35) BLR activation (eff. choice) = 0 > > ICNTL(14) Memory relaxation = 20 > > INFOG(3) Real space for factors (estimated)= 4641 > > INFOG(4) Integer space for factors (estim.)= 2816 > > Maximum frontal size (estimated) = 38 > > Number of nodes in the tree = 56 > > Memory allowed (MB -- 0: N/A ) = 0 > > Memory provided by user, sum of LWK_USER = 0 > > Relative threshold for pivoting, CNTL(1) = 0.1000D-01 > > ZERO PIVOT DETECTION ON, THRESHOLD = 2.8931920285365730E-020 > > INFINITE FIXATION > > Effective size of S (based on INFO(39))= 7981 > > Elapsed time to reformat/distribute matrix = 0.0001 > > ** Memory allocated, total in Mbytes (INFOG(19)): 0 > > ** Memory effectively used, total in Mbytes (INFOG(22)): 0 > > ** Memory dynamically allocated for CB, total in Mbytes : 0 > > > > Elapsed time for factorization = 0.0006 > > Leaving factorization with ... > > RINFOG(2) Operations in node assembly = 5.976D+03 > > ------(3) Operations in node elimination = 1.197D+05 > > INFOG (9) Real space for factors = 6193 > > INFOG(10) Integer space for factors = 3036 > > INFOG(11) Maximum front size = 42 > > INFOG(29) Number of entries in factors = 4896 > > INFOG(12) Number of negative pivots = 79 > > INFOG(13) Number of delayed pivots = 110 > > Number of 2x2 pivots in type 1 nodes = 1 > > Number of 2X2 pivots in type 2 nodes = 0 > > Nb of null pivots detected by ICNTL(24) = 0 > > INFOG(28) Estimated deficiency = 0 > > INFOG(14) Number of memory compress = 0 > > > > Elapsed time in factorization driver= 0.0009 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 3 240 2448 > > executing #MPI = 1, without OMP > > > > > > > > ****** SOLVE & CHECK STEP ******** > > GLOBAL STATISTICS PRIOR SOLVE PHASE ........... 
> > Number of right-hand-sides = 1 > > Blocking factor for multiple rhs = 1 > > ICNTL (9) = 1 > > --- (10) = 0 > > --- (11) = 0 > > --- (20) = 0 > > --- (21) = 0 > > --- (30) = 0 > > --- (35) = 0 > > > > > > Vector solution for column 1 > > RHS > > -7.828363D-02 -3.255337D+00 1.054729D+00 1.379822D-01 -3.892113D-01 > > 1.433990D-01 1.089250D+00 2.252611D+00 3.215399D+00 -6.788806D-02 > > ** Space in MBYTES used for solve : 0 > > > > Leaving solve with ... > > Time to build/scatter RHS = 0.000003 > > Time in solution step (fwd/bwd) = 0.000167 > > .. Time in forward (fwd) step = 0.000053 > > .. Time in backward (bwd) step = 0.000093 > > Time to gather solution(cent.sol)= 0.000000 > > Time to copy/scale dist. solution= 0.000000 > > > > Elapsed time in solve driver= 0.0004 > > *** Warning: Verbose output for PETScKrylovSolver not implemented, calling PETSc KSPView directly. > > KSP Object: 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: 1 MPI processes > > type: cholesky > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0., needed 0. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: mumps > > rows=240, cols=240 > > package used to perform factorization: mumps > > total: nonzeros=3750, allocated nonzeros=3750 > > total number of mallocs used during MatSetValues calls=0 > > MUMPS run parameters: > > SYM (matrix type): 2 > > PAR (host participation): 1 > > ICNTL(1) (output for error): 1 > > ICNTL(2) (output of diagnostic msg): 1 > > ICNTL(3) (output for global info): 6 > > ICNTL(4) (level of printing): 3 > > ICNTL(5) (input mat struct): 0 > > ICNTL(6) (matrix prescaling): 7 > > ICNTL(7) (sequential matrix ordering):2 > > ICNTL(8) (scaling strategy): 77 > > ICNTL(10) (max num of refinements): 0 > > ICNTL(11) (error analysis): 0 > > ICNTL(12) (efficiency control): 0 > > ICNTL(13) (efficiency control): 1 > > ICNTL(14) (percentage of estimated workspace increase): 20 > > ICNTL(18) (input mat struct): 0 > > ICNTL(19) (Schur complement info): 0 > > ICNTL(20) (rhs sparse pattern): 0 > > ICNTL(21) (solution struct): 0 > > ICNTL(22) (in-core/out-of-core facility): 0 > > ICNTL(23) (max size of memory can be allocated locally):0 > > ICNTL(24) (detection of null pivot rows): 1 > > ICNTL(25) (computation of a null space basis): 0 > > ICNTL(26) (Schur options for rhs or solution): 0 > > ICNTL(27) (experimental parameter): -32 > > ICNTL(28) (use parallel or sequential ordering): 1 > > ICNTL(29) (parallel ordering): 0 > > ICNTL(30) (user-specified set of entries in inv(A)): 0 > > ICNTL(31) (factors is discarded in the solve phase): 0 > > ICNTL(33) (compute determinant): 0 > > ICNTL(35) (activate BLR based factorization): 0 > > ICNTL(36) (choice of BLR factorization variant): 0 > > ICNTL(38) (estimated compression rate of LU factors): 333 > > CNTL(1) (relative pivoting threshold): 0.01 > > CNTL(2) (stopping criterion of refinement): 1.49012e-08 > > CNTL(3) (absolute pivoting threshold): 0. > > CNTL(4) (value of static pivoting): -1. > > CNTL(5) (fixation for null pivots): 0. > > CNTL(7) (dropping parameter for BLR): 0. > > RINFO(1) (local estimated flops for the elimination after analysis): > > [0] 71368. > > RINFO(2) (local estimated flops for the assembly after factorization): > > [0] 5976. 
> > RINFO(3) (local estimated flops for the elimination after factorization): > > [0] 119716. > > INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization): > > [0] 0 > > INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization): > > [0] 0 > > INFO(23) (num of pivots eliminated on this processor after factorization): > > [0] 240 > > RINFOG(1) (global estimated flops for the elimination after analysis): 71368. > > RINFOG(2) (global estimated flops for the assembly after factorization): 5976. > > RINFOG(3) (global estimated flops for the elimination after factorization): 119716. > > (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0) > > INFOG(3) (estimated real workspace for factors on all processors after analysis): 4641 > > INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2816 > > INFOG(5) (estimated maximum front size in the complete tree): 38 > > INFOG(6) (number of nodes in the complete tree): 56 > > INFOG(7) (ordering option effectively use after analysis): 2 > > INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100 > > INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 6193 > > INFOG(10) (total integer space store the matrix factors after factorization): 3036 > > INFOG(11) (order of largest frontal matrix after factorization): 42 > > INFOG(12) (number of off-diagonal pivots): 79 > > INFOG(13) (number of delayed pivots after factorization): 110 > > INFOG(14) (number of memory compress after factorization): 0 > > INFOG(15) (number of steps of iterative refinement after solution): 0 > > INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 0 > > INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 0 > > INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 0 > > INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 0 > > INFOG(20) (estimated number of entries in the factors): 3750 > > INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 0 > > INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 0 > > INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5 > > INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3 > > INFOG(25) (after factorization: number of pivots modified by static pivoting): 0 > > INFOG(28) (after factorization: number of null pivots encountered): 0 > > INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 4896 > > INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0 > > INFOG(32) (after analysis: type of analysis done): 1 > > INFOG(33) (value used for ICNTL(8)): -2 > > INFOG(34) (exponent of the determinant if determinant is requested): 0 > > INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 4896 > > INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0 > > INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over 
all processors): 0 > > INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0 > > INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0 > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=240, cols=240 > > total: nonzeros=4656, allocated nonzeros=4656 > > total number of mallocs used during MatSetValues calls=0 > > using I-node routines: found 167 nodes, limit used is 5 > > > > Entering DMUMPS 5.2.1 from C interface with JOB = -2 > > executing #MPI = 1, without OMP > > rank: 0 coefficient: 0.132368 > > > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 960 9792 > > executing #MPI = 1, without OMP > > > > ================================================= > > MUMPS compiled with option -Dmetis > > MUMPS compiled with option -Dptscotch > > MUMPS compiled with option -Dscotch > > This MUMPS version includes code for SAVE_RESTORE > > ================================================= > > L D L^T Solver for general symmetric matrices > > Type of parallelism: Working host > > > > ****** ANALYSIS STEP ******** > > Scaling will be computed during analysis > > Compute maximum matching (Maximum Transversal): 5 > > ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE > > > > Entering analysis phase with ... > > N NNZ LIW INFO(1) > > 960 9792 20545 0 > > Matrix entries: IRN() ICN() > > 1 1 1 2 1 3 > > 1 4 1 5 1 6 > > 1 7 1 8 1 9 > > 1 10 > > Average density of rows/columns = 18 > > Average density of rows/columns = 18 > > Ordering based on AMF > > Constrained Ordering based on AMF > > Average density of rows/columns = 18 > > Average density of rows/columns = 18 > > NFSIZ(.) = 0 0 0 58 0 0 0 73 14 0 > > > > FILS (.) = 0 -747 -80 922 146 5 6 669 3 1 > > FRERE(.) = 961 961 961 0 961 961 961 -4 -69 961 > > > > Leaving analysis phase with ... > > INFOG(1) = 0 > > INFOG(2) = 0 > > -- (20) Number of entries in factors (estim.) = 20336 > > -- (3) Real space for factors (estimated) = 24094 > > -- (4) Integer space for factors (estimated) = 12143 > > -- (5) Maximum frontal size (estimated) = 80 > > -- (6) Number of nodes in the tree = 227 > > -- (32) Type of analysis effectively used = 1 > > -- (7) Ordering option effectively used = 2 > > ICNTL(6) Maximum transversal option = 0 > > ICNTL(7) Pivot order option = 2 > > ICNTL(14) Percentage of memory relaxation = 20 > > Number of level 2 nodes = 0 > > Number of split nodes = 0 > > RINFOG(1) Operations during elimination (estim)= 6.966D+05 > > Ordering compressed/constrained (ICNTL(12)) = 3 > > > > MEMORY ESTIMATIONS ... > > Estimations with standard Full-Rank (FR) factorization: > > Total space in MBytes, IC factorization (INFOG(17)): 1 > > Total space in MBytes, OOC factorization (INFOG(27)): 1 > > > > Elapsed time in analysis driver= 0.0066 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 960 9792 > > executing #MPI = 1, without OMP > > > > > > > > ****** FACTORIZATION STEP ******** > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > > Number of working processes = 1 > > ICNTL(22) Out-of-core option = 0 > > ICNTL(35) BLR activation (eff. 
choice) = 0 > > ICNTL(14) Memory relaxation = 20 > > INFOG(3) Real space for factors (estimated)= 24094 > > INFOG(4) Integer space for factors (estim.)= 12143 > > Maximum frontal size (estimated) = 80 > > Number of nodes in the tree = 227 > > Memory allowed (MB -- 0: N/A ) = 0 > > Memory provided by user, sum of LWK_USER = 0 > > Relative threshold for pivoting, CNTL(1) = 0.1000D-01 > > ZERO PIVOT DETECTION ON, THRESHOLD = 2.9434468577175697E-020 > > INFINITE FIXATION > > Effective size of S (based on INFO(39))= 31314 > > Elapsed time to reformat/distribute matrix = 0.0006 > > ** Memory allocated, total in Mbytes (INFOG(19)): 1 > > ** Memory effectively used, total in Mbytes (INFOG(22)): 1 > > ** Memory dynamically allocated for CB, total in Mbytes : 0 > > > > Elapsed time for (failed) factorization = 0.0032 > > Leaving factorization with ... > > RINFOG(2) Operations in node assembly = 3.366D+04 > > ------(3) Operations in node elimination = 9.346D+05 > > INFOG (9) Real space for factors = 26980 > > INFOG(10) Integer space for factors = 13047 > > INFOG(11) Maximum front size = 84 > > INFOG(29) Number of entries in factors = 24047 > > INFOG(12) Number of negative pivots = 294 > > INFOG(13) Number of delayed pivots = 452 > > Number of 2x2 pivots in type 1 nodes = 0 > > Number of 2X2 pivots in type 2 nodes = 0 > > Nb of null pivots detected by ICNTL(24) = 0 > > INFOG(28) Estimated deficiency = 0 > > INFOG(14) Number of memory compress = 1 > > > > Elapsed time in factorization driver= 0.0042 > > On return from DMUMPS, INFOG(1)= -9 > > On return from DMUMPS, INFOG(2)= 22 > > terminate called after throwing an instance of 'std::runtime_error' > > what(): > > > > *** ------------------------------------------------------------------------- > > *** DOLFIN encountered an error. If you are not able to resolve this issue > > *** using the information listed below, you can ask for help at > > *** > > *** fenics-support at googlegroups.com (https://link.getmailspring.com/link/3C83A41D-E95A-4725-8766-C8F075BB1972 at getmailspring.com/1?redirect=mailto%3Afenics-support%40googlegroups.com&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) > > *** > > *** Remember to include the error message listed below and, if possible, > > *** include a *minimal* running example to reproduce the error. > > *** > > *** ------------------------------------------------------------------------- > > *** Error: Unable to solve linear system using PETSc Krylov solver. > > *** Reason: Solution failed to converge in 0 iterations (PETSc reason DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00). > > *** Where: This error was encountered inside PETScKrylovSolver.cpp. > > *** Process: 0 > > *** > > *** DOLFIN version: 2019.1.0 > > *** Git changeset: 74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f > > *** ------------------------------------------------------------------------- > > > > Aborted (core dumped) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 24 21:13:59 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 May 2021 22:13:59 -0400 Subject: [petsc-users] Help needed with MUMPS solver In-Reply-To: References: <3D0CA2AC-CEC3-4B50-8692-32987039568C@petsc.dev> Message-ID: On Mon, May 24, 2021 at 10:06 PM Karl Yang wrote: > Hi, Barry > > [image: Sent from Mailspring] > I got the following error with -ksp_error_if_not_converged. > I'm making use of the PETSc from dolfin package. So it could also be bugs > in dolfin. 
But still thank you for providing more information. > Unfortunately, Dolfin is eating our nice error message and stack, so we cannot see what is going on. If it is UFL, you should be able to also run it in Firedrake, which I know preserves the error messages. Can you do this? Thanks, Matt > Regards > Juntao > > terminate called after throwing an instance of 'std::runtime_error' > what(): > > *** > ------------------------------------------------------------------------- > *** DOLFIN encountered an error. If you are not able to resolve this issue > *** using the information listed below, you can ask for help at > *** > *** fenics-support at googlegroups.com > *** > *** Remember to include the error message listed below and, if possible, > *** include a *minimal* running example to reproduce the error. > *** > *** > ------------------------------------------------------------------------- > *** Error: Unable to successfully call PETSc function 'KSPSolve'. > *** Reason: PETSc error code is: 76 (Error in external library). > *** Where: This error was encountered inside > /tmp/dolfin/dolfin/la/PETScKrylovSolver.cpp. > *** Process: 0 > *** > *** DOLFIN version: 2019.1.0 > *** Git changeset: 74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f > *** > ------------------------------------------------------------------------- > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 290 RUNNING AT c10236694934 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > > Regards > Juntao > > > > > On May 24 2021, at 12:17 am, Barry Smith wrote: > > > Please run with -ksp_error_if_not_converged and send all the output > > Barry > > > On May 23, 2021, at 10:16 AM, Karl Yang > > wrote: > > Hello, > > I am using MUMPS direct solver for my project. I used the following > options for solving my problem and it works in most cases. But for some > cases I encounter a divergence error. But I think it is actually error due > to MUMPS? > > I'm not sure how to debug the error. It is appreciated if anyone familiar > with MUMPS solver to offer me some guidance. 
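A side note on the failed factorization quoted further down this thread: MUMPS returns INFOG(1) = -9 there, which means its main internal workspace was allocated too small (INFOG(2) reports the shortfall), typically because the many delayed pivots make the analysis-phase estimate too optimistic. Assuming the same option interface as the PetscOptionsSetValue() calls quoted below, one common first workaround is to raise the memory relaxation ICNTL(14) above its default of 20 percent, e.g.

PetscOptionsSetValue(NULL, "-mat_mumps_icntl_14", "100"); // allow the workspace to grow 100% over the estimate

or equivalently pass -mat_mumps_icntl_14 100 on the command line.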
> > regards > Juntao > > MUMPS options: > > PetscOptionsSetValue(NULL, "-ksp_type", "preonly"); > PetscOptionsSetValue(NULL, "-pc_type", "cholesky"); > PetscOptionsSetValue(NULL, "-pc_factor_mat_solver_type", "mumps"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_1", "1"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_2", "1"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_3", "1"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_4", "3"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_28", "1"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_7", "2"); > PetscOptionsSetValue(NULL, "-mat_mumps_icntl_24", "1"); > > > > Log output from MUMPS and error message from PETSC at the bottom > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 > 240 2448 > executing #MPI = 1, without OMP > > ================================================= > MUMPS compiled with option -Dmetis > MUMPS compiled with option -Dptscotch > MUMPS compiled with option -Dscotch > This MUMPS version includes code for SAVE_RESTORE > ================================================= > L D L^T Solver for general symmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > Scaling will be computed during analysis > Compute maximum matching (Maximum Transversal): 5 > ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE > > Entering analysis phase with ... > N NNZ LIW INFO(1) > 240 2448 5137 0 > Matrix entries: IRN() ICN() > 1 1 1 2 1 3 > 1 4 1 5 1 6 > 1 7 1 8 1 9 > 1 10 > Average density of rows/columns = 18 > Average density of rows/columns = 18 > Ordering based on AMF > Constrained Ordering based on AMF > Average density of rows/columns = 18 > Average density of rows/columns = 18 > NFSIZ(.) = 0 38 14 0 33 33 0 0 0 0 > > FILS (.) = 0 148 4 -96 224 163 20 -43 8 1 > > FRERE(.) = 241 -5 -6 241 0 -2 241 241 241 241 > > > Leaving analysis phase with ... > INFOG(1) = 0 > INFOG(2) = 0 > -- (20) Number of entries in factors (estim.) = 3750 > -- (3) Real space for factors (estimated) = 4641 > -- (4) Integer space for factors (estimated) = 2816 > -- (5) Maximum frontal size (estimated) = 38 > -- (6) Number of nodes in the tree = 56 > -- (32) Type of analysis effectively used = 1 > -- (7) Ordering option effectively used = 2 > ICNTL(6) Maximum transversal option = 0 > ICNTL(7) Pivot order option = 2 > ICNTL(14) Percentage of memory relaxation = 20 > Number of level 2 nodes = 0 > Number of split nodes = 0 > RINFOG(1) Operations during elimination (estim)= 7.137D+04 > Ordering compressed/constrained (ICNTL(12)) = 3 > > MEMORY ESTIMATIONS ... > Estimations with standard Full-Rank (FR) factorization: > Total space in MBytes, IC factorization (INFOG(17)): 0 > Total space in MBytes, OOC factorization (INFOG(27)): 0 > > Elapsed time in analysis driver= 0.0016 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 > 240 2448 > executing #MPI = 1, without OMP > > > > ****** FACTORIZATION STEP ******** > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > Number of working processes = 1 > ICNTL(22) Out-of-core option = 0 > ICNTL(35) BLR activation (eff. 
choice) = 0 > ICNTL(14) Memory relaxation = 20 > INFOG(3) Real space for factors (estimated)= 4641 > INFOG(4) Integer space for factors (estim.)= 2816 > Maximum frontal size (estimated) = 38 > Number of nodes in the tree = 56 > Memory allowed (MB -- 0: N/A ) = 0 > Memory provided by user, sum of LWK_USER = 0 > Relative threshold for pivoting, CNTL(1) = 0.1000D-01 > ZERO PIVOT DETECTION ON, THRESHOLD = 2.8931920285365730E-020 > INFINITE FIXATION > Effective size of S (based on INFO(39))= 7981 > Elapsed time to reformat/distribute matrix = 0.0001 > ** Memory allocated, total in Mbytes (INFOG(19)): 0 > ** Memory effectively used, total in Mbytes (INFOG(22)): 0 > ** Memory dynamically allocated for CB, total in Mbytes : 0 > > Elapsed time for factorization = 0.0006 > > Leaving factorization with ... > RINFOG(2) Operations in node assembly = 5.976D+03 > ------(3) Operations in node elimination = 1.197D+05 > INFOG (9) Real space for factors = 6193 > INFOG(10) Integer space for factors = 3036 > INFOG(11) Maximum front size = 42 > INFOG(29) Number of entries in factors = 4896 > INFOG(12) Number of negative pivots = 79 > INFOG(13) Number of delayed pivots = 110 > Number of 2x2 pivots in type 1 nodes = 1 > Number of 2X2 pivots in type 2 nodes = 0 > Nb of null pivots detected by ICNTL(24) = 0 > INFOG(28) Estimated deficiency = 0 > INFOG(14) Number of memory compress = 0 > > Elapsed time in factorization driver= 0.0009 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 3 > 240 2448 > executing #MPI = 1, without OMP > > > > ****** SOLVE & CHECK STEP ******** > > GLOBAL STATISTICS PRIOR SOLVE PHASE ........... > Number of right-hand-sides = 1 > Blocking factor for multiple rhs = 1 > ICNTL (9) = 1 > --- (10) = 0 > --- (11) = 0 > --- (20) = 0 > --- (21) = 0 > --- (30) = 0 > --- (35) = 0 > > > Vector solution for column 1 > RHS > -7.828363D-02 -3.255337D+00 1.054729D+00 1.379822D-01 -3.892113D-01 > 1.433990D-01 1.089250D+00 2.252611D+00 3.215399D+00 -6.788806D-02 > ** Space in MBYTES used for solve : 0 > > Leaving solve with ... > Time to build/scatter RHS = 0.000003 > Time in solution step (fwd/bwd) = 0.000167 > .. Time in forward (fwd) step = 0.000053 > .. Time in backward (bwd) step = 0.000093 > Time to gather solution(cent.sol)= 0.000000 > Time to copy/scale dist. solution= 0.000000 > > Elapsed time in solve driver= 0.0004 > *** Warning: Verbose output for PETScKrylovSolver not implemented, calling > PETSc KSPView directly. > KSP Object: 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: 1 MPI processes > type: cholesky > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0., needed 0. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: mumps > rows=240, cols=240 > package used to perform factorization: mumps > total: nonzeros=3750, allocated nonzeros=3750 > total number of mallocs used during MatSetValues calls=0 > MUMPS run parameters: > SYM (matrix type): 2 > PAR (host participation): 1 > ICNTL(1) (output for error): 1 > ICNTL(2) (output of diagnostic msg): 1 > ICNTL(3) (output for global info): 6 > ICNTL(4) (level of printing): 3 > ICNTL(5) (input mat struct): 0 > ICNTL(6) (matrix prescaling): 7 > ICNTL(7) (sequential matrix ordering):2 > ICNTL(8) (scaling strategy): 77 > ICNTL(10) (max num of refinements): 0 > ICNTL(11) (error analysis): 0 > ICNTL(12) (efficiency control): 0 > ICNTL(13) (efficiency control): 1 > ICNTL(14) (percentage of estimated workspace increase): 20 > ICNTL(18) (input mat struct): 0 > ICNTL(19) (Schur complement info): 0 > ICNTL(20) (rhs sparse pattern): 0 > ICNTL(21) (solution struct): 0 > ICNTL(22) (in-core/out-of-core facility): 0 > ICNTL(23) (max size of memory can be allocated locally):0 > ICNTL(24) (detection of null pivot rows): 1 > ICNTL(25) (computation of a null space basis): 0 > ICNTL(26) (Schur options for rhs or solution): 0 > ICNTL(27) (experimental parameter): -32 > ICNTL(28) (use parallel or sequential ordering): 1 > ICNTL(29) (parallel ordering): 0 > ICNTL(30) (user-specified set of entries in inv(A)): 0 > ICNTL(31) (factors is discarded in the solve phase): 0 > ICNTL(33) (compute determinant): 0 > ICNTL(35) (activate BLR based factorization): 0 > ICNTL(36) (choice of BLR factorization variant): 0 > ICNTL(38) (estimated compression rate of LU factors): 333 > CNTL(1) (relative pivoting threshold): 0.01 > CNTL(2) (stopping criterion of refinement): 1.49012e-08 > CNTL(3) (absolute pivoting threshold): 0. > CNTL(4) (value of static pivoting): -1. > CNTL(5) (fixation for null pivots): 0. > CNTL(7) (dropping parameter for BLR): 0. > RINFO(1) (local estimated flops for the elimination after > analysis): > [0] 71368. > RINFO(2) (local estimated flops for the assembly after > factorization): > [0] 5976. > RINFO(3) (local estimated flops for the elimination after > factorization): > [0] 119716. > INFO(15) (estimated size of (in MB) MUMPS internal data for > running numerical factorization): > [0] 0 > INFO(16) (size of (in MB) MUMPS internal data used during > numerical factorization): > [0] 0 > INFO(23) (num of pivots eliminated on this processor after > factorization): > [0] 240 > RINFOG(1) (global estimated flops for the elimination after > analysis): 71368. > RINFOG(2) (global estimated flops for the assembly after > factorization): 5976. > RINFOG(3) (global estimated flops for the elimination after > factorization): 119716. 
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): > (0.,0.)*(2^0) > INFOG(3) (estimated real workspace for factors on all > processors after analysis): 4641 > INFOG(4) (estimated integer workspace for factors on all > processors after analysis): 2816 > INFOG(5) (estimated maximum front size in the complete > tree): 38 > INFOG(6) (number of nodes in the complete tree): 56 > INFOG(7) (ordering option effectively use after analysis): 2 > INFOG(8) (structural symmetry in percent of the permuted > matrix after analysis): 100 > INFOG(9) (total real/complex workspace to store the matrix > factors after factorization): 6193 > INFOG(10) (total integer space store the matrix factors > after factorization): 3036 > INFOG(11) (order of largest frontal matrix after > factorization): 42 > INFOG(12) (number of off-diagonal pivots): 79 > INFOG(13) (number of delayed pivots after factorization): 110 > INFOG(14) (number of memory compress after factorization): 0 > INFOG(15) (number of steps of iterative refinement after > solution): 0 > INFOG(16) (estimated size (in MB) of all MUMPS internal data > for factorization after analysis: value on the most memory consuming > processor): 0 > INFOG(17) (estimated size of all MUMPS internal data for > factorization after analysis: sum over all processors): 0 > INFOG(18) (size of all MUMPS internal data allocated during > factorization: value on the most memory consuming processor): 0 > INFOG(19) (size of all MUMPS internal data allocated during > factorization: sum over all processors): 0 > INFOG(20) (estimated number of entries in the factors): 3750 > INFOG(21) (size in MB of memory effectively used during > factorization - value on the most memory consuming processor): 0 > INFOG(22) (size in MB of memory effectively used during > factorization - sum over all processors): 0 > INFOG(23) (after analysis: value of ICNTL(6) effectively > used): 5 > INFOG(24) (after analysis: value of ICNTL(12) effectively > used): 3 > INFOG(25) (after factorization: number of pivots modified by > static pivoting): 0 > INFOG(28) (after factorization: number of null pivots > encountered): 0 > INFOG(29) (after factorization: effective number of entries > in the factors (sum over all processors)): 4896 > INFOG(30, 31) (after solution: size in Mbytes of memory used > during solution phase): 0, 0 > INFOG(32) (after analysis: type of analysis done): 1 > INFOG(33) (value used for ICNTL(8)): -2 > INFOG(34) (exponent of the determinant if determinant is > requested): 0 > INFOG(35) (after factorization: number of entries taking > into account BLR factor compression - sum over all processors): 4896 > INFOG(36) (after analysis: estimated size of all MUMPS > internal data for running BLR in-core - value on the most memory consuming > processor): 0 > INFOG(37) (after analysis: estimated size of all MUMPS > internal data for running BLR in-core - sum over all processors): 0 > INFOG(38) (after analysis: estimated size of all MUMPS > internal data for running BLR out-of-core - value on the most memory > consuming processor): 0 > INFOG(39) (after analysis: estimated size of all MUMPS > internal data for running BLR out-of-core - sum over all processors): 0 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=240, cols=240 > total: nonzeros=4656, allocated nonzeros=4656 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 167 nodes, limit used is 5 > > Entering DMUMPS 5.2.1 from C interface with JOB = -2 > executing #MPI = 1, 
without OMP > rank: 0 coefficient: 0.132368 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 1 > 960 9792 > executing #MPI = 1, without OMP > > ================================================= > MUMPS compiled with option -Dmetis > MUMPS compiled with option -Dptscotch > MUMPS compiled with option -Dscotch > This MUMPS version includes code for SAVE_RESTORE > ================================================= > L D L^T Solver for general symmetric matrices > Type of parallelism: Working host > > ****** ANALYSIS STEP ******** > > Scaling will be computed during analysis > Compute maximum matching (Maximum Transversal): 5 > ... JOB = 5: MAXIMIZE PRODUCT DIAGONAL AND SCALE > > Entering analysis phase with ... > N NNZ LIW INFO(1) > 960 9792 20545 0 > Matrix entries: IRN() ICN() > 1 1 1 2 1 3 > 1 4 1 5 1 6 > 1 7 1 8 1 9 > 1 10 > Average density of rows/columns = 18 > Average density of rows/columns = 18 > Ordering based on AMF > Constrained Ordering based on AMF > Average density of rows/columns = 18 > Average density of rows/columns = 18 > NFSIZ(.) = 0 0 0 58 0 0 0 73 14 0 > > FILS (.) = 0 -747 -80 922 146 5 6 669 3 1 > > FRERE(.) = 961 961 961 0 961 961 961 -4 -69 961 > > > Leaving analysis phase with ... > INFOG(1) = 0 > INFOG(2) = 0 > -- (20) Number of entries in factors (estim.) = 20336 > -- (3) Real space for factors (estimated) = 24094 > -- (4) Integer space for factors (estimated) = 12143 > -- (5) Maximum frontal size (estimated) = 80 > -- (6) Number of nodes in the tree = 227 > -- (32) Type of analysis effectively used = 1 > -- (7) Ordering option effectively used = 2 > ICNTL(6) Maximum transversal option = 0 > ICNTL(7) Pivot order option = 2 > ICNTL(14) Percentage of memory relaxation = 20 > Number of level 2 nodes = 0 > Number of split nodes = 0 > RINFOG(1) Operations during elimination (estim)= 6.966D+05 > Ordering compressed/constrained (ICNTL(12)) = 3 > > MEMORY ESTIMATIONS ... > Estimations with standard Full-Rank (FR) factorization: > Total space in MBytes, IC factorization (INFOG(17)): 1 > Total space in MBytes, OOC factorization (INFOG(27)): 1 > > Elapsed time in analysis driver= 0.0066 > > Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 > 960 9792 > executing #MPI = 1, without OMP > > > > ****** FACTORIZATION STEP ******** > > GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... > Number of working processes = 1 > ICNTL(22) Out-of-core option = 0 > ICNTL(35) BLR activation (eff. choice) = 0 > ICNTL(14) Memory relaxation = 20 > INFOG(3) Real space for factors (estimated)= 24094 > INFOG(4) Integer space for factors (estim.)= 12143 > Maximum frontal size (estimated) = 80 > Number of nodes in the tree = 227 > Memory allowed (MB -- 0: N/A ) = 0 > Memory provided by user, sum of LWK_USER = 0 > Relative threshold for pivoting, CNTL(1) = 0.1000D-01 > ZERO PIVOT DETECTION ON, THRESHOLD = 2.9434468577175697E-020 > INFINITE FIXATION > Effective size of S (based on INFO(39))= 31314 > Elapsed time to reformat/distribute matrix = 0.0006 > ** Memory allocated, total in Mbytes (INFOG(19)): 1 > ** Memory effectively used, total in Mbytes (INFOG(22)): 1 > ** Memory dynamically allocated for CB, total in Mbytes : 0 > > Elapsed time for (failed) factorization = 0.0032 > > Leaving factorization with ... 
> RINFOG(2) Operations in node assembly = 3.366D+04 > ------(3) Operations in node elimination = 9.346D+05 > INFOG (9) Real space for factors = 26980 > INFOG(10) Integer space for factors = 13047 > INFOG(11) Maximum front size = 84 > INFOG(29) Number of entries in factors = 24047 > INFOG(12) Number of negative pivots = 294 > INFOG(13) Number of delayed pivots = 452 > Number of 2x2 pivots in type 1 nodes = 0 > Number of 2X2 pivots in type 2 nodes = 0 > Nb of null pivots detected by ICNTL(24) = 0 > INFOG(28) Estimated deficiency = 0 > INFOG(14) Number of memory compress = 1 > > Elapsed time in factorization driver= 0.0042 > On return from DMUMPS, INFOG(1)= -9 > On return from DMUMPS, INFOG(2)= 22 > terminate called after throwing an instance of 'std::runtime_error' > what(): > > *** > ------------------------------------------------------------------------- > *** DOLFIN encountered an error. If you are not able to resolve this issue > *** using the information listed below, you can ask for help at > *** > *** fenics-support at googlegroups.com > > *** > *** Remember to include the error message listed below and, if possible, > *** include a *minimal* running example to reproduce the error. > *** > *** > ------------------------------------------------------------------------- > *** Error: Unable to solve linear system using PETSc Krylov solver. > *** Reason: Solution failed to converge in 0 iterations (PETSc reason > DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00). > *** Where: This error was encountered inside PETScKrylovSolver.cpp. > *** Process: 0 > *** > *** DOLFIN version: 2019.1.0 > *** Git changeset: 74d7efe1e84d65e9433fd96c50f1d278fa3e3f3f > *** > ------------------------------------------------------------------------- > > Aborted (core dumped) > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From saransh.saxena5571 at gmail.com Tue May 25 02:48:02 2021 From: saransh.saxena5571 at gmail.com (Saransh Saxena) Date: Tue, 25 May 2021 09:48:02 +0200 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> Message-ID: Hi guys, I've written an implementation of SNES within my code to use the petsc nonlinear solvers but for some reason, I am getting results I can't make sense of. To summarize, I've written a function to calculate residual using Matthew's suggestion. However, when I run the code, the behaviour is odd, the solver seems to enter the myresidual function initially. However, after that it never updates the iteration counter and the solution vector remains unchanged (and a really small value) while the residual vector explodes in value. 
Residual code :- PetscErrorCode sl::myresidual(SNES snes, Vec x, Vec F, void *ctx) { // Cast the application context: sl::formulCtx *user = (sl::formulCtx*)ctx; // Read the formulation: formulation *thisformul = (*user).formul; thisformul->generate(); //vec *b = user->b; //mat *A = user->A; vec b = thisformul->b(); mat A = thisformul->A(); // Read pointers to A and b: Mat Apetsc = A.getpetsc(); Vec bpetsc = b.getpetsc(); double normvalres, normvalsol; VecNorm(F, NORM_2, &normvalres); VecNorm(x, NORM_2, &normvalsol); std::cout << "----------------------------------------------------------------------------" << std::endl; std::cout << "Entered residual function, norm of residual vector is : " << normvalres << std::endl; std::cout << "Entered residual function, norm of solution vector is : " << normvalsol << std::endl; // Compute the residual as F = A*x - b MatMult(Apetsc, x, F); VecAXPY(F,-1.0, bpetsc); Vec output; VecDuplicate(x, &output); VecCopy(x, output); setdata(vec(b.getpointer()->getdofmanager(), output)); std::cout << "Writing the sol to fields \n"; return 0; } SNES implementation :- void sl::solvenonlinear(formulation thisformul, double restol, int maxitnum) { // Make sure the problem is of the form Ax = b: if (thisformul.isdampingmatrixdefined() || thisformul.ismassmatrixdefined()) { std::cout << "Error in 'sl' namespace: formulation to solve cannot have a damping/mass matrix (use a time resolution algorithm)" << std::endl; abort(); } // Remove leftovers (if any): mat Atemp = thisformul.A(); vec btemp = thisformul.b(); // Create Application Context for formulation sl::formulCtx user; user.formul = &thisformul; // Generate formulation to set PETSc SNES requirements: thisformul.generate(); mat A = thisformul.A(); vec b = thisformul.b(); // SNES requirements: Vec bpetsc = b.getpetsc(); Mat Apetsc = A.getpetsc(); vec residual(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); Vec residualpetsc = residual.getpetsc(); vec sol(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); Vec solpetsc = sol.getpetsc(); //Retrieve the SNES and KSP Context from A matrix: SNES* snes = A.getpointer()->getsnes(); KSP* ksp = A.getpointer()->getksp(); // Create placeholder for preconditioner: PC pc; // Create snes context: SNESCreate(PETSC_COMM_SELF, snes); SNESSetFunction(*snes, residualpetsc, sl::myresidual, &user); SNESSetTolerances(*snes, PETSC_DEFAULT, restol, PETSC_DEFAULT, maxitnum, 5); // Retrieve the KSP context automatically created: SNESGetKSP(*snes, ksp); //Set KSP specific parameters/options: KSPSetOperators(*ksp, Apetsc, Apetsc); KSPSetFromOptions(*ksp); KSPGetPC(*ksp,&pc); PCSetType(pc,PCLU); PCFactorSetMatSolverType(pc,MATSOLVERMUMPS); //Call SNES options to invoke changes from console: SNESSetFromOptions(*snes); // Set SNES Monitor to retrieve convergence information: SNESMonitorSet(*snes, sl::mysnesmonitor, PETSC_NULL, PETSC_NULL); //SNESMonitorLGResidualNorm(); SNESSolve(*snes, PETSC_NULL, solpetsc); // Print the norm of residual: double normres; VecNorm(residualpetsc, NORM_2, &normres); std::cout << "L2 norm of the residual is : " << normres << std::endl; //Set the solution to all the fields: setdata(sol); // Get the number of required iterations and the residual norm: //SNESGetIterationNumber(*snes, &maxitnum); //SNESGetResidualNorm(*snes, &restol); // Destroy SNES context once done with computation: SNESDestroy(snes); } Output :- [image: image.png] Am I doing something incorrect wrt SNES? 
When I use the linear solver (KSP) and manually coded fixed point nonlinear iteration, it works fine. Best regards, Saransh On Sun, May 9, 2021 at 10:43 PM Barry Smith wrote: > > Saransh, > > If Picard or Newton's method does not converge, you can consider > adding pseudo-transient and/or other continuation methods. For example, if > the problem is made difficult by certain physical parameters you can start > with "easier" values of the parameters, solve the nonlinear system, then > use its solution as the initial guess for slightly more "difficult" > parameters, etc. Or, depending on the problem grid sequencing may be > appropriate. We have some tools to help with all these approaches. > > Barry > > > On May 9, 2021, at 2:07 PM, Saransh Saxena > wrote: > > Thanks Barry and Matt, > > Till now I was only using a simple fixed point nonlinear solver manually > coded instead of ones provided by PETSc. However, the problem I am trying > to solve is highly nonlinear so I suppose I'll need at least a newton based > solver to start with. I'll get back to you guys if I have any questions. > > Cheers, > Saransh > > On Sat, May 8, 2021 at 5:18 AM Barry Smith wrote: > >> Saransh, >> >> I've add some code for SNESSetPicard() in the PETSc branch >> barry/2021-05-06/add-snes-picard-mf see also http >> s://gitlab.com/petsc/petsc/-/merge_requests/3962 that will make your >> coding much easier. >> >> With this branch you can provide code that computes A(x), using >> SNESSetPicard(). >> >> 1) by default it will use the defection-correction form of Picard >> iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be >> slower than Newton >> >> 2) with -snes_fd_color it will compute the Jacobian via coloring using >> SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same >> sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - >> A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() >> and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same >> nonzero structure as A. (I'm lost beyond matrices :-(). >> >> 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and >> precondition it with a preconditioner built from A(x^n) matrix, for some >> problems this works well. >> >> 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the >> Jacobian by finite differencing one column at a time, thus it is very slow >> and not useful for large problems. But useful for testing with small >> problems. >> >> So you can provide A() and need not worrying about providing the Jacobian >> or even the function evaluation code. It is all taken care of by >> SNESSetPicard(). >> >> Hope this helps, >> >> Barry >> >> >> On May 6, 2021, at 1:21 PM, Matthew Knepley wrote: >> >> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena < >> saransh.saxena5571 at gmail.com> wrote: >> >>> Hi, >>> >>> I am trying to incorporate newton method in solving a nonlinear FEM >>> equation using SNES from PETSc. The overall equation is of the type A(x).x >>> = b, where b is a vector of external loads, x is the solution field (say >>> displacements for e.g.) and A is the combined LHS matrix derived from the >>> discretization of weak formulation of the governing finite element >>> equation. >>> >>> While going through the manual and examples of snes, I found that I need >>> to define the function of residual using SNESSetFunction() and jacobian >>> using SNESSetJacobian(). In that context I had a couple of questions :- >>> >>> 1. 
In the snes tutorials I've browsed through, the functions for >>> computing residual passed had arguments only for x, the solution vector and >>> f, the residual vector. Is there a way a user can pass an additional vector >>> (b) and matrix (A) for computing the residual as well? As in my case, f = b >>> - A(x).x >>> >> >> You would give PETSc an outer function MyResidual() that looked like this: >> >> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) >> { >> >> >> MatMult(A, X, F); >> VecAXPY(F, -1.0, b); >> } >> >> >>> 2. Since computing the Jacobian is not that trivial, I would like to use one >>> of the pre-built Jacobian methods. Is there any other step other than >>> setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >>> >> >> If you do nothing, we will compute it by default. >> >> Thanks, >> >> Matt >> >> >>> Best regards, >>> >>> Saransh >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 160332 bytes Desc: not available URL:
From bsmith at petsc.dev Tue May 25 03:14:57 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 May 2021 03:14:57 -0500 Subject: Re: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> Message-ID: <38DA9DD8-93DA-461C-993A-FB7B278D3410@petsc.dev>

> VecNorm(F, NORM_2, &normvalres);

The F has not yet been computed by you so you shouldn't take the norm here. F could have anything in it. You should take the norm after the line

> VecAXPY(F,-1.0, bpetsc);

> // Read pointers to A and b:
> Mat Apetsc = A.getpetsc();
> Vec bpetsc = b.getpetsc();

Where are these things computed and are they both functions of output? Or is b merely x (the current solution snes is working with)?

> Vec output;
> VecDuplicate(x, &output);
> VecCopy(x, output);
>
> setdata(vec(b.getpointer()->getdofmanager(), output));

What is the line above doing? I think you are using Picard iteration A(x^n) x^{n+1} = b(x^n). (Sometimes people call this a fixed-point iteration.) If so you should use SNESSetPicard() and not SNESSetFunction().

If you run with SNESSetPicard() with no additional options it will run the defect-correction version of Picard.

If you run with SNESSetPicard() and use -snes_mf_operator then SNES will run matrix-free Newton's method using your A as the preconditioner for the Jacobian.

If you run with SNESSetPicard() and use -snes_fd then SNES will form the Jacobian explicitly and run Newton's method with it. This will be very slow, but it gives you an idea of how Newton's method works on your problem.

If you call SNESSetFromOptions() before SNESSolve() then you can use -snes_monitor -ksp_monitor -snes_converged_reason and many other options to monitor the convergence; then you will not have to compute the norms yourself and put print statements in your code for the norms.

Barry

> On May 25, 2021, at 2:48 AM, Saransh Saxena wrote:
>
> Hi guys,
>
> I've written an implementation of SNES within my code to use the petsc nonlinear solvers but for some reason, I am getting results I can't make sense of.
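A rough sketch of the SNESSetPicard() setup described above (assuming the current PETSc signature; formb and formA are hypothetical callback names standing in for code that regenerates b(x) and A(x) from the formulation at the current iterate; variable names are borrowed from the quoted code below):

/* placeholder callbacks -- not part of the quoted code */
PetscErrorCode formb(SNES snes, Vec x, Vec b, void *ctx);        /* assemble b(x) */
PetscErrorCode formA(SNES snes, Vec x, Mat A, Mat P, void *ctx); /* assemble A(x) */

SNESCreate(PETSC_COMM_SELF, &snes);
SNESSetPicard(snes, residualpetsc, formb, Apetsc, Apetsc, formA, &user);
SNESSetFromOptions(snes); /* picks up -snes_monitor, -snes_converged_reason, -snes_mf_operator, -snes_fd, ... */
SNESSolve(snes, NULL, solpetsc);

And if SNESSetFunction() is kept instead, the norm of F is only meaningful after the MatMult()/VecAXPY() pair that forms it.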
To summarize, I've written a function to calculate residual using Matthew's suggestion. However, when I run the code, the behaviour is odd, the solver seems to enter the myresidual function initially. However, after that it never updates the iteration counter and the solution vector remains unchanged (and a really small value) while the residual vector explodes in value. > > Residual code :- > > PetscErrorCode sl::myresidual(SNES snes, Vec x, Vec F, void *ctx) > { > // Cast the application context: > sl::formulCtx *user = (sl::formulCtx*)ctx; > > // Read the formulation: > formulation *thisformul = (*user).formul; > thisformul->generate(); > > //vec *b = user->b; > //mat *A = user->A; > > vec b = thisformul->b(); > mat A = thisformul->A(); > > // Read pointers to A and b: > Mat Apetsc = A.getpetsc(); > Vec bpetsc = b.getpetsc(); > > double normvalres, normvalsol; > VecNorm(F, NORM_2, &normvalres); > VecNorm(x, NORM_2, &normvalsol); > std::cout << "----------------------------------------------------------------------------" << std::endl; > std::cout << "Entered residual function, norm of residual vector is : " << normvalres << std::endl; > std::cout << "Entered residual function, norm of solution vector is : " << normvalsol << std::endl; > > // Compute the residual as F = A*x - b > MatMult(Apetsc, x, F); > VecAXPY(F,-1.0, bpetsc); > > Vec output; > VecDuplicate(x, &output); > VecCopy(x, output); > > setdata(vec(b.getpointer()->getdofmanager(), output)); > > std::cout << "Writing the sol to fields \n"; > > return 0; > } > > SNES implementation :- > > void sl::solvenonlinear(formulation thisformul, double restol, int maxitnum) > { > // Make sure the problem is of the form Ax = b: > if (thisformul.isdampingmatrixdefined() || thisformul.ismassmatrixdefined()) > { > std::cout << "Error in 'sl' namespace: formulation to solve cannot have a damping/mass matrix (use a time resolution algorithm)" << std::endl; > abort(); > } > > // Remove leftovers (if any): > mat Atemp = thisformul.A(); vec btemp = thisformul.b(); > > // Create Application Context for formulation > sl::formulCtx user; > user.formul = &thisformul; > > // Generate formulation to set PETSc SNES requirements: > thisformul.generate(); > > mat A = thisformul.A(); > vec b = thisformul.b(); > > // SNES requirements: > Vec bpetsc = b.getpetsc(); > Mat Apetsc = A.getpetsc(); > > vec residual(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); > Vec residualpetsc = residual.getpetsc(); > > vec sol(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); > Vec solpetsc = sol.getpetsc(); > > //Retrieve the SNES and KSP Context from A matrix: > SNES* snes = A.getpointer()->getsnes(); > KSP* ksp = A.getpointer()->getksp(); > > // Create placeholder for preconditioner: > PC pc; > > // Create snes context: > SNESCreate(PETSC_COMM_SELF, snes); > SNESSetFunction(*snes, residualpetsc, sl::myresidual, &user); > SNESSetTolerances(*snes, PETSC_DEFAULT, restol, PETSC_DEFAULT, maxitnum, 5); > > // Retrieve the KSP context automatically created: > SNESGetKSP(*snes, ksp); > > //Set KSP specific parameters/options: > KSPSetOperators(*ksp, Apetsc, Apetsc); > KSPSetFromOptions(*ksp); > KSPGetPC(*ksp,&pc); > PCSetType(pc,PCLU); > PCFactorSetMatSolverType(pc,MATSOLVERMUMPS); > > //Call SNES options to invoke changes from console: > SNESSetFromOptions(*snes); > > // Set SNES Monitor to retrieve convergence information: > SNESMonitorSet(*snes, sl::mysnesmonitor, PETSC_NULL, PETSC_NULL); > //SNESMonitorLGResidualNorm(); > > SNESSolve(*snes, 
PETSC_NULL, solpetsc); > > // Print the norm of residual: > double normres; > VecNorm(residualpetsc, NORM_2, &normres); > std::cout << "L2 norm of the residual is : " << normres << std::endl; > > //Set the solution to all the fields: > setdata(sol); > > // Get the number of required iterations and the residual norm: > //SNESGetIterationNumber(*snes, &maxitnum); > //SNESGetResidualNorm(*snes, &restol); > > // Destroy SNES context once done with computation: > SNESDestroy(snes); > > } > > Output :- > > > Am I doing something incorrect wrt SNES? When I use the linear solver (KSP) and manually coded fixed point nonlinear iteration, it works fine. > > Best regards, > Saransh > > > > On Sun, May 9, 2021 at 10:43 PM Barry Smith > wrote: > > Saransh, > > If Picard or Newton's method does not converge, you can consider adding pseudo-transient and/or other continuation methods. For example, if the problem is made difficult by certain physical parameters you can start with "easier" values of the parameters, solve the nonlinear system, then use its solution as the initial guess for slightly more "difficult" parameters, etc. Or, depending on the problem grid sequencing may be appropriate. We have some tools to help with all these approaches. > > Barry > > >> On May 9, 2021, at 2:07 PM, Saransh Saxena > wrote: >> >> Thanks Barry and Matt, >> >> Till now I was only using a simple fixed point nonlinear solver manually coded instead of ones provided by PETSc. However, the problem I am trying to solve is highly nonlinear so I suppose I'll need at least a newton based solver to start with. I'll get back to you guys if I have any questions. >> >> Cheers, >> Saransh >> >> On Sat, May 8, 2021 at 5:18 AM Barry Smith > wrote: >> Saransh, >> >> I've add some code for SNESSetPicard() in the PETSc branch barry/2021-05-06/add-snes-picard-mf see also https://gitlab.com/petsc/petsc/-/merge_requests/3962 <> that will make your coding much easier. >> >> With this branch you can provide code that computes A(x), using SNESSetPicard(). >> >> 1) by default it will use the defection-correction form of Picard iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be slower than Newton >> >> 2) with -snes_fd_color it will compute the Jacobian via coloring using SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same nonzero structure as A. (I'm lost beyond matrices :-(). >> >> 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and precondition it with a preconditioner built from A(x^n) matrix, for some problems this works well. >> >> 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the Jacobian by finite differencing one column at a time, thus it is very slow and not useful for large problems. But useful for testing with small problems. >> >> So you can provide A() and need not worrying about providing the Jacobian or even the function evaluation code. It is all taken care of by SNESSetPicard(). >> >> Hope this helps, >> >> Barry >> >> >>> On May 6, 2021, at 1:21 PM, Matthew Knepley > wrote: >>> >>> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena > wrote: >>> Hi, >>> >>> I am trying to incorporate newton method in solving a nonlinear FEM equation using SNES from PETSc. 
The overall equation is of the type A(x).x = b, where b is a vector of external loads, x is the solution field (say displacements, for e.g.) and A is the combined LHS matrix derived from the discretization of the weak formulation of the governing finite element equation.
>>>
>>> While going through the manual and examples of snes, I found that I need to define the residual function using SNESSetFunction() and the Jacobian using SNESSetJacobian(). In that context I had a couple of questions :-
>>>
>>> 1. In the snes tutorials I've browsed through, the functions for computing the residual passed had arguments only for x, the solution vector, and f, the residual vector. Is there a way a user can pass an additional vector (b) and matrix (A) for computing the residual as well? As in my case, f = b - A(x).x
>>>
>>> You would give PETSc an outer function MyResidual() that looked like this:
>>>
>>> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx)
>>> {
>>>   /* A and b would be fetched from the application context ctx */
>>>   MatMult(A, X, F);
>>>   VecAXPY(F, -1.0, b);
>>>   return 0;
>>> }
>>>
>>> 2. Since computing the Jacobian is not that trivial, I would like to use one of the pre-built Jacobian methods. Is there any other step other than setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault?
>>>
>>> If you do nothing, we will compute it by default.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>> Best regards,
>>>
>>> Saransh
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/

From hgbk2008 at gmail.com  Tue May 25 07:40:47 2021
From: hgbk2008 at gmail.com (hg)
Date: Tue, 25 May 2021 14:40:47 +0200
Subject: [petsc-users] adding calls before and after each iteration of snes
Message-ID:

Hello

I would like to ask if it is possible to add function calls before and after each iteration of the SNES solve, e.g. InitializeNonLinearIteration and FinalizeNonLinearIteration. It is particularly useful for debugging the constitutive law or for post-processing the intermediate results.

Best
Giang

From knepley at gmail.com  Tue May 25 09:25:42 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 25 May 2021 10:25:42 -0400
Subject: [petsc-users] adding calls before and after each iteration of snes
In-Reply-To: References:
Message-ID:

On Tue, May 25, 2021 at 8:41 AM hg wrote:

> Hello
>
> I would like to ask if it is possible to add function calls before and after each iteration of the SNES solve, e.g. InitializeNonLinearIteration and FinalizeNonLinearIteration. It is particularly useful for debugging the constitutive law or for post-processing the intermediate results.

There is this: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html

  Thanks,

     Matt

> Best
> Giang

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
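A minimal sketch of how the hook pointed to above can be wired up. The function names and the printed messages here are illustrative assumptions, not code from this thread; SNESSetUpdate() runs its callback at the beginning of every nonlinear iteration, while a monitor installed with SNESMonitorSet() runs after each iteration with the current residual norm:

#include <petscsnes.h>

/* called at the beginning of each SNES iteration; "step" is the iteration number
   (illustrative name, standing in for something like InitializeNonLinearIteration) */
static PetscErrorCode MyInitializeIteration(SNES snes, PetscInt step)
{
  Vec x;
  SNESGetSolution(snes, &x); /* current iterate, e.g. to update a constitutive law */
  PetscPrintf(PETSC_COMM_WORLD, "starting nonlinear iteration %D\n", step);
  return 0;
}

/* called after each iteration with the current residual norm
   (illustrative name, standing in for FinalizeNonLinearIteration) */
static PetscErrorCode MyFinalizeIteration(SNES snes, PetscInt it, PetscReal fnorm, void *ctx)
{
  PetscPrintf(PETSC_COMM_WORLD, "finished iteration %D, ||F|| = %g\n", it, (double)fnorm);
  return 0;
}

/* ... after SNESCreate() and before SNESSolve() ... */
SNESSetUpdate(snes, MyInitializeIteration);
SNESMonitorSet(snes, MyFinalizeIteration, NULL, NULL);

With this in place the debugging and post-processing happen in the hooks, without touching the residual or Jacobian callbacks.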
From saransh.saxena5571 at gmail.com  Tue May 25 10:28:47 2021
From: saransh.saxena5571 at gmail.com (Saransh Saxena)
Date: Tue, 25 May 2021 17:28:47 +0200
Subject: [petsc-users] Integrating SNES in FEM code
In-Reply-To: <38DA9DD8-93DA-461C-993A-FB7B278D3410@petsc.dev>
References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> <38DA9DD8-93DA-461C-993A-FB7B278D3410@petsc.dev>
Message-ID:

Hi Barry,

Mat Apetsc = A.getpetsc();
Vec bpetsc = b.getpetsc();

Apetsc and bpetsc are matrix A and vector b (in PETSc format); A and b use a different class structure (as per the FEM code) in solving the nonlinear equation A(x).x = b. b is the RHS vector (applied forces in my case) and A is the global stiffness matrix (K for static simulations in FEM terms). x is the solution vector (displacements in my case for the FEM simulation). r is the residual vector of the form r = b - A(x).x. Only matrix A is a function of the output in the current case, but I am implementing for a general case where b might also depend on the output.

Vec output;
VecDuplicate(x, &output);
VecCopy(x, output);

setdata(vec(b.getpointer()->getdofmanager(), output));

The above lines store the solution for the current iteration so that when the .generate() function is called, an updated A matrix is obtained (and an updated b as well for a general case where both A and b vary with x, the output). I have to do it by copying the x vector to output because setdata() destroys the vector when called.

I was also browsing through the definition of SNESSetFunction and realized that it solves f'(x) x = -f(x); however, in Newton-Raphson x_(n+1) = x_(n) - f(x_(n))/f'(x_(n)). So am I solving for delta_x here with SNESSetFunction?

Also, in SNESSetPicard() I need to pass a function to compute b. However, in my case b is constant. How do I use that? Also, does Vec r in the definition refer to the solution vector or the residual vector?

Best regards,
Saransh

On Tue, May 25, 2021 at 10:15 AM Barry Smith wrote:

> > VecNorm(F, NORM_2, &normvalres);
>
>   The F has not yet been computed by you so you shouldn't take the norm here. F could have anything in it. You should take the norm after the line
>
> > VecAXPY(F,-1.0, bpetsc);
>
> > // Read pointers to A and b:
> > Mat Apetsc = A.getpetsc();
> > Vec bpetsc = b.getpetsc();
>
>   Where are these things computed, and are they both functions of output? Or is b merely x (the current solution snes is working with)?
>
> > Vec output;
> > VecDuplicate(x, &output);
> > VecCopy(x, output);
> >
> > setdata(vec(b.getpointer()->getdofmanager(), output));
>
>   What is the line above doing?
>
>   I think you are using Picard iteration A(x^n) x^{n+1} = b(x^n). (Sometimes people call this a fixed-point iteration.) If so, you should use SNESSetPicard() and not SNESSetFunction().
>
>   If you run with SNESSetPicard() with no additional options it will run the defect correction version of Picard.
>
>   If you run with SNESSetPicard() and use -snes_mf_operator then SNES will run matrix-free Newton's method using your A as the preconditioner for the Jacobian.
>
>   If you run with SNESSetPicard() and use -snes_fd then SNES will form explicitly the Jacobian and run Newton's method with it. This will be very slow, but it gives you an idea of how Newton's method works on your problem.
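To check my understanding of the SNESSetPicard() route described above, a minimal sketch of how I think the wiring would look. MyPicardRHS, MyPicardMat and AppCtx are placeholder names of mine, and the actual assembly is left out:

/* small application context; in my code this would hold the formulation */
typedef struct {
  Mat A;  /* assembled stiffness matrix, refreshed at each iterate */
  Vec b;  /* load vector; constant in my case */
} AppCtx;

/* right-hand-side callback for SNESSetPicard(); since my b is constant it only copies */
static PetscErrorCode MyPicardRHS(SNES snes, Vec x, Vec B, void *ctx)
{
  AppCtx *user = (AppCtx*)ctx;
  VecCopy(user->b, B);
  return 0;
}

/* matrix callback: refresh A(x) for the current iterate x; the actual assembly
   (in my code, setdata() followed by generate()) is omitted from this sketch */
static PetscErrorCode MyPicardMat(SNES snes, Vec x, Mat Amat, Mat Pmat, void *ctx)
{
  return 0;
}

/* in solvenonlinear(), replacing the SNESSetFunction() call; I am unsure about
   the second (work vector) argument, see my question above */
SNESSetPicard(*snes, NULL, MyPicardRHS, Apetsc, Apetsc, MyPicardMat, &user);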
> > If you call SNESSetFromOptions() before SNESSolve() then you can use > -snes_monitor -ksp_monitor -snes_converged_reason and many other options to > monitor the convergence, then you will not have to compute the norms > yourself and put print statements in your code for the norms. > > Barry > > > On May 25, 2021, at 2:48 AM, Saransh Saxena > wrote: > > Hi guys, > > I've written an implementation of SNES within my code to use the petsc > nonlinear solvers but for some reason, I am getting results I can't make > sense of. To summarize, I've written a function to calculate residual using > Matthew's suggestion. However, when I run the code, the behaviour is > odd, the solver seems to enter the myresidual function initially. However, > after that it never updates the iteration counter and the solution vector > remains unchanged (and a really small value) while the residual vector > explodes in value. > > Residual code :- > > PetscErrorCode sl::myresidual(SNES snes, Vec x, Vec F, void *ctx) > { > // Cast the application context: > sl::formulCtx *user = (sl::formulCtx*)ctx; > > // Read the formulation: > formulation *thisformul = (*user).formul; > thisformul->generate(); > > //vec *b = user->b; > //mat *A = user->A; > > vec b = thisformul->b(); > mat A = thisformul->A(); > > // Read pointers to A and b: > Mat Apetsc = A.getpetsc(); > Vec bpetsc = b.getpetsc(); > > double normvalres, normvalsol; > VecNorm(F, NORM_2, &normvalres); > VecNorm(x, NORM_2, &normvalsol); > std::cout << > "----------------------------------------------------------------------------" > << std::endl; > std::cout << "Entered residual function, norm of residual vector is : > " << normvalres << std::endl; > std::cout << "Entered residual function, norm of solution vector is : > " << normvalsol << std::endl; > > // Compute the residual as F = A*x - b > MatMult(Apetsc, x, F); > VecAXPY(F,-1.0, bpetsc); > > Vec output; > VecDuplicate(x, &output); > VecCopy(x, output); > > setdata(vec(b.getpointer()->getdofmanager(), output)); > > std::cout << "Writing the sol to fields \n"; > > return 0; > } > > SNES implementation :- > > void sl::solvenonlinear(formulation thisformul, double restol, int > maxitnum) > { > // Make sure the problem is of the form Ax = b: > if (thisformul.isdampingmatrixdefined() || > thisformul.ismassmatrixdefined()) > { > std::cout << "Error in 'sl' namespace: formulation to solve cannot > have a damping/mass matrix (use a time resolution algorithm)" << std::endl; > abort(); > } > > // Remove leftovers (if any): > mat Atemp = thisformul.A(); vec btemp = thisformul.b(); > > // Create Application Context for formulation > sl::formulCtx user; > user.formul = &thisformul; > > // Generate formulation to set PETSc SNES requirements: > thisformul.generate(); > > mat A = thisformul.A(); > vec b = thisformul.b(); > > // SNES requirements: > Vec bpetsc = b.getpetsc(); > Mat Apetsc = A.getpetsc(); > > vec residual(std::shared_ptr(new > rawvec(b.getpointer()->getdofmanager()))); > Vec residualpetsc = residual.getpetsc(); > > vec sol(std::shared_ptr(new > rawvec(b.getpointer()->getdofmanager()))); > Vec solpetsc = sol.getpetsc(); > > //Retrieve the SNES and KSP Context from A matrix: > SNES* snes = A.getpointer()->getsnes(); > KSP* ksp = A.getpointer()->getksp(); > > // Create placeholder for preconditioner: > PC pc; > > // Create snes context: > SNESCreate(PETSC_COMM_SELF, snes); > SNESSetFunction(*snes, residualpetsc, sl::myresidual, &user); > SNESSetTolerances(*snes, PETSC_DEFAULT, restol, PETSC_DEFAULT, > maxitnum, 
5); > > // Retrieve the KSP context automatically created: > SNESGetKSP(*snes, ksp); > > //Set KSP specific parameters/options: > KSPSetOperators(*ksp, Apetsc, Apetsc); > KSPSetFromOptions(*ksp); > KSPGetPC(*ksp,&pc); > PCSetType(pc,PCLU); > PCFactorSetMatSolverType(pc,MATSOLVERMUMPS); > > //Call SNES options to invoke changes from console: > SNESSetFromOptions(*snes); > > // Set SNES Monitor to retrieve convergence information: > SNESMonitorSet(*snes, sl::mysnesmonitor, PETSC_NULL, PETSC_NULL); > //SNESMonitorLGResidualNorm(); > > SNESSolve(*snes, PETSC_NULL, solpetsc); > > // Print the norm of residual: > double normres; > VecNorm(residualpetsc, NORM_2, &normres); > std::cout << "L2 norm of the residual is : " << normres << std::endl; > > //Set the solution to all the fields: > setdata(sol); > > // Get the number of required iterations and the residual norm: > //SNESGetIterationNumber(*snes, &maxitnum); > //SNESGetResidualNorm(*snes, &restol); > > // Destroy SNES context once done with computation: > SNESDestroy(snes); > > } > > Output :- > > > Am I doing something incorrect wrt SNES? When I use the linear solver > (KSP) and manually coded fixed point nonlinear iteration, it works fine. > > Best regards, > Saransh > > > > On Sun, May 9, 2021 at 10:43 PM Barry Smith wrote: > >> >> Saransh, >> >> If Picard or Newton's method does not converge, you can consider >> adding pseudo-transient and/or other continuation methods. For example, if >> the problem is made difficult by certain physical parameters you can start >> with "easier" values of the parameters, solve the nonlinear system, then >> use its solution as the initial guess for slightly more "difficult" >> parameters, etc. Or, depending on the problem grid sequencing may be >> appropriate. We have some tools to help with all these approaches. >> >> Barry >> >> >> On May 9, 2021, at 2:07 PM, Saransh Saxena >> wrote: >> >> Thanks Barry and Matt, >> >> Till now I was only using a simple fixed point nonlinear solver manually >> coded instead of ones provided by PETSc. However, the problem I am trying >> to solve is highly nonlinear so I suppose I'll need at least a newton based >> solver to start with. I'll get back to you guys if I have any questions. >> >> Cheers, >> Saransh >> >> On Sat, May 8, 2021 at 5:18 AM Barry Smith wrote: >> >>> Saransh, >>> >>> I've add some code for SNESSetPicard() in the PETSc branch >>> barry/2021-05-06/add-snes-picard-mf see also http >>> s://gitlab.com/petsc/petsc/-/merge_requests/3962 that will make your >>> coding much easier. >>> >>> With this branch you can provide code that computes A(x), using >>> SNESSetPicard(). >>> >>> 1) by default it will use the defection-correction form of Picard >>> iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be >>> slower than Newton >>> >>> 2) with -snes_fd_color it will compute the Jacobian via coloring using >>> SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same >>> sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - >>> A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() >>> and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same >>> nonzero structure as A. (I'm lost beyond matrices :-(). >>> >>> 3) with -snes_mf_operator it will apply the true Jacobian matrix-free >>> and precondition it with a preconditioner built from A(x^n) matrix, for >>> some problems this works well. 
>>> >>> 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the >>> Jacobian by finite differencing one column at a time, thus it is very slow >>> and not useful for large problems. But useful for testing with small >>> problems. >>> >>> So you can provide A() and need not worrying about providing the >>> Jacobian or even the function evaluation code. It is all taken care of by >>> SNESSetPicard(). >>> >>> Hope this helps, >>> >>> Barry >>> >>> >>> On May 6, 2021, at 1:21 PM, Matthew Knepley wrote: >>> >>> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena < >>> saransh.saxena5571 at gmail.com> wrote: >>> >>>> Hi, >>>> >>>> I am trying to incorporate newton method in solving a nonlinear FEM >>>> equation using SNES from PETSc. The overall equation is of the type A(x).x >>>> = b, where b is a vector of external loads, x is the solution field (say >>>> displacements for e.g.) and A is the combined LHS matrix derived from the >>>> discretization of weak formulation of the governing finite element >>>> equation. >>>> >>>> While going through the manual and examples of snes, I found that I >>>> need to define the function of residual using SNESSetFunction() and >>>> jacobian using SNESSetJacobian(). In that context I had a couple of >>>> questions :- >>>> >>>> 1. In the snes tutorials I've browsed through, the functions for >>>> computing residual passed had arguments only for x, the solution vector and >>>> f, the residual vector. Is there a way a user can pass an additional vector >>>> (b) and matrix (A) for computing the residual as well? as in my case, f = b >>>> - A(x).x >>>> >>> >>> You would give PETSc an outer function MyResidual() that looked like >>> this: >>> >>> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) >>> { >>> >>> >>> MatMult(A, X, F); >>> VecAXPY(F, -1.0, b); >>> } >>> >>> >>>> 2. Since computing jacobian is not that trivial, I would like to use >>>> one of the pre-built jacobian methods. Is there any other step other than >>>> setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >>>> >>> >>> If you do nothing, we will compute it by default. >>> >>> Thanks, >>> >>> MAtt >>> >>> >>>> Best regards, >>>> >>>> Saransh >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 25 10:50:19 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 25 May 2021 11:50:19 -0400 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> <38DA9DD8-93DA-461C-993A-FB7B278D3410@petsc.dev> Message-ID: On Tue, May 25, 2021 at 11:29 AM Saransh Saxena < saransh.saxena5571 at gmail.com> wrote: > Hi Barry, > > Mat Apetsc = A.getpetsc(); > Vec bpetsc = b.getpetsc(); > > Apetsc and bpetsc are Matrix A and vector b (in petsc format), A and b are > using different class structure (as per the FEM code) in solving the > nonlinear equation A(x).x = b. b is the RHS vector (applied forces in my > case) and A is global stiffness matrix (K for static simulations in FEM > terms). x is the solution vector (displacements in my case for FEM > simulation). r is the residual vector of the form r = b - A(x).x. 
Only > Matrix A is a function of the output in the current case, but I am > implementing for a general case where b might also depend on the output. > > Vec output; > VecDuplicate(x, &output); > VecCopy(x, output); > > setdata(vec(b.getpointer()->getdofmanager(), output)); > > The above lines store the solution for the current iteration so that when > the .generate() function is called, updated A matrix is obtained (and > updated b as well for a general case where both A and b vary with x, the > output). I have to do it by copying the x vector to output because > setdata() destroys the vector when called. > > I was also browsing through the definition of SNESSetFunction and realized > that it solves f'(x) x = -f(x), however, in newton raphson x_(n+1) = > x_(n) - f(x_(n))/f'(x_(n)). So am I solving for delta_x here with > SNESSetFunction? > No. SNESSetFunction() provides the residual F, so that F(x) = 0 You can use _many_ different algorithms to solve this system. One is Newton. > Also in SNESSetPicard(), I need to pass a function to compute b. However, > in my case b is constant. How do I use that? Also does Vec r in the > definition refer to solution vector or residual vector? > Just keep returning that vector. Thanks, Matt > Best regards, > Saransh > > > > > > On Tue, May 25, 2021 at 10:15 AM Barry Smith wrote: > >> >> VecNorm(F, NORM_2, &normvalres); >> >> >> The F has not yet been computed by you so you shouldn't take the norm >> here. F could have anything in it. You should take the norm after the line >> >> VecAXPY(F,-1.0, bpetsc); >> >> >> >> >> // Read pointers to A and b: >> Mat Apetsc = A.getpetsc(); >> Vec bpetsc = b.getpetsc(); >> >> >> Where are these things computed and are they both functions of output? >> or is b merely x (the current solution snes is working with) >> >> Vec output; >> VecDuplicate(x, &output); >> VecCopy(x, output); >> >> setdata(vec(b.getpointer()->getdofmanager(), output)); >> >> >> What is the line above doing? >> >> I think you using using Picard iteration A(x^n) x^{n+1} = b(x^n). >> (Sometimes people call this a fixed-point iteration) If so you should use >> SNESSetPicard() and not SNESSetFunction(). >> >> If you run with SNESSetPicard() with no additional options it will >> run the defect correction version of Picard >> >> If you run with SNESSetPicard() and use -snes_mf_operator then SNES >> will run matrix-free Newton's method using your A as the preconditioner for >> the Jacobian >> >> If you run with SNESSetPicard() and use -snes_fd then SNES will form >> explicitly the Jacobian and run Newton's method with it. This will be very >> slow but you gives you an idea of how Newton's method works on your >> problem. >> >> If you call SNESSetFromOptions() before SNESSolve() then you can use >> -snes_monitor -ksp_monitor -snes_converged_reason and many other options to >> monitor the convergence, then you will not have to compute the norms >> yourself and put print statements in your code for the norms. >> >> Barry >> >> >> On May 25, 2021, at 2:48 AM, Saransh Saxena >> wrote: >> >> Hi guys, >> >> I've written an implementation of SNES within my code to use the petsc >> nonlinear solvers but for some reason, I am getting results I can't make >> sense of. To summarize, I've written a function to calculate residual using >> Matthew's suggestion. However, when I run the code, the behaviour is >> odd, the solver seems to enter the myresidual function initially. 
However, >> after that it never updates the iteration counter and the solution vector >> remains unchanged (and a really small value) while the residual vector >> explodes in value. >> >> Residual code :- >> >> PetscErrorCode sl::myresidual(SNES snes, Vec x, Vec F, void *ctx) >> { >> // Cast the application context: >> sl::formulCtx *user = (sl::formulCtx*)ctx; >> >> // Read the formulation: >> formulation *thisformul = (*user).formul; >> thisformul->generate(); >> >> //vec *b = user->b; >> //mat *A = user->A; >> >> vec b = thisformul->b(); >> mat A = thisformul->A(); >> >> // Read pointers to A and b: >> Mat Apetsc = A.getpetsc(); >> Vec bpetsc = b.getpetsc(); >> >> double normvalres, normvalsol; >> VecNorm(F, NORM_2, &normvalres); >> VecNorm(x, NORM_2, &normvalsol); >> std::cout << >> "----------------------------------------------------------------------------" >> << std::endl; >> std::cout << "Entered residual function, norm of residual vector is : >> " << normvalres << std::endl; >> std::cout << "Entered residual function, norm of solution vector is : >> " << normvalsol << std::endl; >> >> // Compute the residual as F = A*x - b >> MatMult(Apetsc, x, F); >> VecAXPY(F,-1.0, bpetsc); >> >> Vec output; >> VecDuplicate(x, &output); >> VecCopy(x, output); >> >> setdata(vec(b.getpointer()->getdofmanager(), output)); >> >> std::cout << "Writing the sol to fields \n"; >> >> return 0; >> } >> >> SNES implementation :- >> >> void sl::solvenonlinear(formulation thisformul, double restol, int >> maxitnum) >> { >> // Make sure the problem is of the form Ax = b: >> if (thisformul.isdampingmatrixdefined() || >> thisformul.ismassmatrixdefined()) >> { >> std::cout << "Error in 'sl' namespace: formulation to solve >> cannot have a damping/mass matrix (use a time resolution algorithm)" << >> std::endl; >> abort(); >> } >> >> // Remove leftovers (if any): >> mat Atemp = thisformul.A(); vec btemp = thisformul.b(); >> >> // Create Application Context for formulation >> sl::formulCtx user; >> user.formul = &thisformul; >> >> // Generate formulation to set PETSc SNES requirements: >> thisformul.generate(); >> >> mat A = thisformul.A(); >> vec b = thisformul.b(); >> >> // SNES requirements: >> Vec bpetsc = b.getpetsc(); >> Mat Apetsc = A.getpetsc(); >> >> vec residual(std::shared_ptr(new >> rawvec(b.getpointer()->getdofmanager()))); >> Vec residualpetsc = residual.getpetsc(); >> >> vec sol(std::shared_ptr(new >> rawvec(b.getpointer()->getdofmanager()))); >> Vec solpetsc = sol.getpetsc(); >> >> //Retrieve the SNES and KSP Context from A matrix: >> SNES* snes = A.getpointer()->getsnes(); >> KSP* ksp = A.getpointer()->getksp(); >> >> // Create placeholder for preconditioner: >> PC pc; >> >> // Create snes context: >> SNESCreate(PETSC_COMM_SELF, snes); >> SNESSetFunction(*snes, residualpetsc, sl::myresidual, &user); >> SNESSetTolerances(*snes, PETSC_DEFAULT, restol, PETSC_DEFAULT, >> maxitnum, 5); >> >> // Retrieve the KSP context automatically created: >> SNESGetKSP(*snes, ksp); >> >> //Set KSP specific parameters/options: >> KSPSetOperators(*ksp, Apetsc, Apetsc); >> KSPSetFromOptions(*ksp); >> KSPGetPC(*ksp,&pc); >> PCSetType(pc,PCLU); >> PCFactorSetMatSolverType(pc,MATSOLVERMUMPS); >> >> //Call SNES options to invoke changes from console: >> SNESSetFromOptions(*snes); >> >> // Set SNES Monitor to retrieve convergence information: >> SNESMonitorSet(*snes, sl::mysnesmonitor, PETSC_NULL, PETSC_NULL); >> //SNESMonitorLGResidualNorm(); >> >> SNESSolve(*snes, PETSC_NULL, solpetsc); >> >> // Print the norm of 
residual: >> double normres; >> VecNorm(residualpetsc, NORM_2, &normres); >> std::cout << "L2 norm of the residual is : " << normres << std::endl; >> >> //Set the solution to all the fields: >> setdata(sol); >> >> // Get the number of required iterations and the residual norm: >> //SNESGetIterationNumber(*snes, &maxitnum); >> //SNESGetResidualNorm(*snes, &restol); >> >> // Destroy SNES context once done with computation: >> SNESDestroy(snes); >> >> } >> >> Output :- >> >> >> Am I doing something incorrect wrt SNES? When I use the linear solver >> (KSP) and manually coded fixed point nonlinear iteration, it works fine. >> >> Best regards, >> Saransh >> >> >> >> On Sun, May 9, 2021 at 10:43 PM Barry Smith wrote: >> >>> >>> Saransh, >>> >>> If Picard or Newton's method does not converge, you can consider >>> adding pseudo-transient and/or other continuation methods. For example, if >>> the problem is made difficult by certain physical parameters you can start >>> with "easier" values of the parameters, solve the nonlinear system, then >>> use its solution as the initial guess for slightly more "difficult" >>> parameters, etc. Or, depending on the problem grid sequencing may be >>> appropriate. We have some tools to help with all these approaches. >>> >>> Barry >>> >>> >>> On May 9, 2021, at 2:07 PM, Saransh Saxena >>> wrote: >>> >>> Thanks Barry and Matt, >>> >>> Till now I was only using a simple fixed point nonlinear solver manually >>> coded instead of ones provided by PETSc. However, the problem I am trying >>> to solve is highly nonlinear so I suppose I'll need at least a newton based >>> solver to start with. I'll get back to you guys if I have any questions. >>> >>> Cheers, >>> Saransh >>> >>> On Sat, May 8, 2021 at 5:18 AM Barry Smith wrote: >>> >>>> Saransh, >>>> >>>> I've add some code for SNESSetPicard() in the PETSc branch >>>> barry/2021-05-06/add-snes-picard-mf see also http >>>> s://gitlab.com/petsc/petsc/-/merge_requests/3962 that will make your >>>> coding much easier. >>>> >>>> With this branch you can provide code that computes A(x), using >>>> SNESSetPicard(). >>>> >>>> 1) by default it will use the defection-correction form of Picard >>>> iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) to solve, which can be >>>> slower than Newton >>>> >>>> 2) with -snes_fd_color it will compute the Jacobian via coloring using >>>> SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same >>>> sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - >>>> A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() >>>> and A'(x^n)[x^n] is a matrix, I do not know if, in general, it has the same >>>> nonzero structure as A. (I'm lost beyond matrices :-(). >>>> >>>> 3) with -snes_mf_operator it will apply the true Jacobian matrix-free >>>> and precondition it with a preconditioner built from A(x^n) matrix, for >>>> some problems this works well. >>>> >>>> 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the >>>> Jacobian by finite differencing one column at a time, thus it is very slow >>>> and not useful for large problems. But useful for testing with small >>>> problems. >>>> >>>> So you can provide A() and need not worrying about providing the >>>> Jacobian or even the function evaluation code. It is all taken care of by >>>> SNESSetPicard(). 
>>>> >>>> Hope this helps, >>>> >>>> Barry >>>> >>>> >>>> On May 6, 2021, at 1:21 PM, Matthew Knepley wrote: >>>> >>>> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena < >>>> saransh.saxena5571 at gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am trying to incorporate newton method in solving a nonlinear FEM >>>>> equation using SNES from PETSc. The overall equation is of the type A(x).x >>>>> = b, where b is a vector of external loads, x is the solution field (say >>>>> displacements for e.g.) and A is the combined LHS matrix derived from the >>>>> discretization of weak formulation of the governing finite element >>>>> equation. >>>>> >>>>> While going through the manual and examples of snes, I found that I >>>>> need to define the function of residual using SNESSetFunction() and >>>>> jacobian using SNESSetJacobian(). In that context I had a couple of >>>>> questions :- >>>>> >>>>> 1. In the snes tutorials I've browsed through, the functions for >>>>> computing residual passed had arguments only for x, the solution vector and >>>>> f, the residual vector. Is there a way a user can pass an additional vector >>>>> (b) and matrix (A) for computing the residual as well? as in my case, f = b >>>>> - A(x).x >>>>> >>>> >>>> You would give PETSc an outer function MyResidual() that looked like >>>> this: >>>> >>>> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) >>>> { >>>> >>>> >>>> MatMult(A, X, F); >>>> VecAXPY(F, -1.0, b); >>>> } >>>> >>>> >>>>> 2. Since computing jacobian is not that trivial, I would like to use >>>>> one of the pre-built jacobian methods. Is there any other step other than >>>>> setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >>>>> >>>> >>>> If you do nothing, we will compute it by default. >>>> >>>> Thanks, >>>> >>>> MAtt >>>> >>>> >>>>> Best regards, >>>>> >>>>> Saransh >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue May 25 11:49:12 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 May 2021 11:49:12 -0500 Subject: [petsc-users] Integrating SNES in FEM code In-Reply-To: References: <82576661-7CE2-4AF3-B1EA-E0C04B103702@petsc.dev> <3EE85DD6-219F-4FF3-B5A3-F2BDB490CBE8@petsc.dev> <38DA9DD8-93DA-461C-993A-FB7B278D3410@petsc.dev> Message-ID: <278C3D6B-9951-4CB8-8B5F-802E97C209A3@petsc.dev> > On May 25, 2021, at 10:28 AM, Saransh Saxena wrote: > > Hi Barry, > >> Mat Apetsc = A.getpetsc(); >> Vec bpetsc = b.getpetsc(); > Apetsc and bpetsc are Matrix A and vector b (in petsc format), A and b are using different class structure (as per the FEM code) in solving the nonlinear equation A(x).x = b. b is the RHS vector (applied forces in my case) and A is global stiffness matrix (K for static simulations in FEM terms). x is the solution vector (displacements in my case for FEM simulation). r is the residual vector of the form r = b - A(x).x. 
Only matrix A is a function of the output in the current case, but I am implementing for a general case where b might also depend on the output.
>
> Vec output;
> VecDuplicate(x, &output);
> VecCopy(x, output);
>
> setdata(vec(b.getpointer()->getdofmanager(), output));
>
> The above lines store the solution for the current iteration so that when the .generate() function is called, an updated A matrix is obtained (and an updated b as well for a general case where both A and b vary with x, the output). I have to do it by copying the x vector to output because setdata() destroys the vector when called.
>
> I was also browsing through the definition of SNESSetFunction and realized that it solves f'(x) x = -f(x); however, in Newton-Raphson x_(n+1) = x_(n) - f(x_(n))/f'(x_(n)). So am I solving for delta_x here with SNESSetFunction?
>
> Also in SNESSetPicard(), I need to pass a function to compute b. However, in my case b is constant. How do I use that?

   You have two choices.

   * As Matt says, you can just provide a function that copies over your constant b vector each time.

   * Or you can pass a NULL for the function and call SNESSolve(snes,b,x) where b is your constant b vector (this will be slightly more efficient); a small sketch of this second option is given further below, after the quoted SNESSetPicard() notes.

> Also does Vec r in the definition refer to solution vector or residual vector?

   r is a vector that SNES will use to compute the residual in. It is just a work vector you can pass in. You can pass in NULL and PETSc will create the work vector it needs internally.

> Best regards,
> Saransh
>
> On Tue, May 25, 2021 at 10:15 AM Barry Smith wrote:
>
>> VecNorm(F, NORM_2, &normvalres);
>>
>>   The F has not yet been computed by you so you shouldn't take the norm here. F could have anything in it. You should take the norm after the line
>>
>> VecAXPY(F,-1.0, bpetsc);
>>
>> // Read pointers to A and b:
>> Mat Apetsc = A.getpetsc();
>> Vec bpetsc = b.getpetsc();
>>
>>   Where are these things computed, and are they both functions of output? Or is b merely x (the current solution snes is working with)?
>>
>> Vec output;
>> VecDuplicate(x, &output);
>> VecCopy(x, output);
>>
>> setdata(vec(b.getpointer()->getdofmanager(), output));
>>
>>   What is the line above doing?
>>
>>   I think you are using Picard iteration A(x^n) x^{n+1} = b(x^n). (Sometimes people call this a fixed-point iteration.) If so, you should use SNESSetPicard() and not SNESSetFunction().
>>
>>   If you run with SNESSetPicard() with no additional options it will run the defect correction version of Picard.
>>
>>   If you run with SNESSetPicard() and use -snes_mf_operator then SNES will run matrix-free Newton's method using your A as the preconditioner for the Jacobian.
>>
>>   If you run with SNESSetPicard() and use -snes_fd then SNES will form explicitly the Jacobian and run Newton's method with it. This will be very slow, but it gives you an idea of how Newton's method works on your problem.
However, when I run the code, the behaviour is odd, the solver seems to enter the myresidual function initially. However, after that it never updates the iteration counter and the solution vector remains unchanged (and a really small value) while the residual vector explodes in value. >> >> Residual code :- >> >> PetscErrorCode sl::myresidual(SNES snes, Vec x, Vec F, void *ctx) >> { >> // Cast the application context: >> sl::formulCtx *user = (sl::formulCtx*)ctx; >> >> // Read the formulation: >> formulation *thisformul = (*user).formul; >> thisformul->generate(); >> >> //vec *b = user->b; >> //mat *A = user->A; >> >> vec b = thisformul->b(); >> mat A = thisformul->A(); >> >> // Read pointers to A and b: >> Mat Apetsc = A.getpetsc(); >> Vec bpetsc = b.getpetsc(); >> >> double normvalres, normvalsol; >> VecNorm(F, NORM_2, &normvalres); >> VecNorm(x, NORM_2, &normvalsol); >> std::cout << "----------------------------------------------------------------------------" << std::endl; >> std::cout << "Entered residual function, norm of residual vector is : " << normvalres << std::endl; >> std::cout << "Entered residual function, norm of solution vector is : " << normvalsol << std::endl; >> >> // Compute the residual as F = A*x - b >> MatMult(Apetsc, x, F); >> VecAXPY(F,-1.0, bpetsc); >> >> Vec output; >> VecDuplicate(x, &output); >> VecCopy(x, output); >> >> setdata(vec(b.getpointer()->getdofmanager(), output)); >> >> std::cout << "Writing the sol to fields \n"; >> >> return 0; >> } >> >> SNES implementation :- >> >> void sl::solvenonlinear(formulation thisformul, double restol, int maxitnum) >> { >> // Make sure the problem is of the form Ax = b: >> if (thisformul.isdampingmatrixdefined() || thisformul.ismassmatrixdefined()) >> { >> std::cout << "Error in 'sl' namespace: formulation to solve cannot have a damping/mass matrix (use a time resolution algorithm)" << std::endl; >> abort(); >> } >> >> // Remove leftovers (if any): >> mat Atemp = thisformul.A(); vec btemp = thisformul.b(); >> >> // Create Application Context for formulation >> sl::formulCtx user; >> user.formul = &thisformul; >> >> // Generate formulation to set PETSc SNES requirements: >> thisformul.generate(); >> >> mat A = thisformul.A(); >> vec b = thisformul.b(); >> >> // SNES requirements: >> Vec bpetsc = b.getpetsc(); >> Mat Apetsc = A.getpetsc(); >> >> vec residual(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); >> Vec residualpetsc = residual.getpetsc(); >> >> vec sol(std::shared_ptr(new rawvec(b.getpointer()->getdofmanager()))); >> Vec solpetsc = sol.getpetsc(); >> >> //Retrieve the SNES and KSP Context from A matrix: >> SNES* snes = A.getpointer()->getsnes(); >> KSP* ksp = A.getpointer()->getksp(); >> >> // Create placeholder for preconditioner: >> PC pc; >> >> // Create snes context: >> SNESCreate(PETSC_COMM_SELF, snes); >> SNESSetFunction(*snes, residualpetsc, sl::myresidual, &user); >> SNESSetTolerances(*snes, PETSC_DEFAULT, restol, PETSC_DEFAULT, maxitnum, 5); >> >> // Retrieve the KSP context automatically created: >> SNESGetKSP(*snes, ksp); >> >> //Set KSP specific parameters/options: >> KSPSetOperators(*ksp, Apetsc, Apetsc); >> KSPSetFromOptions(*ksp); >> KSPGetPC(*ksp,&pc); >> PCSetType(pc,PCLU); >> PCFactorSetMatSolverType(pc,MATSOLVERMUMPS); >> >> //Call SNES options to invoke changes from console: >> SNESSetFromOptions(*snes); >> >> // Set SNES Monitor to retrieve convergence information: >> SNESMonitorSet(*snes, sl::mysnesmonitor, PETSC_NULL, PETSC_NULL); >> //SNESMonitorLGResidualNorm(); >> >> 
SNESSolve(*snes, PETSC_NULL, solpetsc);
>>
>> // Print the norm of the residual:
>> double normres;
>> VecNorm(residualpetsc, NORM_2, &normres);
>> std::cout << "L2 norm of the residual is : " << normres << std::endl;
>>
>> // Set the solution to all the fields:
>> setdata(sol);
>>
>> // Get the number of required iterations and the residual norm:
>> //SNESGetIterationNumber(*snes, &maxitnum);
>> //SNESGetResidualNorm(*snes, &restol);
>>
>> // Destroy SNES context once done with computation:
>> SNESDestroy(snes);
>> }
>>
>> Output :-
>>
>> Am I doing something incorrect wrt SNES? When I use the linear solver (KSP) and manually coded fixed point nonlinear iteration, it works fine.
>>
>> Best regards,
>> Saransh
>>
>> On Sun, May 9, 2021 at 10:43 PM Barry Smith wrote:
>>
>>   Saransh,
>>
>>   If Picard or Newton's method does not converge, you can consider adding pseudo-transient and/or other continuation methods. For example, if the problem is made difficult by certain physical parameters you can start with "easier" values of the parameters, solve the nonlinear system, then use its solution as the initial guess for slightly more "difficult" parameters, etc. Or, depending on the problem, grid sequencing may be appropriate. We have some tools to help with all these approaches.
>>
>>   Barry
>>
>>> On May 9, 2021, at 2:07 PM, Saransh Saxena wrote:
>>>
>>> Thanks Barry and Matt,
>>>
>>> Till now I was only using a simple fixed point nonlinear solver, manually coded, instead of the ones provided by PETSc. However, the problem I am trying to solve is highly nonlinear, so I suppose I'll need at least a Newton-based solver to start with. I'll get back to you guys if I have any questions.
>>>
>>> Cheers,
>>> Saransh
>>>
>>> On Sat, May 8, 2021 at 5:18 AM Barry Smith wrote:
>>> Saransh,
>>>
>>> I've added some code for SNESSetPicard() in the PETSc branch barry/2021-05-06/add-snes-picard-mf, see also https://gitlab.com/petsc/petsc/-/merge_requests/3962 that will make your coding much easier.
>>>
>>> With this branch you can provide code that computes A(x), using SNESSetPicard().
>>>
>>> 1) by default it will use the defect-correction form of Picard iteration A(x^n)(x^{n+1} - x^{n}) = b - A(x^n) x^n to solve, which can be slower than Newton
>>>
>>> 2) with -snes_fd_color it will compute the Jacobian via coloring using SNESComputeJacobianDefaultColor() (assuming the true Jacobian has the same sparsity structure as A). The true Jacobian is J(x^n) = A'(x^n)[x^n] - A(x^n) where A'(x^n) is the third order tensor of the derivatives of A() and A'(x^n)[x^n] is a matrix; I do not know if, in general, it has the same nonzero structure as A. (I'm lost beyond matrices :-().
>>>
>>> 3) with -snes_mf_operator it will apply the true Jacobian matrix-free and precondition it with a preconditioner built from the A(x^n) matrix, for some problems this works well.
>>>
>>> 4) with -snes_fd it uses SNESComputeJacobianDefault() and computes the Jacobian by finite differencing one column at a time, thus it is very slow and not useful for large problems. But useful for testing with small problems.
>>>
>>> So you can provide A() and need not worry about providing the Jacobian or even the function evaluation code. It is all taken care of by SNESSetPicard().
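  Here is the small sketch promised above for the constant-b case. ComputeA stands in for your routine that assembles A at the current iterate (a made-up name, error checking omitted); the other names are taken from your code:

/* your routine that fills Amat (and Pmat, if different) with A evaluated at x */
extern PetscErrorCode ComputeA(SNES snes, Vec x, Mat Amat, Mat Pmat, void *ctx);

/* b is constant: pass NULL for the right-hand-side function and NULL for the
   work vector, then hand the constant b directly to SNESSolve() */
SNESSetPicard(*snes, NULL, NULL, Apetsc, Apetsc, ComputeA, &user);
SNESSetFromOptions(*snes);
SNESSolve(*snes, bpetsc, solpetsc);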
>>> >>> Hope this helps, >>> >>> Barry >>> >>> >>>> On May 6, 2021, at 1:21 PM, Matthew Knepley > wrote: >>>> >>>> On Thu, May 6, 2021 at 2:09 PM Saransh Saxena > wrote: >>>> Hi, >>>> >>>> I am trying to incorporate newton method in solving a nonlinear FEM equation using SNES from PETSc. The overall equation is of the type A(x).x = b, where b is a vector of external loads, x is the solution field (say displacements for e.g.) and A is the combined LHS matrix derived from the discretization of weak formulation of the governing finite element equation. >>>> >>>> While going through the manual and examples of snes, I found that I need to define the function of residual using SNESSetFunction() and jacobian using SNESSetJacobian(). In that context I had a couple of questions :- >>>> >>>> 1. In the snes tutorials I've browsed through, the functions for computing residual passed had arguments only for x, the solution vector and f, the residual vector. Is there a way a user can pass an additional vector (b) and matrix (A) for computing the residual as well? as in my case, f = b - A(x).x >>>> >>>> You would give PETSc an outer function MyResidual() that looked like this: >>>> >>>> PetscErrorCode MyResidual(SNES snes, Vec X, Vec F, void *ctx) >>>> { >>>> >>>> >>>> MatMult(A, X, F); >>>> VecAXPY(F, -1.0, b); >>>> } >>>> >>>> 2. Since computing jacobian is not that trivial, I would like to use one of the pre-built jacobian methods. Is there any other step other than setting the 3rd argument in SNESSetJacobian to SNESComputeJacobianDefault? >>>> >>>> If you do nothing, we will compute it by default. >>>> >>>> Thanks, >>>> >>>> MAtt >>>> >>>> Best regards, >>>> >>>> Saransh >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue May 25 12:15:37 2021 From: hgbk2008 at gmail.com (hg) Date: Tue, 25 May 2021 19:15:37 +0200 Subject: [petsc-users] divergence of quasi-Newton scheme (SNES) Message-ID: Hello I would expect the setting below would give the same behaviour like -snes_type newtonls -snes_linesearch_type basic: -snes_type qn -snes_qn_type lbfgs -snes_qn_m 1 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian -snes_linesearch_type basic But it's not, below is the convergence log: entering BuildRHS 0 SNES Function norm 7.450427214612e+03 entering BuildLHS entering BuildRHS 1 SNES Function norm 7.902262148182e+03 entering BuildRHS 2 SNES Function norm 8.426417730274e+03 Periodic restart! i_r = 1 entering BuildLHS entering BuildRHS 3 SNES Function norm 5.571513092130e+04 entering BuildRHS 4 SNES Function norm 4.019723509872e+05 Periodic restart! 
i_r = 1
entering BuildLHS
entering BuildRHS
5 SNES Function norm 9.259722791615e+05
entering BuildRHS
6 SNES Function norm 3.985884724278e+08
Nonlinear solve did not converge due to DIVERGED_DTOL iterations 6

For -snes_type newtonls -snes_linesearch_type basic:

entering BuildRHS
0 SNES Function norm 7.450427214612e+03
entering BuildLHS
entering BuildRHS
1 SNES Function norm 1.937109245338e+01
entering BuildLHS
entering BuildRHS
2 SNES Function norm 8.126736406257e-01
entering BuildLHS
entering BuildRHS
3 SNES Function norm 1.143237968970e-03
entering BuildLHS
entering BuildRHS
4 SNES Function norm 2.706184329411e-09
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 4

Could it be that the parameters are not the same? My idea is to first find a good configuration of lbfgs and then increase the restart length (m) to see how the convergence behaves (and save time).

Thanks
Giang

From bsmith at petsc.dev  Tue May 25 14:24:57 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 25 May 2021 14:24:57 -0500
Subject: [petsc-users] adding calls before and after each iteration of snes
In-Reply-To: References:
Message-ID:

  There is also SNESMonitorSet() and SNESSetConvergenceTest().

> On May 25, 2021, at 9:25 AM, Matthew Knepley wrote:
>
> On Tue, May 25, 2021 at 8:41 AM hg wrote:
> Hello
>
> I would like to ask if it is possible to add function calls before and after each iteration of the SNES solve, e.g. InitializeNonLinearIteration and FinalizeNonLinearIteration. It is particularly useful for debugging the constitutive law or for post-processing the intermediate results.
>
> There is this: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html
>
>   Thanks,
>
>      Matt
>
> Best
> Giang
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/

From stefano.zampini at gmail.com  Tue May 25 14:29:11 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Tue, 25 May 2021 22:29:11 +0300
Subject: [petsc-users] adding calls before and after each iteration of snes
In-Reply-To: References:
Message-ID: <1F9E0AAF-14C8-4664-A91A-D1B94D90D5DF@gmail.com>

I use https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html and https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLineSearchSetPostCheck.html

> On May 25, 2021, at 10:24 PM, Barry Smith wrote:
>
>   There is also SNESMonitorSet() and SNESSetConvergenceTest().
>
>> On May 25, 2021, at 9:25 AM, Matthew Knepley wrote:
>>
>> On Tue, May 25, 2021 at 8:41 AM hg wrote:
>> Hello
>>
>> I would like to ask if it is possible to add function calls before and after each iteration of the SNES solve, e.g. InitializeNonLinearIteration and FinalizeNonLinearIteration. It is particularly useful for debugging the constitutive law or for post-processing the intermediate results.
>>
>> There is this: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html
>>
>>   Thanks,
>>
>>      Matt
>>
>> Best
>> Giang
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/

From bsmith at petsc.dev  Tue May 25 15:51:07 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 25 May 2021 15:51:07 -0500
Subject: [petsc-users] divergence of quasi-Newton scheme (SNES)
In-Reply-To: References:
Message-ID: <6F1845FB-5206-4326-B672-7E79F279234A@petsc.dev>

  Yes, with your options I would expect the first SNES iteration of QN to produce the same result as the first iteration of SNES Newton ls.

  I have fixed an error that crept in (well, actually I put it there) where KSPSetFromOptions() was not being called with the QN and Jacobian option, hence only the default PC (ilu) was being used, so if you changed the PC it only affected Newton, not QN. I also made it possible to run with a history of length 0 so that with the Jacobian option it should exactly match Newton for all iterations.

  You can access my fixes with

git fetch
git checkout barry/2021-05-25/fix-qn-jacobian-setfromoptions/release

  The merge request with the fixes for release is here https://gitlab.com/petsc/petsc/-/merge_requests/4018

  I checked it with the runs below, using first -pc_type lu and then the default PC.

~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions
$ ./ex19 -pc_type lu -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 0 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian -snes_linesearch_type basic
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
0 SNES Function norm 2.391552133017e-01
1 SNES Function norm 6.839858507066e-05
2 SNES Function norm 1.977614824765e-05
3 SNES Function norm 1.286952209377e-10
Number of SNES iterations = 3

  If you still have trouble with the branch you can run with -ksp_monitor -snes_linesearch_monitor -info -snes_view to see what may be different with your runs with Newton and with QN.

  Barry

> On May 25, 2021, at 12:15 PM, hg wrote:
>
> Hello
>
> I would expect the setting below would give the same behaviour like -snes_type newtonls -snes_linesearch_type basic:
>
> -snes_type qn
> -snes_qn_type lbfgs
> -snes_qn_m 1
> -snes_qn_restart_type periodic
> -snes_qn_scale_type jacobian
> -snes_linesearch_type basic
>
> But it's not, below is the convergence log:
> entering BuildRHS
> 0 SNES Function norm 7.450427214612e+03
> entering BuildLHS
> entering BuildRHS
> 1 SNES Function norm 7.902262148182e+03
> entering BuildRHS
> 2 SNES Function norm 8.426417730274e+03
> Periodic restart! i_r = 1
> entering BuildLHS
> entering BuildRHS
> 3 SNES Function norm 5.571513092130e+04
> entering BuildRHS
> 4 SNES Function norm 4.019723509872e+05
> Periodic restart! i_r = 1
> entering BuildLHS
> entering BuildRHS
> 5 SNES Function norm 9.259722791615e+05
> entering BuildRHS
> 6 SNES Function norm 3.985884724278e+08
> Nonlinear solve did not converge due to DIVERGED_DTOL iterations 6
>
> For -snes_type newtonls -snes_linesearch_type basic:
>
> entering BuildRHS
> 0 SNES Function norm 7.450427214612e+03
> entering BuildLHS
> entering BuildRHS
> 1 SNES Function norm 1.937109245338e+01
> entering BuildLHS
> entering BuildRHS
> 2 SNES Function norm 8.126736406257e-01
> entering BuildLHS
> entering BuildRHS
> 3 SNES Function norm 1.143237968970e-03
> entering BuildLHS
> entering BuildRHS
> 4 SNES Function norm 2.706184329411e-09
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 4
>
> Would it be the parameters not the same? My idea is first starting with a good configuration of lbfgs then increase the restart (m) to see how the convergence going (and save time).
>
> Thanks
> Giang

From hgbk2008 at gmail.com Tue May 25 17:30:27 2021
From: hgbk2008 at gmail.com (hg)
Date: Wed, 26 May 2021 00:30:27 +0200
Subject: [petsc-users] divergence of quasi-Newton scheme (SNES)
In-Reply-To: <6F1845FB-5206-4326-B672-7E79F279234A@petsc.dev> References: <6F1845FB-5206-4326-B672-7E79F279234A@petsc.dev> Message-ID:

Thanks Barry, with -snes_qn_m 0 it gives exact convergence as -snes_type newtonls -snes_linesearch_type basic

Giang

On Tue, May 25, 2021 at 10:51 PM Barry Smith wrote:
>
>   Yes, with your options I would expect the first SNES iteration of QN to produce the same result as the first iteration of SNES Newton ls.
>
>   I have fixed an error that crept in (well, actually I put it there) where KSPSetFromOptions() was not being called with the QN and Jacobian option, hence only the default PC (ilu) was being used, so if you changed the PC it only affected Newton, not QN. I also made it possible to run with a history of length 0 so that with the Jacobian option it should exactly match Newton for all iterations.
>
>   You can access my fixes with
>
> git fetch
> git checkout barry/2021-05-25/fix-qn-jacobian-setfromoptions/release
>
> The merge request with the fixes for release is here
> https://gitlab.com/petsc/petsc/-/merge_requests/4018
>
> I checked it with the runs below, using first -pc_type lu and then the default PC.
> > ~/Src/petsc/src/snes/tutorials* > (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=)* > arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -pc_type lu -snes_monitor -snes_type qn -snes_qn_type lbfgs > -snes_qn_m 0 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian > -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.819684624592e-05 > 2 SNES Function norm 4.203401869625e-12 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials* > (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=)* > arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -pc_type lu -snes_monitor -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.819684624592e-05 > 2 SNES Function norm 4.203401869625e-12 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials* > (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=)* > arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 8.558777232425e-11 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials* > (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=)* > arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 0 > -snes_qn_restart_type periodic -snes_qn_scale_type jacobian > -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 8.558777232425e-11 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials* > (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=)* > arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 1 > -snes_qn_restart_type periodic -snes_qn_scale_type jacobian > -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 1.977614824765e-05 > 3 SNES Function norm 1.286952209377e-10 > Number of SNES iterations = 3 > > If you still have trouble with the branch you can run with -ksp_monitor > -snes_linesearch_monitor -info -snes_view to see what may be different with > your runs with Newton and with QN. > > Barry > > > > On May 25, 2021, at 12:15 PM, hg wrote: > > Hello > > I would expect the setting below would give the same behaviour like > -snes_type newtonls -snes_linesearch_type basic: > > -snes_type qn > -snes_qn_type lbfgs > -snes_qn_m 1 > -snes_qn_restart_type periodic > -snes_qn_scale_type jacobian > -snes_linesearch_type basic > > But it's not, below is the convergence log: > entering BuildRHS > 0 SNES Function norm 7.450427214612e+03 > entering BuildLHS > entering BuildRHS > 1 SNES Function norm 7.902262148182e+03 > entering BuildRHS > 2 SNES Function norm 8.426417730274e+03 > Periodic restart! i_r = 1 > entering BuildLHS > entering BuildRHS > 3 SNES Function norm 5.571513092130e+04 > entering BuildRHS > 4 SNES Function norm 4.019723509872e+05 > Periodic restart! 
i_r = 1
> entering BuildLHS
> entering BuildRHS
> 5 SNES Function norm 9.259722791615e+05
> entering BuildRHS
> 6 SNES Function norm 3.985884724278e+08
> Nonlinear solve did not converge due to DIVERGED_DTOL iterations 6
>
> For -snes_type newtonls -snes_linesearch_type basic:
>
> entering BuildRHS
> 0 SNES Function norm 7.450427214612e+03
> entering BuildLHS
> entering BuildRHS
> 1 SNES Function norm 1.937109245338e+01
> entering BuildLHS
> entering BuildRHS
> 2 SNES Function norm 8.126736406257e-01
> entering BuildLHS
> entering BuildRHS
> 3 SNES Function norm 1.143237968970e-03
> entering BuildLHS
> entering BuildRHS
> 4 SNES Function norm 2.706184329411e-09
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 4
>
> Would it be the parameters not the same? My idea is first starting with a good configuration of lbfgs then increase the restart (m) to see how the convergence going (and save time).
>
> Thanks
> Giang

From bsmith at petsc.dev Tue May 25 18:21:47 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 25 May 2021 18:21:47 -0500
Subject: [petsc-users] divergence of quasi-Newton scheme (SNES)
In-Reply-To: References: <6F1845FB-5206-4326-B672-7E79F279234A@petsc.dev> Message-ID: <153A5D2B-3E12-44D2-B542-A023EE2BD46C@petsc.dev>

  Excellent, thanks for letting us know.

> On May 25, 2021, at 5:30 PM, hg wrote:
>
> Thanks Barry, with -snes_qn_m 0 it gives exact convergence as -snes_type newtonls -snes_linesearch_type basic
>
> Giang
>
> On Tue, May 25, 2021 at 10:51 PM Barry Smith wrote:
>
>   Yes, with your options I would expect the first SNES iteration of QN to produce the same result as the first iteration of SNES Newton ls.
>
>   I have fixed an error that crept in (well, actually I put it there) where KSPSetFromOptions() was not being called with the QN and Jacobian option, hence only the default PC (ilu) was being used, so if you changed the PC it only affected Newton, not QN. I also made it possible to run with a history of length 0 so that with the Jacobian option it should exactly match Newton for all iterations.
>
>   You can access my fixes with
>
> git fetch
> git checkout barry/2021-05-25/fix-qn-jacobian-setfromoptions/release
>
> The merge request with the fixes for release is here https://gitlab.com/petsc/petsc/-/merge_requests/4018
>
> I checked it with the runs below, using first -pc_type lu and then the default PC.
>
> ~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions
> $ ./ex19 -pc_type lu -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 0 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian -snes_linesearch_type basic
> lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> 0 SNES Function norm 2.391552133017e-01
> 1 SNES Function norm 6.819684624592e-05
> 2 SNES Function norm 4.203401869625e-12
> Number of SNES iterations = 2
> ~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions
> $ ./ex19 -pc_type lu -snes_monitor -snes_linesearch_type basic
> lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.819684624592e-05 > 2 SNES Function norm 4.203401869625e-12 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 8.558777232425e-11 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 0 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 8.558777232425e-11 > Number of SNES iterations = 2 > ~/Src/petsc/src/snes/tutorials (barry/2021-05-25/fix-qn-jacobian-setfromoptions/release *=) arch-fix-qn-jacobian-setfromoptions > $ ./ex19 -snes_monitor -snes_type qn -snes_qn_type lbfgs -snes_qn_m 1 -snes_qn_restart_type periodic -snes_qn_scale_type jacobian -snes_linesearch_type basic > lid velocity = 0.0625, prandtl # = 1., grashof # = 1. > 0 SNES Function norm 2.391552133017e-01 > 1 SNES Function norm 6.839858507066e-05 > 2 SNES Function norm 1.977614824765e-05 > 3 SNES Function norm 1.286952209377e-10 > Number of SNES iterations = 3 > > If you still have trouble with the branch you can run with -ksp_monitor -snes_linesearch_monitor -info -snes_view to see what may be different with your runs with Newton and with QN. > > Barry > > >> On May 25, 2021, at 12:15 PM, hg > wrote: >> >> Hello >> >> I would expect the setting below would give the same behaviour like -snes_type newtonls -snes_linesearch_type basic: >> >> -snes_type qn >> -snes_qn_type lbfgs >> -snes_qn_m 1 >> -snes_qn_restart_type periodic >> -snes_qn_scale_type jacobian >> -snes_linesearch_type basic >> >> But it's not, below is the convergence log: >> entering BuildRHS >> 0 SNES Function norm 7.450427214612e+03 >> entering BuildLHS >> entering BuildRHS >> 1 SNES Function norm 7.902262148182e+03 >> entering BuildRHS >> 2 SNES Function norm 8.426417730274e+03 >> Periodic restart! i_r = 1 >> entering BuildLHS >> entering BuildRHS >> 3 SNES Function norm 5.571513092130e+04 >> entering BuildRHS >> 4 SNES Function norm 4.019723509872e+05 >> Periodic restart! i_r = 1 >> entering BuildLHS >> entering BuildRHS >> 5 SNES Function norm 9.259722791615e+05 >> entering BuildRHS >> 6 SNES Function norm 3.985884724278e+08 >> Nonlinear solve did not converge due to DIVERGED_DTOL iterations 6 >> >> For -snes_type newtonls -snes_linesearch_type basic: >> >> entering BuildRHS >> 0 SNES Function norm 7.450427214612e+03 >> entering BuildLHS >> entering BuildRHS >> 1 SNES Function norm 1.937109245338e+01 >> entering BuildLHS >> entering BuildRHS >> 2 SNES Function norm 8.126736406257e-01 >> entering BuildLHS >> entering BuildRHS >> 3 SNES Function norm 1.143237968970e-03 >> entering BuildLHS >> entering BuildRHS >> 4 SNES Function norm 2.706184329411e-09 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 4 >> >> Would it be the parameters not the same? 
My idea is first starting with a good configuration of lbfgs then increase the restart (m) to see how the convergence going (and save time). >> >> Thanks >> Giang >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Wed May 26 05:20:17 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Wed, 26 May 2021 12:20:17 +0200 Subject: [petsc-users] Collect Trajectories components Message-ID: Hi! I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Real value must be same on all processes, argument # 2 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c Is there any specific routine I can use to overcome this issue? Should I use VecScatter? I hope I made myself clear. Best, Francesco -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed May 26 09:20:47 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 26 May 2021 09:20:47 -0500 Subject: [petsc-users] Collect Trajectories components In-Reply-To: References: Message-ID: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do? TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS. You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time but then you can have only the rank you care about select the entries from the resulting vectors you care about while the other ranks for that time just ignore the vectors since they do not need to values from it. Barry > On May 26, 2021, at 5:20 AM, Francesco Brarda wrote: > > Hi! > > I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. > The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: > > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Real value must be same on all processes, argument # 2 > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown > [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug > [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c > [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c > [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c > > Is there any specific routine I can use to overcome this issue? Should I use VecScatter? > > I hope I made myself clear. > Best, > Francesco -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Wed May 26 10:39:23 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Wed, 26 May 2021 17:39:23 +0200 Subject: [petsc-users] Collect Trajectories components In-Reply-To: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> Message-ID: Thank you very much. > Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do? Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length). The function is called only by the first proc (rank=0) and not from the others. Is there a way to force also other ranks to call that routine? Should I build everything into an external function outside the main? Francesco > Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto: > > > > > TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS. > > You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time but then you can have only the rank you care about select the entries from the resulting vectors you care about while the other ranks for that time just ignore the vectors since they do not need to values from it. > > Barry > > > >> On May 26, 2021, at 5:20 AM, Francesco Brarda > wrote: >> >> Hi! >> >> I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. >> The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: >> >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Real value must be same on all processes, argument # 2 >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown >> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c >> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c >> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c >> >> Is there any specific routine I can use to overcome this issue? Should I use VecScatter? >> >> I hope I made myself clear. >> Best, >> Francesco > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 26 15:28:40 2021 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 May 2021 16:28:40 -0400 Subject: [petsc-users] Did CUDA break again? Message-ID: I started to get this error today on Cori. nvcc fatal : Unsupported gpu architecture 'compute_1120' I am pretty sure I had a clean build but I can redo it if you don't know where this is from. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 736951 bytes Desc: not available URL: From jacob.fai at gmail.com Wed May 26 15:47:09 2021 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Wed, 26 May 2021 22:47:09 +0200 Subject: [petsc-users] Did CUDA break again? Message-ID: ?1120 sounds suspiciously like some CUDA version rather than architecture or compute capability? Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: +1 (312) 694-3391 > On May 26, 2021, at 22:29, Mark Adams wrote: > ? > I started to get this error today on Cori. > > nvcc fatal : Unsupported gpu architecture 'compute_1120' > > I am pretty sure I had a clean build but I can redo it if you don't know where this is from. > > Thanks, > Mark > From bsmith at petsc.dev Wed May 26 17:31:11 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 26 May 2021 17:31:11 -0500 Subject: [petsc-users] Did CUDA break again? In-Reply-To: References: Message-ID: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> Yes, this code which I guess never got hit before cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); printf("%d\n",10*dp.major+dp.minor); return(0);; is using the wrong property for the generation. Back to the CUDA documentation for the correct information. > On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch wrote: > > 1120 sounds suspiciously like some CUDA version rather than architecture or compute capability? > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: +1 (312) 694-3391 > >> On May 26, 2021, at 22:29, Mark Adams wrote: >> ? >> I started to get this error today on Cori. >> >> nvcc fatal : Unsupported gpu architecture 'compute_1120' >> >> I am pretty sure I had a clean build but I can redo it if you don't know where this is from. >> >> Thanks, >> Mark >> From bsmith at petsc.dev Wed May 26 18:13:24 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 26 May 2021 18:13:24 -0500 Subject: [petsc-users] Did CUDA break again? 
In-Reply-To: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> Message-ID: <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> What is HOST=cori09 Does it have GPUs? https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 Seems to clearly state int cudaDeviceProp ::major [inherited] Major compute capability Mark, please compile and run this program on the machine you are running configure on #include #include #include #include #include int main(int arg,char **args) { struct cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); printf("%d\n",10*dp.major+dp.minor); int major,minor; cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); printf("%d\n",10*major+minor); return(0); } This is what I get $ nvcc mytest.c -lcuda ~/petsc (main=) arch-main $ ./a.out 70 70 Which is exactly what it is suppose to do. Barry > On May 26, 2021, at 5:31 PM, Barry Smith wrote: > > > Yes, this code which I guess never got hit before > > cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); printf("%d\n",10*dp.major+dp.minor); return(0);; > > is using the wrong property for the generation. > > Back to the CUDA documentation for the correct information. > > > >> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch wrote: >> >> 1120 sounds suspiciously like some CUDA version rather than architecture or compute capability? >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> Cell: +1 (312) 694-3391 >> >>> On May 26, 2021, at 22:29, Mark Adams wrote: >>> ? >>> I started to get this error today on Cori. >>> >>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>> >>> I am pretty sure I had a clean build but I can redo it if you don't know where this is from. >>> >>> Thanks, >>> Mark >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed May 26 18:15:11 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 26 May 2021 18:15:11 -0500 Subject: [petsc-users] Collect Trajectories components In-Reply-To: References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> Message-ID: <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> > On May 26, 2021, at 10:39 AM, Francesco Brarda wrote: > > Thank you very much. >> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do? > Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length). > The function is called only by the first proc (rank=0) and not from the others. Is there a way to force also other ranks to call that routine? Yes, just have all ranks call it and ignore the result on the other ranks. > Should I build everything into an external function outside the main? It can be called in main, does not need to be in a different function. > > Francesco > > >> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith > ha scritto: >> >> >> >> >> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS. >> >> You do not need to use VecScatter. 
Each process must call TSTrajectoryGetVecs with the same time but then you can have only the rank you care about select the entries from the resulting vectors you care about while the other ranks for that time just ignore the vectors since they do not need to values from it. >> >> Barry >> >> >> >>> On May 26, 2021, at 5:20 AM, Francesco Brarda > wrote: >>> >>> Hi! >>> >>> I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. >>> The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: >>> >>> [0]PETSC ERROR: Invalid argument >>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2 >>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown >>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c >>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c >>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c >>> >>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter? >>> >>> I hope I made myself clear. >>> Best, >>> Francesco >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 26 20:56:01 2021 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 May 2021 21:56:01 -0400 Subject: [petsc-users] Did CUDA break again? In-Reply-To: <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: On Wed, May 26, 2021 at 7:13 PM Barry Smith wrote: > > What is HOST=cori09 Does it have GPUs? > That is the login node that I got the interactive compute node from. This is my node: SLURM_JOB_NODELIST=cgpu02 > > > https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 > > Seems to clearly state > > int cudaDeviceProp > > ::major > > [inherited] > > Major compute capability > > > Mark, please compile and run this program on the machine you are running > configure on > > #include > #include > #include > #include > #include > int main(int arg,char **args) > { > struct cudaDeviceProp dp; > cudaGetDeviceProperties(&dp, 0); > printf("%d\n",10*dp.major+dp.minor); > > int major,minor; > cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, > 0); > cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, > 0); > printf("%d\n",10*major+minor); > return(0); > } > > This is what I get > > $ nvcc mytest.c -lcuda > ~/petsc* (main=)* arch-main > $ ./a.out > 70 > 70 > > This compiled and ran fine, but the output is wrong: 18:04 cgpu02 ~/petsc_install$ ./a.out 1120 -1431545180 It looks to me like there was a regression. I am running git bisect now. 7 more steps. -------------- next part -------------- An HTML attachment was scrubbed... 
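Returning to the TSTrajectoryGetVecs() exchange above: the collective calling pattern Barry describes can be written out as below. This is a minimal sketch, not a fix applied to Francesco's code; ntimes, the sampled component, and the rank-0 bookkeeping are illustrative, and the trajectory is assumed to have been recorded with TSSetSaveTrajectory().

#include <petscts.h>

/* Every rank calls TSTrajectoryGetVecs() with the SAME time, in the same
   order; only the rank that owns the value of interest reads the result. */
PetscErrorCode SampleTrajectory(TS ts, TSTrajectory tj, PetscInt ntimes)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  Vec            Usol, U;
  PetscInt       i;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = TSGetSolution(ts, &Usol);CHKERRQ(ierr);
  ierr = VecDuplicate(Usol, &U);CHKERRQ(ierr);
  for (i = 0; i < ntimes; i++) {
    PetscReal t = (PetscReal)(i + 1);   /* identical on every rank */
    ierr = TSTrajectoryGetVecs(tj, ts, PETSC_DECIDE, &t, U, NULL);CHKERRQ(ierr);
    if (rank == 0) {                    /* only the interested rank uses it */
      const PetscScalar *u;
      ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
      /* record u[1] (a local entry on rank 0) into a local buffer here */
      ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
    }
  }
  ierr = VecDestroy(&U);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the requested time no longer differs between ranks, the "Real value must be same on all processes" error does not arise.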
URL: From junchao.zhang at gmail.com Wed May 26 21:10:02 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 26 May 2021 21:10:02 -0500 Subject: [petsc-users] Did CUDA break again? In-Reply-To: <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: On Wed, May 26, 2021 at 6:13 PM Barry Smith wrote: > > What is HOST=cori09 Does it have GPUs? > > > https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 > > Seems to clearly state > > int cudaDeviceProp > > ::major > > [inherited] > > Major compute capability > > > Mark, please compile and run this program on the machine you are running > configure on > > #include > #include > #include > #include > #include > int main(int arg,char **args) > { > struct cudaDeviceProp dp; > cudaGetDeviceProperties(&dp, 0); > printf("%d\n",10*dp.major+dp.minor); > > int major,minor; > cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, > 0); > cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, > 0); > printf("%d\n",10*major+minor); > return(0); > Probably, you need to check the return code of these two function calls to make sure they are correct. > } > > This is what I get > > $ nvcc mytest.c -lcuda > ~/petsc* (main=)* arch-main > $ ./a.out > 70 > 70 > > Which is exactly what it is suppose to do. > > Barry > > On May 26, 2021, at 5:31 PM, Barry Smith wrote: > > > Yes, this code which I guess never got hit before > > cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); > printf("%d\n",10*dp.major+dp.minor); > return(0);; > > is using the wrong property for the generation. > > Back to the CUDA documentation for the correct information. > > > > On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch > wrote: > > 1120 sounds suspiciously like some CUDA version rather than architecture > or compute capability? > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: +1 (312) 694-3391 > > On May 26, 2021, at 22:29, Mark Adams wrote: > ? > I started to get this error today on Cori. > > nvcc fatal : Unsupported gpu architecture 'compute_1120' > > I am pretty sure I had a clean build but I can redo it if you don't know > where this is from. > > Thanks, > Mark > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 26 21:21:08 2021 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 26 May 2021 22:21:08 -0400 Subject: [petsc-users] Did CUDA break again? In-Reply-To: References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: I had git bisect working and was 4 steps away when I got a new crash. configure.log is empty. 
19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad Bisecting: 19 revisions left to test after this (roughly 4 steps) [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main' 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD =============================================================================== Configuring PETSc to compile on your system =============================================================================== ******************************************************************************* CONFIGURATION CRASH (Please send configure.log to petsc-maint at mcs.anl.gov) ******************************************************************************* EOL while scanning string literal (cuda.py, line 176) File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__ self.createChildren() File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren self.getChild(moduleName) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild config.setupDependencies(self) File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies self.blasLapack = framework.require('config.packages.BlasLapack',self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require config = self.getChild(moduleName, keywordArgs) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild config.setupDependencies(self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies config.package.Package.setupDependencies(self, framework) File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies self.mpi = framework.require('config.packages.MPI',self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require config = self.getChild(moduleName, keywordArgs) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild config.setupDependencies(self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies self.mpich = framework.require('config.packages.MPICH', self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require config = self.getChild(moduleName, keywordArgs) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild config.setupDependencies(self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies self.cuda = framework.require('config.packages.cuda',self) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require config = self.getChild(moduleName, keywordArgs) File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild type = __import__(moduleName, globals(), locals(), ['Configure']).Configure 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD On Wed, May 26, 2021 at 10:10 PM Junchao Zhang wrote: > > > > On Wed, May 26, 2021 at 6:13 PM Barry Smith wrote: > 
>> >> What is HOST=cori09 Does it have GPUs? >> >> >> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 >> >> Seems to clearly state >> >> int cudaDeviceProp >> >> ::major >> >> [inherited] >> >> Major compute capability >> >> >> Mark, please compile and run this program on the machine you are running >> configure on >> >> #include >> #include >> #include >> #include >> #include >> int main(int arg,char **args) >> { >> struct cudaDeviceProp dp; >> cudaGetDeviceProperties(&dp, 0); >> printf("%d\n",10*dp.major+dp.minor); >> >> int major,minor; >> cuDeviceGetAttribute(&major, >> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); >> cuDeviceGetAttribute(&minor, >> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); >> printf("%d\n",10*major+minor); >> return(0); >> > Probably, you need to check the return code of these two function calls to > make sure they are correct. > > >> } >> >> This is what I get >> >> $ nvcc mytest.c -lcuda >> ~/petsc* (main=)* arch-main >> $ ./a.out >> 70 >> 70 >> >> Which is exactly what it is suppose to do. >> >> Barry >> >> On May 26, 2021, at 5:31 PM, Barry Smith wrote: >> >> >> Yes, this code which I guess never got hit before >> >> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); >> printf("%d\n",10*dp.major+dp.minor); >> return(0);; >> >> is using the wrong property for the generation. >> >> Back to the CUDA documentation for the correct information. >> >> >> >> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch >> wrote: >> >> 1120 sounds suspiciously like some CUDA version rather than architecture >> or compute capability? >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> Cell: +1 (312) 694-3391 >> >> On May 26, 2021, at 22:29, Mark Adams wrote: >> ? >> I started to get this error today on Cori. >> >> nvcc fatal : Unsupported gpu architecture 'compute_1120' >> >> I am pretty sure I had a clean build but I can redo it if you don't know >> where this is from. >> >> Thanks, >> Mark >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Thu May 27 02:42:59 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Thu, 27 May 2021 09:42:59 +0200 Subject: [petsc-users] Collect Trajectories components In-Reply-To: <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> Message-ID: I created a for cycle where I call TSTrajectoryGetVecs, but only the 0 rank seems to enter in this cycle and I do not know why. I thought the following might be a solution, but it is not working as I would like to, i.e. the final vector has the same local parts, a copy of the values obtained with the 0-rank. How should I change this, please? 
Vec U, partial, Uloc; PetscScalar *Ui, *partlocal; PetscInt i; ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr); for(i=0; i<40; i++) { PetscReal ttime = i+1; ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr); ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr); ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr); ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr); ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr); ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr); partlocal[i] = Ui[1]; ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr); ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr); ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr); } > Il giorno 27 mag 2021, alle ore 01:15, Barry Smith ha scritto: > > > >> On May 26, 2021, at 10:39 AM, Francesco Brarda wrote: >> >> Thank you very much. >>> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do? >> Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length). >> The function is called only by the first proc (rank=0) and not from the others. Is there a way to force also other ranks to call that routine? > > Yes, just have all ranks call it and ignore the result on the other ranks. > >> Should I build everything into an external function outside the main? > > It can be called in main, does not need to be in a different function. > >> >> Francesco >> >> >>> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto: >>> >>> >>> >>> >>> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS. >>> >>> You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time but then you can have only the rank you care about select the entries from the resulting vectors you care about while the other ranks for that time just ignore the vectors since they do not need to values from it. >>> >>> Barry >>> >>> >>> >>>> On May 26, 2021, at 5:20 AM, Francesco Brarda wrote: >>>> >>>> Hi! >>>> >>>> I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. >>>> The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: >>>> >>>> [0]PETSC ERROR: Invalid argument >>>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2 >>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown >>>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 >>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >>>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c >>>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c >>>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c >>>> >>>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter? >>>> >>>> I hope I made myself clear. >>>> Best, >>>> Francesco >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu May 27 05:45:34 2021 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 May 2021 06:45:34 -0400 Subject: [petsc-users] Did CUDA break again? In-Reply-To: References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: FYI, I was running the test incorrectly: 03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out 70 70 On Wed, May 26, 2021 at 10:21 PM Mark Adams wrote: > I had git bisect working and was 4 steps away when I got a new crash. > configure.log is empty. > > 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad > Bisecting: 19 revisions left to test after this (roughly 4 steps) > [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch > 'tisaac/feature-spqr' into 'main' > 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ > ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD > > =============================================================================== > Configuring PETSc to compile on your system > > > =============================================================================== > > ******************************************************************************* > CONFIGURATION CRASH (Please send configure.log to > petsc-maint at mcs.anl.gov) > > ******************************************************************************* > > EOL while scanning string literal (cuda.py, line 176) > File "/global/u2/m/madams/petsc/config/configure.py", line 455, in > petsc_configure > framework = > config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], > loadArgDB = 0) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 107, in __init__ > self.createChildren() > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 344, in createChildren > self.getChild(moduleName) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 329, in getChild > config.setupDependencies(self) > File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in > setupDependencies > self.blasLapack = > framework.require('config.packages.BlasLapack',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 329, in getChild > config.setupDependencies(self) > File > "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", > line 21, in setupDependencies > 
config.package.Package.setupDependencies(self, framework) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", > line 151, in setupDependencies > self.mpi = framework.require('config.packages.MPI',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 329, in getChild > config.setupDependencies(self) > File > "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line > 73, in setupDependencies > self.mpich = framework.require('config.packages.MPICH', self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 329, in getChild > config.setupDependencies(self) > File > "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", > line 16, in setupDependencies > self.cuda = framework.require('config.packages.cuda',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", > line 302, in getChild > type = __import__(moduleName, globals(), locals(), > ['Configure']).Configure > 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ > ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD > > On Wed, May 26, 2021 at 10:10 PM Junchao Zhang > wrote: > >> >> >> >> On Wed, May 26, 2021 at 6:13 PM Barry Smith wrote: >> >>> >>> What is HOST=cori09 Does it have GPUs? >>> >>> >>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 >>> >>> Seems to clearly state >>> >>> int cudaDeviceProp >>> >>> ::major >>> >>> [inherited] >>> >>> Major compute capability >>> >>> >>> Mark, please compile and run this program on the machine you are running >>> configure on >>> >>> #include >>> #include >>> #include >>> #include >>> #include >>> int main(int arg,char **args) >>> { >>> struct cudaDeviceProp dp; >>> cudaGetDeviceProperties(&dp, 0); >>> printf("%d\n",10*dp.major+dp.minor); >>> >>> int major,minor; >>> cuDeviceGetAttribute(&major, >>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); >>> cuDeviceGetAttribute(&minor, >>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); >>> printf("%d\n",10*major+minor); >>> return(0); >>> >> Probably, you need to check the return code of these two function calls >> to make sure they are correct. >> >> >>> } >>> >>> This is what I get >>> >>> $ nvcc mytest.c -lcuda >>> ~/petsc* (main=)* arch-main >>> $ ./a.out >>> 70 >>> 70 >>> >>> Which is exactly what it is suppose to do. >>> >>> Barry >>> >>> On May 26, 2021, at 5:31 PM, Barry Smith wrote: >>> >>> >>> Yes, this code which I guess never got hit before >>> >>> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); >>> printf("%d\n",10*dp.major+dp.minor); >>> return(0);; >>> >>> is using the wrong property for the generation. >>> >>> Back to the CUDA documentation for the correct information. >>> >>> >>> >>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch >>> wrote: >>> >>> 1120 sounds suspiciously like some CUDA version rather than architecture >>> or compute capability? 
>>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> Cell: +1 (312) 694-3391 >>> >>> On May 26, 2021, at 22:29, Mark Adams wrote: >>> ? >>> I started to get this error today on Cori. >>> >>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>> >>> I am pretty sure I had a clean build but I can redo it if you don't know >>> where this is from. >>> >>> Thanks, >>> Mark >>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Thu May 27 06:00:34 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Thu, 27 May 2021 11:00:34 +0000 Subject: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran) In-Reply-To: References: , Message-ID: Hi Barry, Apologies for the delay in getting back to you, the mail somehow missed my attention when it first arrived, but thanks for this tip - this would allow the PetscIntView to be used even when in practice only one process was actually writing out information. Thanks, Dan. ________________________________ From: Barry Smith Sent: Thursday, May 20, 2021 5:00 PM To: dazza simplythebest Cc: PETSc users list Subject: Re: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran) You can also have the processes with no values print an array of length zero. Like if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN .. .. else NO_A_ENTRIES = 0 call PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES), & & PETSC_VIEWER_STDOUT_WORLD, ierr_pets) On May 20, 2021, at 5:31 AM, Matthew Knepley > wrote: On Thu, May 20, 2021 at 5:32 AM dazza simplythebest > wrote: Dear Jose, Many thanks for the prompt explanation - that would definitely explain what is going on, I will adjust my code accordingly . If you want to print different things from each process in parallel, I suggest https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscSynchronizedPrintf.html Thanks, Matt Thanks again, Dan. ________________________________ From: Jose E. Roman > Sent: Thursday, May 20, 2021 9:06 AM To: dazza simplythebest > Cc: PETSc users list > Subject: Re: [petsc-users] Code hangs when calling PetscIntView (MPI, fortran) If you look at the manpage https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscIntView.html you will see that PetscIntView() is collective. This means that all MPI processes must call this function, so it is forbidden to call it within an IF rank==... Jose > El 20 may 2021, a las 10:25, dazza simplythebest > escribi?: > > Dear All, > As part of preparing a code to call the SLEPC eigenvalue solving library, > I am constructing a matrix in sparse CSR format row-by-row. Just for debugging > purposes I write out the column values for a given row, which are stored in a > PetscInt allocatable vector, using PetscIntView. > > Everything works fine when the number of MPI processes exactly divide the > number of rows of the matrix, and so each process owns the same number of rows. > However, when the number of MPI processes does not exactly divide the > number of rows of the matrix, and so each process owns a different number of rows, > the code hangs when it reaches the line that calls PetscIntView. > To be precise the code hangs on the final row that a process, other than root, owns. > If I however comment out the call to PetscIntView the code completes without error, > and produces the correct eigenvalues (hence we are not missing a row / miswriting a row). 
> Note also that a simple direct writeout of this same array using a plain fortran command > will write out the array without problem. > > I have attached below a small code that reproduces the problem. > For this code we have nominally assigned 200 rows to our matrix. The code runs without > problem using 1,2,4,5,8 or 10 MPI processes, all of which precisely divide 200, > but will hang for 3 MPI processes for example. > For the case of 3 MPI processes the subroutine WHOSE_ROW_IS_IT allocates the rows > to each process as : > process no first row last row no. of rows > 0 1 66 66 > 1 67 133 67 > 2 134 200 67 > > The code will hang when process 1 calls PetscIntView for its last row, row 133 for example. > > One piece of additional information that may be relevant is that the code does run to completion > without hanging if I comment out the final slepc/MPI finalisation command > CALL SlepcFinalize(ierr_pets) > (I of course I get ' bad termination' errors, but the otherwise the run is successful.) > > I would appreciate it if anyone has any ideas on what is going wrong! > Many thanks, > Dan. > > > code: > > MODULE ALL_STAB_ROUTINES > IMPLICIT NONE > CONTAINS > > SUBROUTINE WHOSE_ROW_IS_IT(ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES, & > & OWNER) > ! THIS ROUTINE ALLOCATES ROWS EVENLY BETWEEN mpi PROCESSES > #include > use slepceps > IMPLICIT NONE > PetscInt, INTENT(IN) :: ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES > PetscInt, INTENT(OUT) :: OWNER > PetscInt :: P, REM > > P = TOTAL_NO_ROWS / NO_PROCESSES ! NOTE INTEGER DIVISION > REM = TOTAL_NO_ROWS - P*NO_PROCESSES > IF (ROW_NO < (NO_PROCESSES - REM)*P + 1 ) THEN > OWNER = (ROW_NO - 1)/P ! NOTE INTEGER DIVISION > ELSE > OWNER = ( ROW_NO + NO_PROCESSES - REM -1 )/(P+1) ! NOTE INTEGER DIVISION > ENDIF > END SUBROUTINE WHOSE_ROW_IS_IT > END MODULE ALL_STAB_ROUTINES > > > PROGRAM trialer > USE MPI > #include > use slepceps > USE ALL_STAB_ROUTINES > IMPLICIT NONE > PetscMPIInt rank3, total_mpi_size > PetscInt nl3, code, PROC_ROW, ISTATUS, jm, N_rows,NO_A_ENTRIES > PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC > PetscInt, PARAMETER :: ZERO = 0 , ONE = 1, TWO = 2, THREE = 3 > PetscErrorCode ierr_pets > > ! Initialise sleps/mpi > call SlepcInitialize(PETSC_NULL_CHARACTER,ierr_pets) ! note that this initialises MPI > call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets) !! find total no of MPI processes > nL3= total_mpi_size > call MPI_COMM_RANK(MPI_COMM_WORLD,rank3,ierr_pets) !! find my overall rank -> rank3 > write(*,*)'Welcome: PROCESS NO , TOTAL NO. OF PROCESSES = ',rank3, nl3 > > N_rows = 200 ! NUMBER OF ROWS OF A NOTIONAL MATRIX > NO_A_ENTRIES = 12 ! NUMBER OF ENTRIES FOR JALOC > > ! LOOP OVER ROWS > do jm = 1, N_rows > > CALL whose_row_is_it(JM, N_rows , NL3, PROC_ROW) ! FIND OUT WHICH PROCESS OWNS ROW > if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN .. > ! ALLOCATE jaloc ARRAY AND INITIALISE > > allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS ) > jaloc = three > > > WRITE(*,*)'JALOC',JALOC ! THIS SIMPLE PLOT ALWAYS WORKS > write(*,*)'calling PetscIntView: PROCESS NO. ROW NO.',rank3, jm > ! THIS CALL TO PetscIntView CAUSES CODE TO HANG WHEN E.G. total_mpi_size=3, JM=133 > call PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES), & > & PETSC_VIEWER_STDOUT_WORLD, ierr_pets) > CHKERRA(ierr_pets) > deallocate(jaloc) > endif > enddo > > CALL SlepcFinalize(ierr_pets) > end program trialer -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
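-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

The zero-length call Barry shows above works because PetscIntView() is collective on the viewer's communicator: every rank must make the call for every row, and ranks that do not own the row simply contribute nothing. The same pattern in C (the Fortran call is analogous; rank, owner, and jaloc are illustrative names, and this is a sketch rather than a fix applied to the posted program):

#include <petscsys.h>

/* view one row's column indices; collective over PETSC_COMM_WORLD */
static PetscErrorCode ViewRow(PetscMPIInt rank, PetscMPIInt owner,
                              PetscInt n, const PetscInt jaloc[])
{
  PetscErrorCode ierr;
  PetscInt       dummy = 0;   /* zero-length view: contents never read */

  PetscFunctionBeginUser;
  if (rank == owner) {
    ierr = PetscIntView(n, jaloc, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  } else {
    ierr = PetscIntView(0, &dummy, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}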
From bsmith at petsc.dev Thu May 27 22:50:18 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 27 May 2021 22:50:18 -0500
Subject: [petsc-users] Did CUDA break again?
In-Reply-To: References: Message-ID:

  Mark,

  Where did you run the little test program I sent you

  1) when it produced The 1120 and negative number and (was this on the compile server or on a compute node?)

  2) when it produced the correct answer? (compile server or compute node?)

  Do you run configure on a compile server (that has no GPUs) or a compute server that has GPUs

  Don't spend your time bisecting PETSc we know exactly where the problem is, we just don't see how it happens.

  cuda.py, if it cannot find deviceQuery and if you did not provide a generation arch with -with-cuda-gencodearch=70, runs a version of the little code I sent you to get the number but it is ??apparently?? producing garbage or not running on the compiler server and gives the wrong number 1120. Just use the option -with-cuda-gencodearch=70 (you do not need to pass this information to any flags any more, just with this option and it will use it).

  Barry

  Ideally we want it to figure it out automatically and this little test program in configure is supposed to do this but since that is not always working yet you should just use -with-cuda-gencodearch=70

> On May 27, 2021, at 5:45 AM, Mark Adams wrote:
>
> FYI, I was running the test incorrectly:
> 03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out
> 70
> 70
>
> On Wed, May 26, 2021 at 10:21 PM Mark Adams wrote:
> I had git bisect working and was 4 steps away when I got a new crash.
> configure.log is empty.
> > 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad > Bisecting: 19 revisions left to test after this (roughly 4 steps) > [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main' > 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD > =============================================================================== > Configuring PETSc to compile on your system > =============================================================================== > ******************************************************************************* > CONFIGURATION CRASH (Please send configure.log to petsc-maint at mcs.anl.gov ) > ******************************************************************************* > > EOL while scanning string literal (cuda.py, line 176) > File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure > framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__ > self.createChildren() > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren > self.getChild(moduleName) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild > config.setupDependencies(self) > File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies > self.blasLapack = framework.require('config.packages.BlasLapack',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild > config.setupDependencies(self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies > config.package.Package.setupDependencies(self, framework) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies > self.mpi = framework.require('config.packages.MPI',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild > config.setupDependencies(self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies > self.mpich = framework.require('config.packages.MPICH', self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild > config.setupDependencies(self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies > self.cuda = framework.require('config.packages.cuda',self) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require > config = self.getChild(moduleName, keywordArgs) > File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild > type = __import__(moduleName, globals(), locals(), ['Configure']).Configure > 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD > > On Wed, May 
26, 2021 at 10:10 PM Junchao Zhang > wrote: > > > > On Wed, May 26, 2021 at 6:13 PM Barry Smith > wrote: > > What is HOST=cori09 Does it have GPUs? > > https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 > > Seems to clearly state > > int cudaDeviceProp ::major [inherited] > Major compute capability > > > > Mark, please compile and run this program on the machine you are running configure on > > #include > #include > #include > #include > #include > int main(int arg,char **args) > { > struct cudaDeviceProp dp; > cudaGetDeviceProperties(&dp, 0); > printf("%d\n",10*dp.major+dp.minor); > > int major,minor; > cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); > cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); > printf("%d\n",10*major+minor); > return(0); > Probably, you need to check the return code of these two function calls to make sure they are correct. > > } > > This is what I get > > $ nvcc mytest.c -lcuda > ~/petsc (main=) arch-main > $ ./a.out > 70 > 70 > > Which is exactly what it is suppose to do. > > Barry > >> On May 26, 2021, at 5:31 PM, Barry Smith > wrote: >> >> >> Yes, this code which I guess never got hit before >> >> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); printf("%d\n",10*dp.major+dp.minor); return(0);; >> >> is using the wrong property for the generation. >> >> Back to the CUDA documentation for the correct information. >> >> >> >>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch > wrote: >>> >>> 1120 sounds suspiciously like some CUDA version rather than architecture or compute capability? >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> Cell: +1 (312) 694-3391 >>> >>>> On May 26, 2021, at 22:29, Mark Adams > wrote: >>>> ? >>>> I started to get this error today on Cori. >>>> >>>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>>> >>>> I am pretty sure I had a clean build but I can redo it if you don't know where this is from. >>>> >>>> Thanks, >>>> Mark >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 07:59:54 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 08:59:54 -0400 Subject: [petsc-users] Did CUDA break again? In-Reply-To: References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: On Thu, May 27, 2021 at 11:50 PM Barry Smith wrote: > > Mark, > > > > Where did you run the little test program I sent you > > 1) when it produced > > The 1120 and negative number and (was this on the compile server or > on a compute node?) > This is fine now. look at my last email. I was not using srun. > 2) when it produced the correct answer? (compile server or compute node?) > > Do you run configure on a compile server (that has no GPUs) or a compute > server that has GPUs > You have to do everything on the compute nodes on Cori/gpu. > Don't spend your time bisecting PETSc we know exactly where the problem > is, we just don't see how it happens. > > cuda.py, if it cannot find deviceQuery and if you did not provide a > generation arch with -with-cuda-gencodearch=70, > I thought I was not supposed to use that anymore. It sounds like it is optional. > runs a version of the little code I sent you to get the number but it is > ??apparently?? producing garbage or not running on the compiler server and > gives the wrong number 1120. 
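[Aside: a self-contained version of the little probe program quoted above, with the return-code checks Junchao suggests. The header list and the error handling here are illustrative assumptions, not the exact code configure runs:

  #include <stdio.h>
  #include <cuda.h>
  #include <cuda_runtime.h>

  int main(void)
  {
    struct cudaDeviceProp dp;
    cudaError_t           cerr;
    int                   major, minor;

    /* Runtime API: fails (e.g. cudaErrorNoDevice) on a node without a GPU;
       an unchecked failure leaves dp uninitialized, which would explain a
       garbage value like 1120 when run on a compile server. */
    cerr = cudaGetDeviceProperties(&dp, 0);
    if (cerr != cudaSuccess) { printf("runtime API error: %s\n", cudaGetErrorString(cerr)); return 1; }
    printf("%d\n", 10*dp.major + dp.minor);

    /* Driver API: cuInit() must succeed before any other driver call,
       otherwise major/minor stay uninitialized (a possible source of the
       negative number mentioned in this thread). */
    if (cuInit(0) != CUDA_SUCCESS) { printf("cuInit failed\n"); return 1; }
    if (cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0) != CUDA_SUCCESS ||
        cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0) != CUDA_SUCCESS) {
      printf("driver API attribute query failed\n");
      return 1;
    }
    printf("%d\n", 10*major + minor);
    return 0;
  }

Build with nvcc probe.c -lcuda and, on a machine like Cori, run it under srun on a compute node, as in the thread.]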
> Does PETSc use MPIEXEC to run this? Note, I have not been able to get 'make check' to work on Cori/gpu. I use '-with-mpiexec=srun -G1 [-c 20]' and it fails to execute the tests. OK, putting -with-cuda-gencodearch=70 back in has fixed this problem. It is running now. Thanks, > > Just use the option -with-cuda-gencodearch=70 (you do not need to pass > this information to any flags any more, just with this option and it will > use it). > > Barry > > Ideally we want it to figure it out automatically and this little test > program in configure is suppose to do this but since that is not always > working yet you should just use -with-cuda-gencodearch=70 > > > > On May 27, 2021, at 5:45 AM, Mark Adams wrote: > > FYI, I was running the test incorrectly: > 03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out > 70 > 70 > > On Wed, May 26, 2021 at 10:21 PM Mark Adams wrote: > >> I had git bisect working and was 4 steps away when I got a new crash. >> configure.log is empty. >> >> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad >> Bisecting: 19 revisions left to test after this (roughly 4 steps) >> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch >> 'tisaac/feature-spqr' into 'main' >> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ >> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >> >> =============================================================================== >> Configuring PETSc to compile on your system >> >> >> =============================================================================== >> >> ******************************************************************************* >> CONFIGURATION CRASH (Please send configure.log to >> petsc-maint at mcs.anl.gov) >> >> ******************************************************************************* >> >> EOL while scanning string literal (cuda.py, line 176) >> File "/global/u2/m/madams/petsc/config/configure.py", line 455, in >> petsc_configure >> framework = >> config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], >> loadArgDB = 0) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 107, in __init__ >> self.createChildren() >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 344, in createChildren >> self.getChild(moduleName) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 329, in getChild >> config.setupDependencies(self) >> File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in >> setupDependencies >> self.blasLapack = >> framework.require('config.packages.BlasLapack',self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 329, in getChild >> config.setupDependencies(self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", >> line 21, in setupDependencies >> config.package.Package.setupDependencies(self, framework) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", >> line 151, in setupDependencies >> self.mpi = framework.require('config.packages.MPI',self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 
>> 329, in getChild >> config.setupDependencies(self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line >> 73, in setupDependencies >> self.mpich = framework.require('config.packages.MPICH', self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 329, in getChild >> config.setupDependencies(self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", >> line 16, in setupDependencies >> self.cuda = framework.require('config.packages.cuda',self) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File >> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >> 302, in getChild >> type = __import__(moduleName, globals(), locals(), >> ['Configure']).Configure >> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ >> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >> >> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang >> wrote: >> >>> >>> >>> >>> On Wed, May 26, 2021 at 6:13 PM Barry Smith wrote: >>> >>>> >>>> What is HOST=cori09 Does it have GPUs? >>>> >>>> >>>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 >>>> >>>> Seems to clearly state >>>> >>>> int cudaDeviceProp >>>> >>>> ::major >>>> >>>> [inherited] >>>> >>>> Major compute capability >>>> >>>> >>>> Mark, please compile and run this program on the machine you are >>>> running configure on >>>> >>>> #include >>>> #include >>>> #include >>>> #include >>>> #include >>>> int main(int arg,char **args) >>>> { >>>> struct cudaDeviceProp dp; >>>> cudaGetDeviceProperties(&dp, 0); >>>> printf("%d\n",10*dp.major+dp.minor); >>>> >>>> int major,minor; >>>> cuDeviceGetAttribute(&major, >>>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); >>>> cuDeviceGetAttribute(&minor, >>>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); >>>> printf("%d\n",10*major+minor); >>>> return(0); >>>> >>> Probably, you need to check the return code of these two function calls >>> to make sure they are correct. >>> >>> >>>> } >>>> >>>> This is what I get >>>> >>>> $ nvcc mytest.c -lcuda >>>> ~/petsc* (main=)* arch-main >>>> $ ./a.out >>>> 70 >>>> 70 >>>> >>>> Which is exactly what it is suppose to do. >>>> >>>> Barry >>>> >>>> On May 26, 2021, at 5:31 PM, Barry Smith wrote: >>>> >>>> >>>> Yes, this code which I guess never got hit before >>>> >>>> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); >>>> printf("%d\n",10*dp.major+dp.minor); >>>> return(0);; >>>> >>>> is using the wrong property for the generation. >>>> >>>> Back to the CUDA documentation for the correct information. >>>> >>>> >>>> >>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch >>>> wrote: >>>> >>>> 1120 sounds suspiciously like some CUDA version rather than >>>> architecture or compute capability? >>>> >>>> Best regards, >>>> >>>> Jacob Faibussowitsch >>>> (Jacob Fai - booss - oh - vitch) >>>> Cell: +1 (312) 694-3391 >>>> >>>> On May 26, 2021, at 22:29, Mark Adams wrote: >>>> ? >>>> I started to get this error today on Cori. >>>> >>>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>>> >>>> I am pretty sure I had a clean build but I can redo it if you don't >>>> know where this is from. 
>>>> Thanks,
>>>> Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mfadams at lbl.gov  Fri May 28 08:15:49 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 09:15:49 -0400
Subject: [petsc-users] reproducibility
Message-ID:

How do you check out the version of PETSc used from the output:

Using Petsc Development GIT revision: v3.15.0-531-g1397235  GIT Date: 2021-05-18 13:47:28 -0400

I am looking for a SHA1.

Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wence at gmx.li  Fri May 28 08:28:22 2021
From: wence at gmx.li (Lawrence Mitchell)
Date: Fri, 28 May 2021 14:28:22 +0100
Subject: [petsc-users] reproducibility
In-Reply-To:
References:
Message-ID:

> On 28 May 2021, at 14:15, Mark Adams wrote:
>
> v3.15.0-531-g1397235
               ^^^^^^^

This is the shortened commit hash, so

  git checkout 1397235

Lawrence

From knepley at gmail.com  Fri May 28 08:33:08 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 28 May 2021 09:33:08 -0400
Subject: [petsc-users] reproducibility
In-Reply-To:
References:
Message-ID:

On Fri, May 28, 2021 at 9:16 AM Mark Adams wrote:
> How do you check out the version of PETSc used from the output:
>
> Using Petsc Development GIT revision: v3.15.0-531-g1397235  GIT Date: 2021-05-18 13:47:28 -0400
>
> I am looking for a SHA1

You should be able to look up the commit from the short hash

  git show 1397235

but that hash is not in my repo.

   Matt

> Thanks,
> Mark

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefano.zampini at gmail.com  Fri May 28 08:38:57 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 16:38:57 +0300
Subject: [petsc-users] reproducibility
In-Reply-To:
References:
Message-ID:

Mark

That line is obtained via

  git describe --match "v*"

at configure time. The number after the g indicates the commit.
As Matt says, you can do git checkout to go back to the point where you configured PETSc.

> On May 28, 2021, at 4:33 PM, Matthew Knepley wrote:
>
> 1397235
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wence at gmx.li  Fri May 28 08:45:01 2021
From: wence at gmx.li (Lawrence Mitchell)
Date: Fri, 28 May 2021 14:45:01 +0100
Subject: [petsc-users] reproducibility
In-Reply-To:
References:
Message-ID:

> On 28 May 2021, at 14:38, Stefano Zampini wrote:
>
> Mark
>
> That line is obtained via
>
>   git describe --match "v*"
>
> at configure time. The number after the g indicates the commit.
> As Matt says, you can do git checkout to go back to the point where you configured PETSc.

In fact, I hadn't realised this but you can do:

  git checkout v3.15.0-531-g1397235

and git DTRT.
Lawrence From mfadams at lbl.gov Fri May 28 08:59:36 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 09:59:36 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: On Fri, May 28, 2021 at 9:33 AM Matthew Knepley wrote: > On Fri, May 28, 2021 at 9:16 AM Mark Adams wrote: > >> How do you check out the version of PETSc used from the output: >> >> Using Petsc Development GIT revision: v3.15.0-531-g1397235 GIT Date: >> 2021-05-18 13:47:28 -0400 >> >> I am looking for a SHA1 >> > > You should be able to lookup the commit from the short hash > > git show 1397235 > > but that hash is not in my repo. > Thanks everyone. How would I get a version (a branch say) to be and stay visible? I am not seeing any of my versions used for this data but they were all in the repo at one point, in a branch. Does the branch need to be merged with main? I am going to rerun all the data anyway, so I now want to understand how to set up a branch to use everywhere and, of course, stay visible (for a few months at least). > > Matt > > >> Thanks, >> Mark >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 09:29:53 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 10:29:53 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: On Fri, May 28, 2021 at 9:45 AM Lawrence Mitchell wrote: > > > > On 28 May 2021, at 14:38, Stefano Zampini > wrote: > > > > Mark > > > > That line is obtained via > > > > git describe --match "v*" > > > > At configure time. The number after the g indicates the commit > > As Matt says, you can do git checkout to go back at the > point were you configured PETSc > > In fact, I hadn't realised this but you can do: > > git checkout v3.15.0-531-g1397235 > > and git DTRT. > Thanks, that is nice. I will recommend this. I don't understand "git DTRT" > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 09:31:48 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 10:31:48 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: Also, as far as libraries go. Does the SHA1 include the p4est and Kokkos versions somehow? On Fri, May 28, 2021 at 10:29 AM Mark Adams wrote: > > > On Fri, May 28, 2021 at 9:45 AM Lawrence Mitchell wrote: > >> >> >> > On 28 May 2021, at 14:38, Stefano Zampini >> wrote: >> > >> > Mark >> > >> > That line is obtained via >> > >> > git describe --match "v*" >> > >> > At configure time. The number after the g indicates the commit >> > As Matt says, you can do git checkout to go back at the >> point were you configured PETSc >> >> In fact, I hadn't realised this but you can do: >> >> git checkout v3.15.0-531-g1397235 >> >> and git DTRT. >> > > Thanks, that is nice. I will recommend this. > > I don't understand "git DTRT" > > >> >> Lawrence > > -------------- next part -------------- An HTML attachment was scrubbed... 
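[Aside: the "GIT revision" banner discussed in this thread is the git describe string that configure bakes into the PETSc library, so a program can also recover it at runtime instead of digging through logs. A minimal sketch, assuming only a PETSc installation (PetscGetVersion() is the documented call; the exact banner text depends on the build):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    char version[256];

    PetscInitialize(&argc, &argv, NULL, NULL);
    PetscGetVersion(version, sizeof(version)); /* e.g. "Petsc Development GIT revision: v3.15.0-531-g1397235 ..." */
    PetscPrintf(PETSC_COMM_WORLD, "%s\n", version);
    PetscFinalize();
    return 0;
  }

Logging this string (plus the external-package versions recorded in configure.log) with every run is a cheap way to keep results reproducible months later.]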
URL: From knepley at gmail.com Fri May 28 09:46:42 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 May 2021 10:46:42 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: On Fri, May 28, 2021 at 10:32 AM Mark Adams wrote: > Also, as far as libraries go. Does the SHA1 include the p4est and Kokkos > versions somehow? > No. They are in the configure.log however. DTRT = Do The Right Thing Matt > On Fri, May 28, 2021 at 10:29 AM Mark Adams wrote: > >> >> >> On Fri, May 28, 2021 at 9:45 AM Lawrence Mitchell wrote: >> >>> >>> >>> > On 28 May 2021, at 14:38, Stefano Zampini >>> wrote: >>> > >>> > Mark >>> > >>> > That line is obtained via >>> > >>> > git describe --match "v*" >>> > >>> > At configure time. The number after the g indicates the commit >>> > As Matt says, you can do git checkout to go back at the >>> point were you configured PETSc >>> >>> In fact, I hadn't realised this but you can do: >>> >>> git checkout v3.15.0-531-g1397235 >>> >>> and git DTRT. >>> >> >> Thanks, that is nice. I will recommend this. >> >> I don't understand "git DTRT" >> >> >>> >>> Lawrence >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri May 28 10:06:34 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 28 May 2021 16:06:34 +0100 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: > On 28 May 2021, at 14:59, Mark Adams wrote: > > Thanks everyone. > > How would I get a version (a branch say) to be and stay visible? > > I am not seeing any of my versions used for this data but they were all in the repo at one point, in a branch. Does the branch need to be merged with main? > > I am going to rerun all the data anyway, so I now want to understand how to set up a branch to use everywhere and, of course, stay visible (for a few months at least). If the branch (and commits) are merged then they remain (so you can checkout the commit), even if the branch is subsequently deleted. If the commits live on a branch that is never merged and then you delete the branch, then eventually those commits disappear. I don't know what PETSc's policy on tagging is (Satish?), but you could tag the relevant commit to keep it hanging around. Lawrence From bsmith at petsc.dev Fri May 28 10:40:02 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 May 2021 10:40:02 -0500 Subject: [petsc-users] Did CUDA break again? In-Reply-To: References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> Message-ID: <90669345-AFDB-484F-A52E-67CE02C997F8@petsc.dev> Thanks. On machines such as this one where you have to use $MPIEXEC to run code you will still need to provide the generation with -with-cuda-gencodearch=70. On systems where it can directly query the GPU without MPIEXEC it will automatically produce the correct result. Otherwise it will guess by compiling for different generations but this can produce an incorrect answer. Barry > On May 28, 2021, at 7:59 AM, Mark Adams wrote: > > > > On Thu, May 27, 2021 at 11:50 PM Barry Smith > wrote: > > Mark, > > > > Where did you run the little test program I sent you > > 1) when it produced > > The 1120 and negative number and (was this on the compile server or on a compute node?) > > This is fine now. 
look at my last email. I was not using srun. > > > 2) when it produced the correct answer? (compile server or compute node?) > > Do you run configure on a compile server (that has no GPUs) or a compute server that has GPUs > > You have to do everything on the compute nodes on Cori/gpu. > > > Don't spend your time bisecting PETSc we know exactly where the problem is, we just don't see how it happens. > > cuda.py, if it cannot find deviceQuery and if you did not provide a generation arch with -with-cuda-gencodearch=70, > > I thought I was not supposed to use that anymore. It sounds like it is optional. > > runs a version of the little code I sent you to get the number but it is ??apparently?? producing garbage or not running on the compiler server and gives the wrong number 1120. > > Does PETSc use MPIEXEC to run this? > > Note, I have not been able to get 'make check' to work on Cori/gpu. I use '-with-mpiexec=srun -G1 [-c 20]' and it fails to execute the tests. > > OK, putting -with-cuda-gencodearch=70 back in has fixed this problem. It is running now. > > Thanks, > > > Just use the option -with-cuda-gencodearch=70 (you do not need to pass this information to any flags any more, just with this option and it will use it). > > Barry > > Ideally we want it to figure it out automatically and this little test program in configure is suppose to do this but since that is not always working yet you should just use -with-cuda-gencodearch=70 > > > >> On May 27, 2021, at 5:45 AM, Mark Adams > wrote: >> >> FYI, I was running the test incorrectly: >> 03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out >> 70 >> 70 >> >> On Wed, May 26, 2021 at 10:21 PM Mark Adams > wrote: >> I had git bisect working and was 4 steps away when I got a new crash. >> configure.log is empty. 
>> >> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad >> Bisecting: 19 revisions left to test after this (roughly 4 steps) >> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main' >> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >> =============================================================================== >> Configuring PETSc to compile on your system >> =============================================================================== >> ******************************************************************************* >> CONFIGURATION CRASH (Please send configure.log to petsc-maint at mcs.anl.gov ) >> ******************************************************************************* >> >> EOL while scanning string literal (cuda.py, line 176) >> File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure >> framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__ >> self.createChildren() >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren >> self.getChild(moduleName) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild >> config.setupDependencies(self) >> File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies >> self.blasLapack = framework.require('config.packages.BlasLapack',self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild >> config.setupDependencies(self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies >> config.package.Package.setupDependencies(self, framework) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies >> self.mpi = framework.require('config.packages.MPI',self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild >> config.setupDependencies(self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies >> self.mpich = framework.require('config.packages.MPICH', self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild >> config.setupDependencies(self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies >> self.cuda = framework.require('config.packages.cuda',self) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require >> config = self.getChild(moduleName, keywordArgs) >> File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild >> type = __import__(moduleName, globals(), locals(), ['Configure']).Configure >> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ 
../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >> >> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang > wrote: >> >> >> >> On Wed, May 26, 2021 at 6:13 PM Barry Smith > wrote: >> >> What is HOST=cori09 Does it have GPUs? >> >> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 >> >> Seems to clearly state >> >> int cudaDeviceProp ::major [inherited] >> Major compute capability >> >> >> >> Mark, please compile and run this program on the machine you are running configure on >> >> #include >> #include >> #include >> #include >> #include >> int main(int arg,char **args) >> { >> struct cudaDeviceProp dp; >> cudaGetDeviceProperties(&dp, 0); >> printf("%d\n",10*dp.major+dp.minor); >> >> int major,minor; >> cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); >> cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); >> printf("%d\n",10*major+minor); >> return(0); >> Probably, you need to check the return code of these two function calls to make sure they are correct. >> >> } >> >> This is what I get >> >> $ nvcc mytest.c -lcuda >> ~/petsc (main=) arch-main >> $ ./a.out >> 70 >> 70 >> >> Which is exactly what it is suppose to do. >> >> Barry >> >>> On May 26, 2021, at 5:31 PM, Barry Smith > wrote: >>> >>> >>> Yes, this code which I guess never got hit before >>> >>> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); printf("%d\n",10*dp.major+dp.minor); return(0);; >>> >>> is using the wrong property for the generation. >>> >>> Back to the CUDA documentation for the correct information. >>> >>> >>> >>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch > wrote: >>>> >>>> 1120 sounds suspiciously like some CUDA version rather than architecture or compute capability? >>>> >>>> Best regards, >>>> >>>> Jacob Faibussowitsch >>>> (Jacob Fai - booss - oh - vitch) >>>> Cell: +1 (312) 694-3391 >>>> >>>>> On May 26, 2021, at 22:29, Mark Adams > wrote: >>>>> ? >>>>> I started to get this error today on Cori. >>>>> >>>>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>>>> >>>>> I am pretty sure I had a clean build but I can redo it if you don't know where this is from. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 10:51:55 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 11:51:55 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: It sounds like I should get one branch settled, use it, and keep that branch in the repo, and to be safe not touch it, and that should work for at least a few months. I just want it to work if the reviewer tests it :) Thanks, On Fri, May 28, 2021 at 11:06 AM Lawrence Mitchell wrote: > > > > On 28 May 2021, at 14:59, Mark Adams wrote: > > > > Thanks everyone. > > > > How would I get a version (a branch say) to be and stay visible? > > > > I am not seeing any of my versions used for this data but they were all > in the repo at one point, in a branch. Does the branch need to be merged > with main? > > > > I am going to rerun all the data anyway, so I now want to understand how > to set up a branch to use everywhere and, of course, stay visible (for a > few months at least). > > If the branch (and commits) are merged then they remain (so you can > checkout the commit), even if the branch is subsequently deleted. 
If the > commits live on a branch that is never merged and then you delete the > branch, then eventually those commits disappear. I don't know what PETSc's > policy on tagging is (Satish?), but you could tag the relevant commit to > keep it hanging around. > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri May 28 10:53:13 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 28 May 2021 16:53:13 +0100 Subject: [petsc-users] reproducibility In-Reply-To: References: Message-ID: <26E0FF29-CD2F-46CA-84B7-49F8BF89C18D@gmx.li> > On 28 May 2021, at 16:51, Mark Adams wrote: > > It sounds like I should get one branch settled, use it, and keep that branch in the repo, and to be safe not touch it, and that should work for at least a few months. I just want it to work if the reviewer tests it :) > Just bake a docker container and archive it on zenodo.org or figshare :) Lawrence From mfadams at lbl.gov Fri May 28 10:53:21 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 11:53:21 -0400 Subject: [petsc-users] CUDA MatSetValues test Message-ID: Is there a test with MatSetValues and CUDA? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 28 10:57:15 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 28 May 2021 18:57:15 +0300 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: References: Message-ID: If you are referring to your device set values, I guess it is not currently tested See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411 I started cleaning up the code to prepare for testing but we never finished it https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ > On May 28, 2021, at 6:53 PM, Mark Adams wrote: > > Is there a test with MatSetValues and CUDA? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri May 28 11:04:09 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 May 2021 11:04:09 -0500 Subject: [petsc-users] Collect Trajectories components In-Reply-To: References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> Message-ID: <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev> What does "not working as I would like" mean? It should be retrieving the trajectory at the times 1.0, 2.0, 3.0 ... 40.0 and setting into the vector partial the values of the second component of Uloc (which depending on DMDA having a stencil width of 1 and a w of 1 is the first component of U. You can move the VecGet/RestoreArray(partial,&partlocal);CHKERRQ(ierr); outside of the loop. If you want the first component of U on process 0 you don't need the Uloc or the GlobalToLocalBegin/End. 
just use DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);

   You only provide 14 locations in partial, distributed over the MPI ranks, but likely you want 40 on the first rank and none on the other ranks. You are assigning partlocal[i] on all ranks, but you said you only want it on rank 0, so here is code that may work:

    if (rank == 0) {
      ierr = VecCreateMPI(PETSC_COMM_WORLD,40,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 40 local values */
      ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
    } else {
      ierr = VecCreateMPI(PETSC_COMM_WORLD,0,PETSC_DETERMINE,&partial);CHKERRQ(ierr);  /* 0 local values */
    }
    for (i=0; i<40; i++) {
      PetscReal ttime = i+1;
      ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
      if (rank == 0) {
        ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
        partlocal[i] = Ui[0];
        ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
      }
    }
    if (rank == 0) {
      ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
    }

   Note that this entire block of code needs to run on all MPI ranks, but the actual selection of the wanted value only occurs on rank 0. When the loop is done, rank 0 will have a parallel vector whose components are what you want, and all the other ranks will have a parallel vector with no components on those ranks.

   Note that you don't need to make partial be a parallel vector; you can just make it live on rank 0, because that is the only place you access it. Then the code would be simpler:

    if (rank == 0) {
      ierr = VecCreateSeq(PETSC_COMM_SELF,40,&partial);CHKERRQ(ierr);
      ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
    }
    for (i=0; i<40; i++) {
      PetscReal ttime = i+1;
      ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
      if (rank == 0) {
        ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
        partlocal[i] = Ui[0];
        ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
      }
    }
    if (rank == 0) {
      ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
    }

  Barry

> On May 27, 2021, at 2:42 AM, Francesco Brarda wrote:
>
> I created a for cycle where I call TSTrajectoryGetVecs, but only rank 0 seems to enter the cycle and I do not know why.
> I thought the following might be a solution, but it is not working as I would like, i.e. the final vector has the same local parts, a copy of the values obtained on rank 0. How should I change this, please?
>
> Vec U, partial, Uloc;
> PetscScalar *Ui, *partlocal;
> PetscInt i;
> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr);
> for(i=0; i<40; i++) {
>   PetscReal ttime = i+1;
>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>   ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>   ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>   ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>   ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr);
>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>   partlocal[i] = Ui[1];
>   ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>   ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
> }
>
>> On 27 May 2021, at 01:15, Barry Smith wrote:
>>
>>> On May 26, 2021, at 10:39 AM, Francesco Brarda wrote:
>>>
>>> Thank you very much.
>>>> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do?
>>> Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length). >>> The function is called only by the first proc (rank=0) and not from the others. Is there a way to force also other ranks to call that routine? >> >> Yes, just have all ranks call it and ignore the result on the other ranks. >> >>> Should I build everything into an external function outside the main? >> >> It can be called in main, does not need to be in a different function. >> >>> >>> Francesco >>> >>> >>>> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith > ha scritto: >>>> >>>> >>>> >>>> >>>> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS. >>>> >>>> You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time but then you can have only the rank you care about select the entries from the resulting vectors you care about while the other ranks for that time just ignore the vectors since they do not need to values from it. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On May 26, 2021, at 5:20 AM, Francesco Brarda wrote: >>>>> >>>>> Hi! >>>>> >>>>> I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times. To do so, I used TSTrajectoryGetVecs. >>>>> The values of the variable I am interested in is on one processor. I want to collect these values in a parallel vector, but I had the error: >>>>> >>>>> [0]PETSC ERROR: Invalid argument >>>>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2 >>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown >>>>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021 >>>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug >>>>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c >>>>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c >>>>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c >>>>> >>>>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter? >>>>> >>>>> I hope I made myself clear. >>>>> Best, >>>>> Francesco >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 11:04:58 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 12:04:58 -0400 Subject: [petsc-users] reproducibility In-Reply-To: <26E0FF29-CD2F-46CA-84B7-49F8BF89C18D@gmx.li> References: <26E0FF29-CD2F-46CA-84B7-49F8BF89C18D@gmx.li> Message-ID: On Fri, May 28, 2021 at 11:53 AM Lawrence Mitchell wrote: > > > > On 28 May 2021, at 16:51, Mark Adams wrote: > > > > It sounds like I should get one branch settled, use it, and keep that > branch in the repo, and to be safe not touch it, and that should work for > at least a few months. I just want it to work if the reviewer tests it :) > > > > Just bake a docker container and archive it on zenodo.org or figshare :) > Are you serious? > > Lawrence > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri May 28 11:06:22 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 May 2021 12:06:22 -0400 Subject: [petsc-users] reproducibility In-Reply-To: References: <26E0FF29-CD2F-46CA-84B7-49F8BF89C18D@gmx.li> Message-ID: On Fri, May 28, 2021 at 12:05 PM Mark Adams wrote: > On Fri, May 28, 2021 at 11:53 AM Lawrence Mitchell wrote: > >> >> >> > On 28 May 2021, at 16:51, Mark Adams wrote: >> > >> > It sounds like I should get one branch settled, use it, and keep that >> branch in the repo, and to be safe not touch it, and that should work for >> at least a few months. I just want it to work if the reviewer tests it :) >> > >> >> Just bake a docker container and archive it on zenodo.org or figshare :) >> > > Are you serious? > Yes. That is what Zenodo is for. Matt > >> Lawrence >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 28 11:13:16 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 28 May 2021 12:13:16 -0400 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: References: Message-ID: On Fri, May 28, 2021 at 11:57 AM Stefano Zampini wrote: > If you are referring to your device set values, I guess it is not > currently tested > No. There is a test for that (ex5cu). I have a user that is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption but I'm trying to cover all the bases. I have added a cuda test to ksp/ex56 that works. I can do an MR for it if such a test does not exist. > See the discussions here > https://gitlab.com/petsc/petsc/-/merge_requests/3411 > I started cleaning up the code to prepare for testing but we never > finished it > https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ > > > On May 28, 2021, at 6:53 PM, Mark Adams wrote: > > Is there a test with MatSetValues and CUDA? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri May 28 11:30:23 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 28 May 2021 11:30:23 -0500 Subject: [petsc-users] Did CUDA break again? In-Reply-To: <90669345-AFDB-484F-A52E-67CE02C997F8@petsc.dev> References: <3FE5D21E-B0A9-4014-A742-1CFB05947CE0@petsc.dev> <651688A2-EE23-494A-8825-DD305687512F@petsc.dev> <90669345-AFDB-484F-A52E-67CE02C997F8@petsc.dev> Message-ID: On Fri, May 28, 2021 at 10:40 AM Barry Smith wrote: > > Thanks. On machines such as this one where you have to use $MPIEXEC to > run code you will still need to provide the generation with > -with-cuda-gencodearch=70. On systems where it can directly query the GPU > without MPIEXEC it will automatically produce the correct result. Otherwise > it will guess by compiling for different generations but this can produce > an incorrect answer. > Yes, on Summit with CUDA-11, the script guesses sm_80, but actually it should be sm_70. Probably, we can test hostname and then set a correct cuda arch for common machines. But it kind of overreacts. 
> > Barry > > > On May 28, 2021, at 7:59 AM, Mark Adams wrote: > > > > On Thu, May 27, 2021 at 11:50 PM Barry Smith wrote: > >> >> Mark, >> >> >> >> Where did you run the little test program I sent you >> >> 1) when it produced >> >> The 1120 and negative number and (was this on the compile server or >> on a compute node?) >> > > This is fine now. look at my last email. I was not using srun. > > >> 2) when it produced the correct answer? (compile server or compute node?) >> >> Do you run configure on a compile server (that has no GPUs) or a compute >> server that has GPUs >> > > You have to do everything on the compute nodes on Cori/gpu. > > >> Don't spend your time bisecting PETSc we know exactly where the problem >> is, we just don't see how it happens. >> > >> cuda.py, if it cannot find deviceQuery and if you did not provide a >> generation arch with -with-cuda-gencodearch=70, >> > > I thought I was not supposed to use that anymore. It sounds like it is > optional. > > >> runs a version of the little code I sent you to get the number but it is >> ??apparently?? producing garbage or not running on the compiler server and >> gives the wrong number 1120. >> > > Does PETSc use MPIEXEC to run this? > > Note, I have not been able to get 'make check' to work on Cori/gpu. I use > '-with-mpiexec=srun -G1 [-c 20]' and it fails to execute the tests. > > OK, putting -with-cuda-gencodearch=70 back in has fixed this problem. It > is running now. > > Thanks, > > >> >> Just use the option -with-cuda-gencodearch=70 (you do not need to >> pass this information to any flags any more, just with this option and it >> will use it). >> >> Barry >> >> Ideally we want it to figure it out automatically and this little test >> program in configure is suppose to do this but since that is not always >> working yet you should just use -with-cuda-gencodearch=70 >> >> >> >> On May 27, 2021, at 5:45 AM, Mark Adams wrote: >> >> FYI, I was running the test incorrectly: >> 03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out >> 70 >> 70 >> >> On Wed, May 26, 2021 at 10:21 PM Mark Adams wrote: >> >>> I had git bisect working and was 4 steps away when I got a new crash. >>> configure.log is empty. 
>>> >>> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad >>> Bisecting: 19 revisions left to test after this (roughly 4 steps) >>> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch >>> 'tisaac/feature-spqr' into 'main' >>> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ >>> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >>> >>> =============================================================================== >>> Configuring PETSc to compile on your system >>> >>> >>> =============================================================================== >>> >>> ******************************************************************************* >>> CONFIGURATION CRASH (Please send configure.log to >>> petsc-maint at mcs.anl.gov) >>> >>> ******************************************************************************* >>> >>> EOL while scanning string literal (cuda.py, line 176) >>> File "/global/u2/m/madams/petsc/config/configure.py", line 455, in >>> petsc_configure >>> framework = >>> config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], >>> loadArgDB = 0) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 107, in __init__ >>> self.createChildren() >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 344, in createChildren >>> self.getChild(moduleName) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 329, in getChild >>> config.setupDependencies(self) >>> File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, >>> in setupDependencies >>> self.blasLapack = >>> framework.require('config.packages.BlasLapack',self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 349, in require >>> config = self.getChild(moduleName, keywordArgs) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 329, in getChild >>> config.setupDependencies(self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", >>> line 21, in setupDependencies >>> config.package.Package.setupDependencies(self, framework) >>> File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", >>> line 151, in setupDependencies >>> self.mpi = framework.require('config.packages.MPI',self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 349, in require >>> config = self.getChild(moduleName, keywordArgs) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 329, in getChild >>> config.setupDependencies(self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line >>> 73, in setupDependencies >>> self.mpich = framework.require('config.packages.MPICH', self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 349, in require >>> config = self.getChild(moduleName, keywordArgs) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 329, in getChild >>> config.setupDependencies(self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", >>> line 16, in setupDependencies >>> self.cuda = framework.require('config.packages.cuda',self) >>> File >>> "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 349, in require >>> config = self.getChild(moduleName, keywordArgs) >>> File >>> 
"/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line >>> 302, in getChild >>> type = __import__(moduleName, globals(), locals(), >>> ['Configure']).Configure >>> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ >>> ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD >>> >>> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> >>>> On Wed, May 26, 2021 at 6:13 PM Barry Smith wrote: >>>> >>>>> >>>>> What is HOST=cori09 Does it have GPUs? >>>>> >>>>> >>>>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6 >>>>> >>>>> Seems to clearly state >>>>> >>>>> int cudaDeviceProp >>>>> >>>>> ::major >>>>> >>>>> [inherited] >>>>> >>>>> Major compute capability >>>>> >>>>> >>>>> Mark, please compile and run this program on the machine you are >>>>> running configure on >>>>> >>>>> #include >>>>> #include >>>>> #include >>>>> #include >>>>> #include >>>>> int main(int arg,char **args) >>>>> { >>>>> struct cudaDeviceProp dp; >>>>> cudaGetDeviceProperties(&dp, 0); >>>>> printf("%d\n",10*dp.major+dp.minor); >>>>> >>>>> int major,minor; >>>>> cuDeviceGetAttribute(&major, >>>>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0); >>>>> cuDeviceGetAttribute(&minor, >>>>> CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0); >>>>> printf("%d\n",10*major+minor); >>>>> return(0); >>>>> >>>> Probably, you need to check the return code of these two function calls >>>> to make sure they are correct. >>>> >>>> >>>>> } >>>>> >>>>> This is what I get >>>>> >>>>> $ nvcc mytest.c -lcuda >>>>> ~/petsc* (main=)* arch-main >>>>> $ ./a.out >>>>> 70 >>>>> 70 >>>>> >>>>> Which is exactly what it is suppose to do. >>>>> >>>>> Barry >>>>> >>>>> On May 26, 2021, at 5:31 PM, Barry Smith wrote: >>>>> >>>>> >>>>> Yes, this code which I guess never got hit before >>>>> >>>>> cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0); >>>>> printf("%d\n",10*dp.major+dp.minor); >>>>> return(0);; >>>>> >>>>> is using the wrong property for the generation. >>>>> >>>>> Back to the CUDA documentation for the correct information. >>>>> >>>>> >>>>> >>>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch >>>>> wrote: >>>>> >>>>> 1120 sounds suspiciously like some CUDA version rather than >>>>> architecture or compute capability? >>>>> >>>>> Best regards, >>>>> >>>>> Jacob Faibussowitsch >>>>> (Jacob Fai - booss - oh - vitch) >>>>> Cell: +1 (312) 694-3391 >>>>> >>>>> On May 26, 2021, at 22:29, Mark Adams wrote: >>>>> ? >>>>> I started to get this error today on Cori. >>>>> >>>>> nvcc fatal : Unsupported gpu architecture 'compute_1120' >>>>> >>>>> I am pretty sure I had a clean build but I can redo it if you don't >>>>> know where this is from. >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> >>>>> >>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 28 11:36:53 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 28 May 2021 19:36:53 +0300 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: References: Message-ID: That test is not run in the testsuite Il Ven 28 Mag 2021, 19:13 Mark Adams ha scritto: > > > On Fri, May 28, 2021 at 11:57 AM Stefano Zampini < > stefano.zampini at gmail.com> wrote: > >> If you are referring to your device set values, I guess it is not >> currently tested >> > > No. There is a test for that (ex5cu). > I have a user that is getting a segv in MatSetValues with aijcusparse. 
I > suspect there is memory corruption but I'm trying to cover all the bases. > I have added a cuda test to ksp/ex56 that works. I can do an MR for it if > such a test does not exist. > > >> See the discussions here >> https://gitlab.com/petsc/petsc/-/merge_requests/3411 >> I started cleaning up the code to prepare for testing but we never >> finished it >> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ >> >> >> On May 28, 2021, at 6:53 PM, Mark Adams wrote: >> >> Is there a test with MatSetValues and CUDA? >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri May 28 11:45:55 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 May 2021 11:45:55 -0500 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: References: Message-ID: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check $ ./ex5cu terminate called after throwing an instance of 'thrust::system::system_error' what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped) requires: cuda !define(PETSC_USE_CTABLE) CI does not test with CUDA and no ctable. The code is still broken as it was six months ago in the discussion Stefano pointed to. It is clear why just no one has had the time to clean things up. Barry > On May 28, 2021, at 11:13 AM, Mark Adams wrote: > > > > On Fri, May 28, 2021 at 11:57 AM Stefano Zampini > wrote: > If you are referring to your device set values, I guess it is not currently tested > > No. There is a test for that (ex5cu). > I have a user that is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption but I'm trying to cover all the bases. > I have added a cuda test to ksp/ex56 that works. I can do an MR for it if such a test does not exist. > > See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411 > I started cleaning up the code to prepare for testing but we never finished it https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ > > >> On May 28, 2021, at 6:53 PM, Mark Adams > wrote: >> >> Is there a test with MatSetValues and CUDA? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 28 12:12:38 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 28 May 2021 20:12:38 +0300 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> References: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> Message-ID: That branch provides a fix for MatSetValuesDevice but it never got merged because of the CI issues with the ?download-openmpi. We can probably try to skip the test in that specific configuration? > On May 28, 2021, at 7:45 PM, Barry Smith wrote: > > > ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check > $ ./ex5cu > terminate called after throwing an instance of 'thrust::system::system_error' > what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered > Aborted (core dumped) > > requires: cuda !define(PETSC_USE_CTABLE) > > CI does not test with CUDA and no ctable. The code is still broken as it was six months ago in the discussion Stefano pointed to. 
From stefano.zampini at gmail.com Fri May 28 12:12:38 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 20:12:38 +0300
Subject: [petsc-users] CUDA MatSetValues test
In-Reply-To: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev>

That branch provides a fix for MatSetValuesDevice, but it never got merged because of the CI issues with --download-openmpi. We can probably try to skip the test in that specific configuration?

From stefano.zampini at gmail.com Fri May 28 12:16:28 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 20:16:28 +0300
Subject: [petsc-users] CUDA MatSetValues test

Or probably remove --download-openmpi? Or, just for the moment, why can't we just tell configure that MPI is a weak dependence of cuda.py, so that it will be forced to be configured later?
From mfadams at lbl.gov Fri May 28 12:16:47 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 13:16:47 -0400
Subject: [petsc-users] CUDA MatSetValues test
In-Reply-To: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev>

It looks like MatAssemblyEnd is not setting up correctly in parallel: segv here. I'll take a look at what Stefano did.

#18 main ()
#17 MatMult (mat=0x2155a750, x=0x56dc29c0, y=0x5937b190) at /autofs/nccs-svm1_home1/adams/petsc/src/mat/interface/matrix.c:2448
#16 MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) () from /ccs/home/adams/petsc/arch-summit-opt64-gnu-cuda/lib/libpetsc.so.3.15
#15 VecScatterBegin (sf=0x5937fbf0, x=0x56dc29c0, y=0x5937cd20, addv=<optimized out>, mode=<optimized out>) at /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/interface/vscat.c:1345
#14 VecScatterBegin_Internal (sf=0x5937fbf0, x=0x56dc29c0, y=0x5937cd20, addv=INSERT_VALUES, mode=SCATTER_FORWARD) at /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/interface/vscat.c:72
#13 PetscSFBcastWithMemTypeBegin (...) at /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/interface/sf.c:1493
#12 PetscSFBcastBegin_Basic (...) at /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:191
#11 PetscSFLinkStartCommunication (direction=PETSCSF_ROOT2LEAF, ...) at /ccs/home/adams/petsc/include/../src/vec/is/sf/impls/basic/sfpack.h:267
#10 PetscSFLinkStartRequests_MPI (...) at /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfmpi.c:41
#9  PMPI_Startall () from libmpi_ibm.so.3
#8  mca_pml_pami_start () from spectrum_mpi/mca_pml_pami.so
#7  pml_pami_persis_send_start () from spectrum_mpi/mca_pml_pami.so
#6  pml_pami_send () from spectrum_mpi/mca_pml_pami.so
#5  PAMI_Send_immediate () from libpami.so.3
#4  PAMI::Protocol::Send::Eager<...>::EagerImpl<...>::immediate(pami_send_immediate_t*) () from libpami.so.3
#3  PAMI::Protocol::Send::EagerSimple<...>::immediate_impl(pami_send_immediate_t*) () from libpami.so.3
#2  PAMI::Device::Interface::PacketModel<...>::postPacket<2u>(unsigned long, unsigned long, void*, unsigned long, iovec (&) [2u]) () from libpami.so.3
#1  PAMI::Device::Shmem::Packet<...>::writePayload(PAMI::Fifo::FifoPacket<64u, 4096u>&, iovec*, unsigned long) () from libpami.so.3
#0  __memcpy_power7 () from /lib64/libc.so.6
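The crash is in the MatMult/VecScatter path right after assembly. A bare-bones sketch of the kind of test being discussed, assembling with MatSetValues() on the host and then applying the operator with a run-time-selected type such as -mat_type aijcusparse, is below; it is purely illustrative and is not the actual ksp/ex56 or ex5cu source:

  #include <petscmat.h>

  int main(int argc, char **args)
  {
    Mat            A;
    Vec            x, y;
    PetscInt       i, rstart, rend, N = 100;
    PetscScalar    v[3] = {-1.0, 2.0, -1.0};
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &args, NULL, NULL); if (ierr) return ierr;
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);   /* picks up -mat_type aijcusparse */
    ierr = MatSetUp(A);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {            /* simple 1D Laplacian stencil */
      PetscInt cols[3] = {i - 1, i, i + 1};
      if (i == 0)          {ierr = MatSetValues(A, 1, &i, 2, &cols[1], &v[1], INSERT_VALUES);CHKERRQ(ierr);}
      else if (i == N - 1) {ierr = MatSetValues(A, 1, &i, 2, cols, v, INSERT_VALUES);CHKERRQ(ierr);}
      else                 {ierr = MatSetValues(A, 1, &i, 3, cols, v, INSERT_VALUES);CHKERRQ(ierr);}
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = MatMult(A, x, y);CHKERRQ(ierr);       /* the call that segvs in the trace above */
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }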
From mfadams at lbl.gov Fri May 28 12:24:16 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 13:24:16 -0400
Subject: [petsc-users] CUDA MatSetValues test

I am rebasing this branch over main and fixing it up.

From mfadams at lbl.gov Fri May 28 12:25:03 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 13:25:03 -0400
Subject: [petsc-users] CUDA MatSetValues test

Is this the correct branch? It conflicted with ex5cu, so I assume it is.

  stefanozampini/simplify-setvalues-device
From stefano.zampini at gmail.com Fri May 28 12:26:54 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 20:26:54 +0300
Subject: [petsc-users] CUDA MatSetValues test
Message-ID: <88DEEBBE-09DD-45D6-90B1-85F092FA7BDB@gmail.com>

Yes, that is the branch I was using before force-pushing to Barry's barry/2020-11-11/cleanup-matsetvaluesdevice. You can use either one, I guess.
From mfadams at lbl.gov Fri May 28 12:41:02 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 13:41:02 -0400
Subject: [petsc-users] CUDA MatSetValues test

OK, I will try to rebase and test Barry's branch.
From bsmith at petsc.dev Fri May 28 12:44:59 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 28 May 2021 12:44:59 -0500
Subject: [petsc-users] CUDA MatSetValues test
Message-ID: <502EBC54-AC76-4548-98AB-1A59FB543FDC@petsc.dev>

Stefano, who has a far better memory than me, wrote:

> Or probably remove --download-openmpi? Or, just for the moment, why can't
> we just tell configure that MPI is a weak dependence of cuda.py, so that
> it will be forced to be configured later?

MPI.py depends on cuda.py, so we cannot also have cuda.py depend on MPI.py using the generic dependencies of configure/packages. But perhaps we can just hardwire the rerunning of cuda.py when the MPI compilers are reset. I will try that now, and if I can get it to work we should be able to move those old fix branches along as an MR.

Barry
From stefano.zampini at gmail.com Fri May 28 12:50:16 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 20:50:16 +0300
Subject: [petsc-users] CUDA MatSetValues test

OpenMPI.py depends on cuda.py in that, if CUDA is present, it configures OpenMPI using CUDA. MPI.py and MPICH.py do not depend on cuda.py (MPICH only weakly: it adds a print if CUDA is present).
Since eventually the MPI distro will only need a hint to be configured with CUDA, why not remove the dependency altogether and add only a flag --download-openmpi-use-cuda?
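In BuildSystem terms, the question is whether MPI appears among a package's hard dependencies (self.deps) or its weak/optional ones (self.odeps), both declared in the package's setupDependencies(). A hypothetical sketch of the weak-dependence idea; this is not the actual config/BuildSystem/config/packages/cuda.py:

  import config.package

  class Configure(config.package.Package):
    def setupDependencies(self, framework):
      config.package.Package.setupDependencies(self, framework)
      self.mpi = framework.require('config.packages.MPI', self)
      # Hypothetical: list MPI as an optional (weak) dependency so that
      # cuda.py is configured after the MPI compilers are settled,
      # without creating a hard MPI <-> CUDA cycle.
      self.odeps = [self.mpi]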
From mfadams at lbl.gov Fri May 28 13:15:38 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 14:15:38 -0400
Subject: [petsc-users] CUDA MatSetValues test

I am rebasing over main and it's a bit of a mess. I must have missed something; I get the error below. I think the _n_SplitCSRMat declaration must be wrong.
In file included from /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
/ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types for 'PetscSplitCSRDataStructure'
 typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~
/ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration of 'PetscSplitCSRDataStructure' was here
 typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
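The diagnostic is the generic C error for the same typedef name declared twice with different types, which is what a botched rebase of petscmat.h can leave behind. In miniature:

  /* Two incompatible declarations of the same name; the compiler
     rejects the second one ("conflicting types"):                */
  typedef struct _p_SplitCSRMat  PetscSplitCSRDataStructure;  /* a struct type  */
  typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;  /* a pointer type */

The fix is to keep exactly one of the two declarations, presumably the one from the branch being rebased, and delete the stale duplicate.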
From mfadams at lbl.gov Fri May 28 13:51:52 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 14:51:52 -0400
Subject: [petsc-users] CUDA MatSetValues test

I am getting messed up trying to resolve the conflicts from rebasing over main.
Is there a better way of doing this? Can I just tell git to use Barry's version and then test it? Or should I just try it again?
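On the git side, one side of a conflicted file can be taken wholesale. During a rebase, note that --theirs refers to the commits being replayed (the branch) and --ours to the new base. A sketch, with an illustrative path:

  $ git rebase main
  # ... conflict reported in include/petscmat.h ...
  $ git checkout --theirs -- include/petscmat.h   # take the branch's version
  $ git add include/petscmat.h
  $ git rebase --continue

  # or, to prefer the branch's side for every conflict up front:
  $ git rebase -X theirs main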
From stefano.zampini at gmail.com Fri May 28 14:13:06 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 28 May 2021 22:13:06 +0300
Subject: [petsc-users] CUDA MatSetValues test

I can take a quick look at it tomorrow. What are the main changes you made since then?
From mfadams at lbl.gov Fri May 28 14:39:42 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Fri, 28 May 2021 15:39:42 -0400
Subject: [petsc-users] CUDA MatSetValues test

Thanks. I did not intend to make any (real) changes.
> > On Fri, May 28, 2021 at 2:15 PM Mark Adams wrote: > >> I am rebasing over main and its a bit of a mess. I must have missed >> something. I get this. I think the _n_SplitCSRMat must be wrong. >> >> >> In file included from >> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0: >> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting >> types for 'PetscSplitCSRDataStructure' >> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure; >> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous >> declaration of 'PetscSplitCSRDataStructure' was here >> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure; >> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o >> >> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini < >> stefano.zampini at gmail.com> wrote: >> >>> OpenMPI.py depends on cuda.py in that, if cuda is present, configures >>> using cuda. MPI.py or MPICH.py do not depend on cuda.py (MPICH, only >>> weakly, it adds a print if cuda is present) >>> Since eventually the MPI distro will only need a hint to be configured >>> with CUDA, why not removing the dependency at all and add only a flag >>> ?download-openmpi-use-cuda? >>> >>> On May 28, 2021, at 8:44 PM, Barry Smith wrote: >>> >>> >>> Stefano, who has a far better memory than me, wrote >>> >>> > Or probably remove ?download-openmpi ? Or, just for the moment, why >>> can?t we just tell configure that mpi is a weak dependence of cuda.py, so >>> that it will be forced to be configured later? >>> >>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on >>> MPI.py using the generic dependencies of configure/packages >>> >>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI >>> compilers are reset. I will try that now and if I can get it to work we >>> should be able to move those old fix branches along as MR. >>> >>> Barry >>> >>> >>> >>> On May 28, 2021, at 12:41 PM, Mark Adams wrote: >>> >>> OK, I will try to rebase and test Barry's branch. >>> >>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini < >>> stefano.zampini at gmail.com> wrote: >>> >>>> Yes, it is the branch I was using before force pushing to >>>> Barry?s barry/2020-11-11/cleanup-matsetvaluesdevice >>>> You can use both I guess >>>> >>>> On May 28, 2021, at 8:25 PM, Mark Adams wrote: >>>> >>>> Is this the correct branch? It conflicted with ex5cu so I assume it is. >>>> >>>> >>>> stefanozampini/simplify-setvalues-device >>>> >>>> >>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams wrote: >>>> >>>>> I am fixing rebasing this branch over main. >>>>> >>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini < >>>>> stefano.zampini at gmail.com> wrote: >>>>> >>>>>> Or probably remove ?download-openmpi ? Or, just for the moment, why >>>>>> can?t we just tell configure that mpi is a weak dependence of cuda.py, so >>>>>> that it will be forced to be configured later? >>>>>> >>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini < >>>>>> stefano.zampini at gmail.com> wrote: >>>>>> >>>>>> That branch provides a fix for MatSetValuesDevice but it never got >>>>>> merged because of the CI issues with the ?download-openmpi. We can probably >>>>>> try to skip the test in that specific configuration? 
From bsmith at petsc.dev Fri May 28 22:53:17 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 28 May 2021 22:53:17 -0500
Subject: [petsc-users] CUDA MatSetValues test

I have rebased and tried to fix everything. I am now fixing the issues with --download-openmpi and CUDA; once that is done I will test, rebase with main again if needed, and restart the MR to get it into main.

Barry

I was stupid to let the MR lie fallow; I should have figured out a solution to the openmpi-and-cuda issue instead of punting and waiting for a dream fix.
>>>>> >>>>> >>>>> stefanozampini/simplify-setvalues-device >>>>> >>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams > wrote: >>>>> I am fixing rebasing this branch over main. >>>>> >>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini > wrote: >>>>> Or probably remove ?download-openmpi ? Or, just for the moment, why can?t we just tell configure that mpi is a weak dependence of cuda.py, so that it will be forced to be configured later? >>>>> >>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini > wrote: >>>>>> >>>>>> That branch provides a fix for MatSetValuesDevice but it never got merged because of the CI issues with the ?download-openmpi. We can probably try to skip the test in that specific configuration? >>>>>> >>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith > wrote: >>>>>>> >>>>>>> >>>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check >>>>>>> $ ./ex5cu >>>>>>> terminate called after throwing an instance of 'thrust::system::system_error' >>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered >>>>>>> Aborted (core dumped) >>>>>>> >>>>>>> requires: cuda !define(PETSC_USE_CTABLE) >>>>>>> >>>>>>> CI does not test with CUDA and no ctable. The code is still broken as it was six months ago in the discussion Stefano pointed to. It is clear why just no one has had the time to clean things up. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams > wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini > wrote: >>>>>>>> If you are referring to your device set values, I guess it is not currently tested >>>>>>>> >>>>>>>> No. There is a test for that (ex5cu). >>>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption but I'm trying to cover all the bases. >>>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if such a test does not exist. >>>>>>>> >>>>>>>> See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411 >>>>>>>> I started cleaning up the code to prepare for testing but we never finished it https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ >>>>>>>> >>>>>>>> >>>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams > wrote: >>>>>>>>> >>>>>>>>> Is there a test with MatSetValues and CUDA? >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri May 28 23:09:31 2021 From: jed at jedbrown.org (Jed Brown) Date: Fri, 28 May 2021 22:09:31 -0600 Subject: [petsc-users] reproducibility In-Reply-To: References: <26E0FF29-CD2F-46CA-84B7-49F8BF89C18D@gmx.li> Message-ID: <87im32yydg.fsf@jedbrown.org> Matthew Knepley writes: > On Fri, May 28, 2021 at 12:05 PM Mark Adams wrote: > >> On Fri, May 28, 2021 at 11:53 AM Lawrence Mitchell wrote: >> >>> >>> >>> > On 28 May 2021, at 16:51, Mark Adams wrote: >>> > >>> > It sounds like I should get one branch settled, use it, and keep that >>> branch in the repo, and to be safe not touch it, and that should work for >>> at least a few months. I just want it to work if the reviewer tests it :) >>> > >>> >>> Just bake a docker container and archive it on zenodo.org or figshare :) >>> >> >> Are you serious? >> > > Yes. That is what Zenodo is for. 
Yeah, an alternative is to archive that commit of the PETSc repository if you don't want to pin the compiler toolchain, etc. (There are pros and cons of pinning the whole environment versus enabling tinkering. One option is to have the repository snapshot and a Dockerfile that specifies versions, which would allow you to test, and them to spin up something similar. But usually that build involves an apt-get update so you'll get new maintenance releases of all the packages. Usually harmless, but something to keep in mind.)

From zonexo at gmail.com Sat May 29 04:37:38 2021
From: zonexo at gmail.com (TAY wee-beng)
Date: Sat, 29 May 2021 17:37:38 +0800
Subject: [petsc-users] HYPRE link
Message-ID: <181dd486-7c05-2c76-17d5-8293a85d4797@gmail.com>

Hi,

I need to compile PETSc with HYPRE but I can't download it directly since my cluster is behind a firewall.

Can someone give me the link to download it?

--
Thank you very much.
Yours sincerely,
================================================
TAY Wee-Beng (Zheng Weiming)
Personal research webpage: http://tayweebeng.wixsite.com/website
Youtube research showcase: https://goo.gl/PtvdwQ
linkedin: https://www.linkedin.com/in/tay-weebeng
================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com Sat May 29 05:47:51 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Sat, 29 May 2021 06:47:51 -0400
Subject: [petsc-users] HYPRE link
In-Reply-To: <181dd486-7c05-2c76-17d5-8293a85d4797@gmail.com>
References: <181dd486-7c05-2c76-17d5-8293a85d4797@gmail.com>
Message-ID: 

On Sat, May 29, 2021 at 5:38 AM TAY wee-beng wrote:
> Hi,
> I need to compile PETSc with HYPRE but I can't download it directly since my cluster is behind a firewall.
> Can someone give me the link to download it?

The links are in the configure modules:
https://gitlab.com/petsc/petsc/-/blob/main/config/BuildSystem/config/packages/hypre.py#L13

Thanks,
Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov Sat May 29 09:55:01 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Sat, 29 May 2021 09:55:01 -0500
Subject: [petsc-users] HYPRE link
In-Reply-To: References: <181dd486-7c05-2c76-17d5-8293a85d4797@gmail.com>
Message-ID: <3fea1b85-1aac-b060-5189-36b7c01e36cc@mcs.anl.gov>

One can use --with-packages-download-dir for this use case. i.e.:
- run configure with --with-packages-download-dir
- it prints the needed URLs
- download the packages using the above URL/s - and place in this location
- rerun configure [with the same options] - and configure will use these downloads from this location.
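For example, for hypre the needed URL is one of the tarballs configure prints (as in the transcript below), so on a machine with outbound web access the download step might look like this (a sketch; take the exact URL from your own configure output):

  cd /dir/given/to/--with-packages-download-dir
  wget https://github.com/hypre-space/hypre/archive/v2.20.0.tar.gz

Then copy that directory over to the cluster behind the firewall and rerun configure there.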
Satish -------- balay at sb /home/balay/petsc (main=) $ ./configure --download-hypre --with-packages-download-dir=$PWD =============================================================================== Configuring PETSc to compile on your system =============================================================================== Download the following packages to /home/balay/petsc hypre ['git://https://github.com/hypre-space/hypre', 'https://github.com/hypre-space/hypre/archive/v2.20.0.tar.gz'] Then run the script again On Sat, 29 May 2021, Matthew Knepley wrote: > On Sat, May 29, 2021 at 5:38 AM TAY wee-beng wrote: > > > Hi, > > > > I need to compile PETSc with HYPRE but I can't download directly since my > > cluster is behind firewall. > > > > Can someone give me the link to download it? > > > The links are in the configure modules: > > > https://gitlab.com/petsc/petsc/-/blob/main/config/BuildSystem/config/packages/hypre.py#L13 > > Thanks, > > Matt > > > > -- > > > > Thank you very much. > > > > Yours sincerely, > > > > ================================================ > > TAY Wee-Beng ??? (Zheng Weiming) > > Personal research webpage: *http://tayweebeng.wixsite.com/website > > * > > Youtube research showcase: *https://goo.gl/PtvdwQ * > > linkedin: *https://www.linkedin.com/in/tay-weebeng > > * > > ================================================ > > > > > From bsmith at petsc.dev Sat May 29 11:32:54 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 29 May 2021 11:32:54 -0500 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: References: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> <88DEEBBE-09DD-45D6-90B1-85F092FA7BDB@gmail.com> <502EBC54-AC76-4548-98AB-1A59FB543FDC@petsc.dev> Message-ID: <79F9D2BD-A257-47ED-AEB0-FA836DB1D7D8@petsc.dev> I am working away on this branch, making some progress, also cleaning things up with some small simplifications. Hope I can succeed, a bunch of stuff got moved around and some structs had changes, the merge could not handle some of these so I have to do a good amount of code wrangling to fix it. I'll let you know as I progress. Barry > On May 28, 2021, at 10:53 PM, Barry Smith wrote: > > > I have rebased and tried to fix everything. I am now fixing the issues of --download-openmpi and cuda, once that is done I will test, rebase with main again if needed and restart the MR and get it into main. > > Barry > > I was stupid to let the MR lay fallow, I should have figured out a solution to the openmpi and cuda issue instead of punting and waiting for a dream fix. > > > >> On May 28, 2021, at 2:39 PM, Mark Adams > wrote: >> >> Thanks, >> >> I did not intend to make any (real) changes. >> The only thing that I did not intend to use from Barry's branch, that conflicted, was the help and comment block at the top of ex5cu.cu >> >> * I ended up with two declarations of PetscSplitCSRDataStructure >> * I added some includes to fix errors like this: >> /ccs/home/adams/petsc/include/../src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h(263): error: incomplete type is not allowed >> * I end ended not having csr2csc_i in Mat_SeqAIJCUSPARSE so I get: >> /autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu (1348): error: class "Mat_SeqAIJCUSPARSE" has no member "csr2csc_i" >> >> >> >> >> On Fri, May 28, 2021 at 3:13 PM Stefano Zampini > wrote: >> I can take a quick look at it tomorrow, what are the main changes you made since then? 
>> >>> On May 28, 2021, at 9:51 PM, Mark Adams > wrote: >>> >>> I am getting messed up in trying to resolve conflicts in rebasing over main. >>> Is there a better way of doing this? >>> Can I just tell git to use Barry's version and then test it? >>> Or should I just try it again? >>> >>> On Fri, May 28, 2021 at 2:15 PM Mark Adams > wrote: >>> I am rebasing over main and its a bit of a mess. I must have missed something. I get this. I think the _n_SplitCSRMat must be wrong. >>> >>> >>> In file included from /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0: >>> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types for 'PetscSplitCSRDataStructure' >>> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure; >>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration of 'PetscSplitCSRDataStructure' was here >>> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure; >>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o >>> >>> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini > wrote: >>> OpenMPI.py depends on cuda.py in that, if cuda is present, configures using cuda. MPI.py or MPICH.py do not depend on cuda.py (MPICH, only weakly, it adds a print if cuda is present) >>> Since eventually the MPI distro will only need a hint to be configured with CUDA, why not removing the dependency at all and add only a flag ?download-openmpi-use-cuda? >>> >>>> On May 28, 2021, at 8:44 PM, Barry Smith > wrote: >>>> >>>> >>>> Stefano, who has a far better memory than me, wrote >>>> >>>> > Or probably remove ?download-openmpi ? Or, just for the moment, why can?t we just tell configure that mpi is a weak dependence of cuda.py, so that it will be forced to be configured later? >>>> >>>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on MPI.py using the generic dependencies of configure/packages >>>> >>>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI compilers are reset. I will try that now and if I can get it to work we should be able to move those old fix branches along as MR. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On May 28, 2021, at 12:41 PM, Mark Adams > wrote: >>>>> >>>>> OK, I will try to rebase and test Barry's branch. >>>>> >>>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini > wrote: >>>>> Yes, it is the branch I was using before force pushing to Barry?s barry/2020-11-11/cleanup-matsetvaluesdevice >>>>> You can use both I guess >>>>> >>>>>> On May 28, 2021, at 8:25 PM, Mark Adams > wrote: >>>>>> >>>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is. >>>>>> >>>>>> >>>>>> stefanozampini/simplify-setvalues-device >>>>>> >>>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams > wrote: >>>>>> I am fixing rebasing this branch over main. >>>>>> >>>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini > wrote: >>>>>> Or probably remove ?download-openmpi ? Or, just for the moment, why can?t we just tell configure that mpi is a weak dependence of cuda.py, so that it will be forced to be configured later? >>>>>> >>>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini > wrote: >>>>>>> >>>>>>> That branch provides a fix for MatSetValuesDevice but it never got merged because of the CI issues with the ?download-openmpi. We can probably try to skip the test in that specific configuration? 
>>>>>>> >>>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith > wrote: >>>>>>>> >>>>>>>> >>>>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check >>>>>>>> $ ./ex5cu >>>>>>>> terminate called after throwing an instance of 'thrust::system::system_error' >>>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered >>>>>>>> Aborted (core dumped) >>>>>>>> >>>>>>>> requires: cuda !define(PETSC_USE_CTABLE) >>>>>>>> >>>>>>>> CI does not test with CUDA and no ctable. The code is still broken as it was six months ago in the discussion Stefano pointed to. It is clear why just no one has had the time to clean things up. >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams > wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini > wrote: >>>>>>>>> If you are referring to your device set values, I guess it is not currently tested >>>>>>>>> >>>>>>>>> No. There is a test for that (ex5cu). >>>>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption but I'm trying to cover all the bases. >>>>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if such a test does not exist. >>>>>>>>> >>>>>>>>> See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411 >>>>>>>>> I started cleaning up the code to prepare for testing but we never finished it https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ >>>>>>>>> >>>>>>>>> >>>>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams > wrote: >>>>>>>>>> >>>>>>>>>> Is there a test with MatSetValues and CUDA? >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat May 29 14:16:16 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 29 May 2021 15:16:16 -0400 Subject: [petsc-users] strange segv Message-ID: I am running on Summit with Kokkos-CUDA and I am getting a segv that looks like some sort of a compile/link mismatch. I also have a user with a C++ code that is getting strange segvs when calling MatSetValues with CUDA (I know MatSetValues is not a cupsarse method, but that is the report that I have). I have no idea if these are related but they both involve C -- C++ calls ... I started with a clean build (attached) and I ran in DDT. DDT stopped at the call in plexland.c to the KokkosLanau operator. I stepped into this function and then took this screenshot of the stack, with the Kokkos call and PETSc signal handler. Make check does not seem to be running Kokkos tests: 15:02 adams/landau-mass-opt *= /gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 check Running check examples to verify correct installation Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes C/C++ example src/snes/tutorials/ex19 run successfully with cuda Completed test examples Also, I ran this AM with another branch that had not been rebased with main as recently as this branch (adams/landau-mass-opt). Any ideas? 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 108017 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 3458445 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2021-05-29 at 2.51.00 PM.png
Type: image/png
Size: 117498 bytes
Desc: not available
URL: 

From mfadams at lbl.gov Sat May 29 15:31:40 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Sat, 29 May 2021 16:31:40 -0400
Subject: [petsc-users] building with PGI and Kokkos
Message-ID: 

I am trying to build with Kokkos and PGI on Summit and I can't seem to get Kokkos to accept that the C++ compiler supports C++14.
Any ideas?
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 772873 bytes
Desc: not available
URL: 

From junchao.zhang at gmail.com Sat May 29 16:17:05 2021
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Sat, 29 May 2021 16:17:05 -0500
Subject: [petsc-users] building with PGI and Kokkos
In-Reply-To: References: Message-ID: 

It seems pgi-19.9 does not support the C++14 needed by Kokkos. Try pgi/20.4?
--Junchao Zhang

On Sat, May 29, 2021 at 3:32 PM Mark Adams wrote:
> I am trying to build with Kokkos and PGI on Summit and I can't seem to get Kokkos to accept that the C++ compiler supports C++14.
> Any ideas?
> Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mfadams at lbl.gov Sat May 29 16:43:51 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Sat, 29 May 2021 17:43:51 -0400
Subject: [petsc-users] building with PGI and Kokkos
In-Reply-To: References: Message-ID: 

Good idea, thanks; no luck. Kokkos requires C++14 but the CUDA compiler only supports C++11.

On Sat, May 29, 2021 at 5:17 PM Junchao Zhang wrote:
> It seems pgi-19.9 does not support the C++14 needed by Kokkos. Try pgi/20.4?
> --Junchao Zhang
>
>> On Sat, May 29, 2021 at 3:32 PM Mark Adams wrote:
>> I am trying to build with Kokkos and PGI on Summit and I can't seem to get Kokkos to accept that the C++ compiler supports C++14.
>> Any ideas?
>> Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 1253804 bytes
Desc: not available
URL: 

From mfadams at lbl.gov Sat May 29 17:16:20 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Sat, 29 May 2021 18:16:20 -0400
Subject: [petsc-users] ok mat_tests-ex231_1 # SKIP Requires DATAFILESPATH
Message-ID: 

How does one configure to get this DATAFILESPATH?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From junchao.zhang at gmail.com Sat May 29 17:26:45 2021
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Sat, 29 May 2021 17:26:45 -0500
Subject: [petsc-users] ok mat_tests-ex231_1 # SKIP Requires DATAFILESPATH
In-Reply-To: References: Message-ID: 

In your env: export DATAFILESPATH=/path/to/datafile.
--Junchao Zhang

On Sat, May 29, 2021 at 5:16 PM Mark Adams wrote:
> How does one configure to get this DATAFILESPATH?
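For example (a sketch; where you place the data tree is up to you, DATAFILESPATH just has to point at it before the tests run, and the path below is hypothetical):

  export DATAFILESPATH=$HOME/petsc-datafiles   # hypothetical location of the PETSc test data tree
  make -f gmakefile test search='mat_tests-ex231_1'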
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev Sat May 29 18:48:53 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sat, 29 May 2021 18:48:53 -0500
Subject: [petsc-users] strange segv
In-Reply-To: References: Message-ID: 

I don't see why it is not running the Kokkos check. Here is the rule right below the CUDA rule that is apparently running.

check_build:
	-@echo "Running check examples to verify correct installation"
	-@echo "Using PETSC_DIR=${PETSC_DIR} and PETSC_ARCH=${PETSC_ARCH}"
	+@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} clean-legacy
	+@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} testex19
	+@if [ "${HYPRE_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ]; then \
	  cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_hypre; \
	fi;
	+@if [ "${CUDA_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ]; then \
	  cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_cuda; \
	fi;
	+@if [ "${KOKKOS_KERNELS_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && [ "${PETSC_SCALAR}" = "real" ] && [ "${PETSC_PRECISION}" = "double" ] && [ "${MPI_IS_MPIUNI}" = "0" ]; then \
	  cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex3k_kokkos; \
	fi;

Regarding the debugging: with GDB on one MPI rank (or even more), it will trap the error and show the exact line of source code where the error occurred, and you can poke around at variables to see if they look corrupt or wrong (for example, a crazy address in a pointer). I don't know why your debugger is not giving more useful information.

Barry

> On May 29, 2021, at 2:16 PM, Mark Adams wrote:
>
> I am running on Summit with Kokkos-CUDA and I am getting a segv that looks like some sort of a compile/link mismatch. I also have a user with a C++ code that is getting strange segvs when calling MatSetValues with CUDA (I know MatSetValues is not a cuSPARSE method, but that is the report that I have). I have no idea if these are related but they both involve C -- C++ calls ...
>
> I started with a clean build (attached) and I ran in DDT. DDT stopped at the call in plexland.c to the Kokkos Landau operator. I stepped into this function and then took this screenshot of the stack, with the Kokkos call and PETSc signal handler.
>
> Make check does not seem to be running Kokkos tests:
>
> 15:02 adams/landau-mass-opt *= /gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 check
> Running check examples to verify correct installation
> Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> Completed test examples
>
> Also, I ran this AM with another branch that had not been rebased with main as recently as this branch (adams/landau-mass-opt).
>
> Any ideas?
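(For reference, a minimal sketch of getting GDB attached through PETSc's generic runtime options; this assumes an interactive run where an xterm can be opened, so under a batch launcher you may need X forwarding or to run a single rank directly under gdb:)

  ./ex19 -start_in_debugger gdb          # attach gdb to each rank at PetscInitialize()
  ./ex19 -on_error_attach_debugger gdb   # attach only once an error or signal is trapped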
> From bsmith at petsc.dev Sat May 29 18:52:07 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 29 May 2021 18:52:07 -0500 Subject: [petsc-users] building with PGI and Kokkos In-Reply-To: References: Message-ID: <1ACAC555-E17A-4D3B-9F4D-4877135D7EA6@petsc.dev> Something you might check. https://forums.developer.nvidia.com/t/c-14/135887/7 > On May 29, 2021, at 4:43 PM, Mark Adams wrote: > > Good idea, thanks, > no luck. > > Kokkos requires C++14 but the CUDAcompiler only supports C++11 > > On Sat, May 29, 2021 at 5:17 PM Junchao Zhang > wrote: > It seems pgi-19.9 does not support C++14 needed by Kokkos. Try pgi/20.4? > --Junchao Zhang > > > On Sat, May 29, 2021 at 3:32 PM Mark Adams > wrote: > I am trying to build with Kokkos and PGI on Summit and I can't seem to get Kokkos to accept that the C++ compiler supports C++14. > Any ideas? > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat May 29 19:46:42 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 29 May 2021 20:46:42 -0400 Subject: [petsc-users] strange segv In-Reply-To: References: Message-ID: On Sat, May 29, 2021 at 7:48 PM Barry Smith wrote: > > I don't see why it is not running the Kokkos check. Here is the rule > right below the CUDA rule that is apparently running. > > check_build: > - at echo "Running check examples to verify correct installation" > - at echo "Using PETSC_DIR=${PETSC_DIR} and PETSC_ARCH=${PETSC_ARCH}" > + at cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} > PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} clean-legacy > + at cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} > PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} testex19 > + at if [ "${HYPRE_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] > && [ "${PETSC_SCALAR}" = "real" ]; then \ > cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} > PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} > DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_hypre; \ > fi; > + at if [ "${CUDA_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ] && > [ "${PETSC_SCALAR}" = "real" ]; then \ > cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} > PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} > DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_cuda; \ > fi; > + at if [ "${KOKKOS_KERNELS_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" > = "" ] && [ "${PETSC_SCALAR}" = "real" ] && [ "${PETSC_PRECISION}" = > "double" ] && [ "${MPI_IS_MPIUNI}" = "0" ]; then \ > cd src/snes/tutorials >/dev/null; ${OMAKE_SELF} > PETSC_ARCH=${PETSC_ARCH} PETSC_DIR=${PETSC_DIR} > DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex3k_kokkos; \ > fi; > > Regarding the debugging, if it is just one MPI rank (or even more) with > GDB it will trap the error and show the exact line of source code where the > error occurred and you can poke around at variables to see if they look > corrupt or wrong (for example crazy address in a pointer), I don't know why > your debugger is not giving more useful information. > > This is what I did (in DDT). It stopped at the function call and the data looked fine. I stepped into the call, but didn't get to it. The signal handler was called and I was dead. Maybe I did something in my branch. Can't see what, but I keep probing, Thanks, > Barry > > > > On May 29, 2021, at 2:16 PM, Mark Adams wrote: > > > > I am running on Summit with Kokkos-CUDA and I am getting a segv that > looks like some sort of a compile/link mismatch. 
I also have a user with a > C++ code that is getting strange segvs when calling MatSetValues with CUDA > (I know MatSetValues is not a cupsarse method, but that is the report that > I have). I have no idea if these are related but they both involve C -- C++ > calls ... > > > > I started with a clean build (attached) and I ran in DDT. DDT stopped at > the call in plexland.c to the KokkosLanau operator. I stepped into this > function and then took this screenshot of the stack, with the Kokkos call > and PETSc signal handler. > > > > Make check does not seem to be running Kokkos tests: > > > > 15:02 adams/landau-mass-opt *= /gpfs/alpine/csc314/scratch/adams/petsc$ > make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc > PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 check > > Running check examples to verify correct installation > > Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and > PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 > > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process > > C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI > processes > > C/C++ example src/snes/tutorials/ex19 run successfully with cuda > > Completed test examples > > > > Also, I ran this AM with another branch that had not been rebased with > main as recently as this branch (adams/landau-mass-opt). > > > > Any ideas? > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun May 30 11:54:58 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 30 May 2021 11:54:58 -0500 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: <79F9D2BD-A257-47ED-AEB0-FA836DB1D7D8@petsc.dev> References: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> <88DEEBBE-09DD-45D6-90B1-85F092FA7BDB@gmail.com> <502EBC54-AC76-4548-98AB-1A59FB543FDC@petsc.dev> <79F9D2BD-A257-47ED-AEB0-FA836DB1D7D8@petsc.dev> Message-ID: <9E5A7CCC-1CA9-4A66-9398-0FF52151E55F@petsc.dev> I believe I have finally successfully rebased the branch barry/2020-11-11/cleanup-matsetvaluesdevice against main and cleaned up all the issues. Please read the commit message. I have submitted a CI pipeline with ctables turned off temporarily for testing of the MatSetValuesDevice(). If it works hopefully Mark can maybe run a few additional tests of his Landau code that are not in the usual testing to verify and we can finally get the branch into main. Mark, Since this change is involved, it is likely your Landau mass matrix branch may not rebase cleanly. Let me know if you would like me to do the rebase and testing of your Landau mass matrix branch. I can get it ready to work with the results of barry/2020-11-11/cleanup-matsetvaluesdevice and then hand it back to you for further development. Barry > On May 29, 2021, at 11:32 AM, Barry Smith wrote: > > > I am working away on this branch, making some progress, also cleaning things up with some small simplifications. Hope I can succeed, a bunch of stuff got moved around and some structs had changes, the merge could not handle some of these so I have to do a good amount of code wrangling to fix it. > > I'll let you know as I progress. > > Barry > > >> On May 28, 2021, at 10:53 PM, Barry Smith > wrote: >> >> >> I have rebased and tried to fix everything. I am now fixing the issues of --download-openmpi and cuda, once that is done I will test, rebase with main again if needed and restart the MR and get it into main. 
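(For the mechanical part, a generic sketch of replaying a branch on main and taking one side wholesale in a conflicted file; the branch and file names below are placeholders, and note that during a rebase --theirs refers to the branch being replayed, not to main:)

  git checkout mark/landau-mass-matrix     # hypothetical branch name
  git rebase main
  # at a conflict, keep the rebased branch's version of the file:
  git checkout --theirs -- src/ts/utils/dmplexlandau/plexland.c
  git add src/ts/utils/dmplexlandau/plexland.c
  git rebase --continue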
>> >> Barry >> >> I was stupid to let the MR lay fallow, I should have figured out a solution to the openmpi and cuda issue instead of punting and waiting for a dream fix. >> >> >> >>> On May 28, 2021, at 2:39 PM, Mark Adams > wrote: >>> >>> Thanks, >>> >>> I did not intend to make any (real) changes. >>> The only thing that I did not intend to use from Barry's branch, that conflicted, was the help and comment block at the top of ex5cu.cu >>> >>> * I ended up with two declarations of PetscSplitCSRDataStructure >>> * I added some includes to fix errors like this: >>> /ccs/home/adams/petsc/include/../src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h(263): error: incomplete type is not allowed >>> * I end ended not having csr2csc_i in Mat_SeqAIJCUSPARSE so I get: >>> /autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu (1348): error: class "Mat_SeqAIJCUSPARSE" has no member "csr2csc_i" >>> >>> >>> >>> >>> On Fri, May 28, 2021 at 3:13 PM Stefano Zampini > wrote: >>> I can take a quick look at it tomorrow, what are the main changes you made since then? >>> >>>> On May 28, 2021, at 9:51 PM, Mark Adams > wrote: >>>> >>>> I am getting messed up in trying to resolve conflicts in rebasing over main. >>>> Is there a better way of doing this? >>>> Can I just tell git to use Barry's version and then test it? >>>> Or should I just try it again? >>>> >>>> On Fri, May 28, 2021 at 2:15 PM Mark Adams > wrote: >>>> I am rebasing over main and its a bit of a mess. I must have missed something. I get this. I think the _n_SplitCSRMat must be wrong. >>>> >>>> >>>> In file included from /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0: >>>> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types for 'PetscSplitCSRDataStructure' >>>> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure; >>>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration of 'PetscSplitCSRDataStructure' was here >>>> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure; >>>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o >>>> >>>> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini > wrote: >>>> OpenMPI.py depends on cuda.py in that, if cuda is present, configures using cuda. MPI.py or MPICH.py do not depend on cuda.py (MPICH, only weakly, it adds a print if cuda is present) >>>> Since eventually the MPI distro will only need a hint to be configured with CUDA, why not removing the dependency at all and add only a flag ?download-openmpi-use-cuda? >>>> >>>>> On May 28, 2021, at 8:44 PM, Barry Smith > wrote: >>>>> >>>>> >>>>> Stefano, who has a far better memory than me, wrote >>>>> >>>>> > Or probably remove ?download-openmpi ? Or, just for the moment, why can?t we just tell configure that mpi is a weak dependence of cuda.py, so that it will be forced to be configured later? >>>>> >>>>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on MPI.py using the generic dependencies of configure/packages >>>>> >>>>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI compilers are reset. I will try that now and if I can get it to work we should be able to move those old fix branches along as MR. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>>> On May 28, 2021, at 12:41 PM, Mark Adams > wrote: >>>>>> >>>>>> OK, I will try to rebase and test Barry's branch. 
>>>>>> >>>>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini > wrote: >>>>>> Yes, it is the branch I was using before force pushing to Barry?s barry/2020-11-11/cleanup-matsetvaluesdevice >>>>>> You can use both I guess >>>>>> >>>>>>> On May 28, 2021, at 8:25 PM, Mark Adams > wrote: >>>>>>> >>>>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is. >>>>>>> >>>>>>> >>>>>>> stefanozampini/simplify-setvalues-device >>>>>>> >>>>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams > wrote: >>>>>>> I am fixing rebasing this branch over main. >>>>>>> >>>>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini > wrote: >>>>>>> Or probably remove ?download-openmpi ? Or, just for the moment, why can?t we just tell configure that mpi is a weak dependence of cuda.py, so that it will be forced to be configured later? >>>>>>> >>>>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini > wrote: >>>>>>>> >>>>>>>> That branch provides a fix for MatSetValuesDevice but it never got merged because of the CI issues with the ?download-openmpi. We can probably try to skip the test in that specific configuration? >>>>>>>> >>>>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith > wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=) arch-robustify-cuda-gencodearch-check >>>>>>>>> $ ./ex5cu >>>>>>>>> terminate called after throwing an instance of 'thrust::system::system_error' >>>>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered >>>>>>>>> Aborted (core dumped) >>>>>>>>> >>>>>>>>> requires: cuda !define(PETSC_USE_CTABLE) >>>>>>>>> >>>>>>>>> CI does not test with CUDA and no ctable. The code is still broken as it was six months ago in the discussion Stefano pointed to. It is clear why just no one has had the time to clean things up. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams > wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini > wrote: >>>>>>>>>> If you are referring to your device set values, I guess it is not currently tested >>>>>>>>>> >>>>>>>>>> No. There is a test for that (ex5cu). >>>>>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I suspect there is memory corruption but I'm trying to cover all the bases. >>>>>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if such a test does not exist. >>>>>>>>>> >>>>>>>>>> See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411 >>>>>>>>>> I started cleaning up the code to prepare for testing but we never finished it https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams > wrote: >>>>>>>>>>> >>>>>>>>>>> Is there a test with MatSetValues and CUDA? >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Sun May 30 15:46:01 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 30 May 2021 16:46:01 -0400 Subject: [petsc-users] CUDA MatSetValues test In-Reply-To: <9E5A7CCC-1CA9-4A66-9398-0FF52151E55F@petsc.dev> References: <497E7E23-0657-4268-85A0-C5DD6E4CB23F@petsc.dev> <88DEEBBE-09DD-45D6-90B1-85F092FA7BDB@gmail.com> <502EBC54-AC76-4548-98AB-1A59FB543FDC@petsc.dev> <79F9D2BD-A257-47ED-AEB0-FA836DB1D7D8@petsc.dev> <9E5A7CCC-1CA9-4A66-9398-0FF52151E55F@petsc.dev> Message-ID: On Sun, May 30, 2021 at 12:55 PM Barry Smith wrote: > > I believe I have finally successfully rebased the branch barry/2020-11-11/cleanup-matsetvaluesdevice > against main and cleaned up all the issues. Please read the commit message. > > I have submitted a CI pipeline with ctables turned off temporarily for > testing of the MatSetValuesDevice(). If it works hopefully Mark can maybe > run a few additional tests of his Landau code that are not in the usual > testing to verify and we can finally get the branch into main. > If ex2_[cuda|kokkos] pass then you are fine. Thanks for doing this, and Stefano, this needed to be looked at by someone that knows what they are doing. > > Mark, > > Since this change is involved, it is likely your Landau mass matrix > branch may not rebase cleanly. > Oh ya, you touched a lot of landau code, but it does not look hard. You removed * from PetscSplitCSRDataStructure *d_mat=NULL; and the rest look like it will be easy to pick your version. > Let me know if you would like me to do the rebase and testing of your > Landau mass matrix branch. I can get it ready to work with the results of > barry/2020-11-11/cleanup-matsetvaluesdevice and then hand it back to you > for further development. > I will remove my changes to petscaijdevice.h. All I (Peng) did was protect printf statements with DEBUG, because printf takes up register(s) and registers are the limiting resource in Landau. So I guess you can merge and I will rebase over main. Mark > > Barry > > > On May 29, 2021, at 11:32 AM, Barry Smith wrote: > > > I am working away on this branch, making some progress, also cleaning > things up with some small simplifications. Hope I can succeed, a bunch of > stuff got moved around and some structs had changes, the merge could not > handle some of these so I have to do a good amount of code wrangling to fix > it. > > I'll let you know as I progress. > > Barry > > > On May 28, 2021, at 10:53 PM, Barry Smith wrote: > > > I have rebased and tried to fix everything. I am now fixing the issues > of --download-openmpi and cuda, once that is done I will test, rebase with > main again if needed and restart the MR and get it into main. > > Barry > > I was stupid to let the MR lay fallow, I should have figured out a > solution to the openmpi and cuda issue instead of punting and waiting for a > dream fix. > > > > On May 28, 2021, at 2:39 PM, Mark Adams wrote: > > Thanks, > > I did not intend to make any (real) changes. 
> The only thing that I did not intend to use from Barry's branch, that > conflicted, was the help and comment block at the top of ex5cu.cu > > * I ended up with two declarations of PetscSplitCSRDataStructure > * I added some includes to fix errors like this: > /ccs/home/adams/petsc/include/../src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h(263): > error: incomplete type is not allowed > * I end ended not having csr2csc_i in Mat_SeqAIJCUSPARSE so I get: > /autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu(1348): error: class "Mat_SeqAIJCUSPARSE" has no member > "csr2csc_i" > > > > > On Fri, May 28, 2021 at 3:13 PM Stefano Zampini > wrote: > >> I can take a quick look at it tomorrow, what are the main changes you >> made since then? >> >> On May 28, 2021, at 9:51 PM, Mark Adams wrote: >> >> I am getting messed up in trying to resolve conflicts in rebasing over >> main. >> Is there a better way of doing this? >> Can I just tell git to use Barry's version and then test it? >> Or should I just try it again? >> >> On Fri, May 28, 2021 at 2:15 PM Mark Adams wrote: >> >>> I am rebasing over main and its a bit of a mess. I must have missed >>> something. I get this. I think the _n_SplitCSRMat must be wrong. >>> >>> >>> In file included from >>> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0: >>> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting >>> types for 'PetscSplitCSRDataStructure' >>> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure; >>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous >>> declaration of 'PetscSplitCSRDataStructure' was here >>> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure; >>> ^~~~~~~~~~~~~~~~~~~~~~~~~~ >>> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o >>> >>> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini < >>> stefano.zampini at gmail.com> wrote: >>> >>>> OpenMPI.py depends on cuda.py in that, if cuda is present, configures >>>> using cuda. MPI.py or MPICH.py do not depend on cuda.py (MPICH, only >>>> weakly, it adds a print if cuda is present) >>>> Since eventually the MPI distro will only need a hint to be configured >>>> with CUDA, why not removing the dependency at all and add only a flag >>>> ?download-openmpi-use-cuda? >>>> >>>> On May 28, 2021, at 8:44 PM, Barry Smith wrote: >>>> >>>> >>>> Stefano, who has a far better memory than me, wrote >>>> >>>> > Or probably remove ?download-openmpi ? Or, just for the moment, why >>>> can?t we just tell configure that mpi is a weak dependence of cuda.py, so >>>> that it will be forced to be configured later? >>>> >>>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on >>>> MPI.py using the generic dependencies of configure/packages >>>> >>>> but perhaps we can just hardwire the rerunning of cuda.py when the >>>> MPI compilers are reset. I will try that now and if I can get it to work we >>>> should be able to move those old fix branches along as MR. >>>> >>>> Barry >>>> >>>> >>>> >>>> On May 28, 2021, at 12:41 PM, Mark Adams wrote: >>>> >>>> OK, I will try to rebase and test Barry's branch. 
>>>> >>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini < >>>> stefano.zampini at gmail.com> wrote: >>>> >>>>> Yes, it is the branch I was using before force pushing to >>>>> Barry?s barry/2020-11-11/cleanup-matsetvaluesdevice >>>>> You can use both I guess >>>>> >>>>> On May 28, 2021, at 8:25 PM, Mark Adams wrote: >>>>> >>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is. >>>>> >>>>> >>>>> stefanozampini/simplify-setvalues-device >>>>> >>>>> >>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams wrote: >>>>> >>>>>> I am fixing rebasing this branch over main. >>>>>> >>>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini < >>>>>> stefano.zampini at gmail.com> wrote: >>>>>> >>>>>>> Or probably remove ?download-openmpi ? Or, just for the moment, why >>>>>>> can?t we just tell configure that mpi is a weak dependence of cuda.py, so >>>>>>> that it will be forced to be configured later? >>>>>>> >>>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini < >>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>> >>>>>>> That branch provides a fix for MatSetValuesDevice but it never got >>>>>>> merged because of the CI issues with the ?download-openmpi. We can probably >>>>>>> try to skip the test in that specific configuration? >>>>>>> >>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith wrote: >>>>>>> >>>>>>> >>>>>>> ~/petsc/src/mat/tutorials* >>>>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=)* >>>>>>> arch-robustify-cuda-gencodearch-check >>>>>>> $ ./ex5cu >>>>>>> terminate called after throwing an instance of >>>>>>> 'thrust::system::system_error' >>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: >>>>>>> an illegal memory access was encountered >>>>>>> Aborted (core dumped) >>>>>>> >>>>>>> requires: cuda !define(PETSC_USE_CTABLE) >>>>>>> >>>>>>> CI does not test with CUDA and no ctable. The code is still >>>>>>> broken as it was six months ago in the discussion Stefano pointed to. It is >>>>>>> clear why just no one has had the time to clean things up. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini < >>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>> >>>>>>>> If you are referring to your device set values, I guess it is not >>>>>>>> currently tested >>>>>>>> >>>>>>> >>>>>>> No. There is a test for that (ex5cu). >>>>>>> I have a user that is getting a segv in MatSetValues with >>>>>>> aijcusparse. I suspect there is memory corruption but I'm trying to cover >>>>>>> all the bases. >>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for >>>>>>> it if such a test does not exist. >>>>>>> >>>>>>> >>>>>>>> See the discussions here >>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411 >>>>>>>> I started cleaning up the code to prepare for testing but we never >>>>>>>> finished it >>>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/ >>>>>>>> >>>>>>>> >>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams wrote: >>>>>>>> >>>>>>>> Is there a test with MatSetValues and CUDA? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From brardafrancesco at gmail.com Mon May 31 03:25:12 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Mon, 31 May 2021 10:25:12 +0200
Subject: [petsc-users] Collect Trajectories components
In-Reply-To: <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev>
References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev>
Message-ID: 

Thank you very much for the insights. I don't believe I was clear enough in my previous message, I am sorry.
The final vector I am trying to build should contain only 1 component of the trajectory (the second one, U[1]) at different times. For instance:

Rank 0:
  Second component(t=0)
  Second component(t=1)
  Second component(t=2)
Rank 1:
  Second component(t=3)
  Second component(t=4)
  Second component(t=5)
Rank 2:
  Second component(t=6)
  Second component(t=7)

And so on.
Do you think it is possible? Does the vector U need to have specific requirements in terms of dofs or stencil width?

Francesco

> Il giorno 28 mag 2021, alle ore 18:04, Barry Smith ha scritto:
>
> What does "not working as I would like" mean? It should be retrieving the trajectory at the times 1.0, 2.0, 3.0 ... 40.0 and setting into the vector partial the values of the second component of Uloc (which, depending on DMDA having a stencil width of 1 and a w of 1, is the first component of U).
>
> You can move the VecGet/RestoreArray(partial,&partlocal);CHKERRQ(ierr); outside of the loop.
>
> If you want the first component of U on process 0 you don't need the Uloc or the GlobalToLocalBegin/End. Just use DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>
> You only provide 14 locations in partial distributed over the MPI ranks, but likely you want 40 on the first rank and none on the other ranks.
>
> You are assigning partlocal[i] on all ranks, but you said you only want it on rank 0, so here is code that may work:
>
> if (rank == 0) {
>   ierr = VecCreateMPI(PETSC_COMM_WORLD,40,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 40 local values */
>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
> } else {
>   ierr = VecCreateMPI(PETSC_COMM_WORLD,0,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 0 local values */
> }
> for (i=0; i<40; i++) {
>   PetscReal ttime = i+1;
>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>   if (rank == 0) {
>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>     partlocal[i] = Ui[0];
>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>   }
> }
> if (rank == 0) {
>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
> }
>
> Note that this entire block of code needs to run on all MPI ranks, but the actual selection of the wanted value only occurs on rank 0.
>
> When the loop is done rank 0 will have a parallel vector whose components are what you want, and all the other ranks will have a parallel vector with no components on those ranks. Note that you don't need to make partial be a parallel vector; you can just make it live on rank 0 because that is the only place you access it.
> Then the code would be simpler:
>
> if (rank == 0) {
>   ierr = VecCreateSeq(PETSC_COMM_SELF,40,&partial);CHKERRQ(ierr); /* sequential, lives only on rank 0 */
>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
> }
> for (i=0; i<40; i++) {
>   PetscReal ttime = i+1;
>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>   if (rank == 0) {
>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>     partlocal[i] = Ui[0];
>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>   }
> }
> if (rank == 0) {
>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
> }
>
> Barry
>
>> On May 27, 2021, at 2:42 AM, Francesco Brarda wrote:
>>
>> I created a for loop where I call TSTrajectoryGetVecs, but only rank 0 seems to enter this loop and I do not know why.
>> I thought the following might be a solution, but it is not working as I would like, i.e. the final vector has the same local parts, a copy of the values obtained with rank 0. How should I change this, please?
>>
>> Vec U, partial, Uloc;
>> PetscScalar *Ui, *partlocal;
>> PetscInt i;
>> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr);
>> for (i=0; i<40; i++) {
>>   PetscReal ttime = i+1;
>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>   ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>>   ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>   ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>   ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr);
>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>>   partlocal[i] = Ui[1];
>>   ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>>   ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>> }
>>
>>> Il giorno 27 mag 2021, alle ore 01:15, Barry Smith ha scritto:
>>>
>>>> On May 26, 2021, at 10:39 AM, Francesco Brarda wrote:
>>>>
>>>> Thank you very much.
>>>>> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do?
>>>> Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length).
>>>> The function is called only by the first proc (rank=0) and not from the others. Is there a way to force also the other ranks to call that routine?
>>>
>>> Yes, just have all ranks call it and ignore the result on the other ranks.
>>>
>>>> Should I build everything into an external function outside the main?
>>>
>>> It can be called in main, does not need to be in a different function.
>>>
>>>> Francesco
>>>>
>>>>> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto:
>>>>>
>>>>> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS.
>>>>>
>>>>> You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time, but then you can have only the rank you care about select the entries from the resulting vectors while the other ranks just ignore the vectors for that time since they do not need the values from them.
>>>>>
>>>>> Barry
>>>>>
>>>>>> On May 26, 2021, at 5:20 AM, Francesco Brarda wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I solved an ODE system with TS. Now I would like to save one of the trajectories in specific times.
>>>>>> To do so, I used TSTrajectoryGetVecs.
>>>>>> The values of the variable I am interested in are on one processor. I want to collect these values in a parallel vector, but I had the error:
>>>>>>
>>>>>> [0]PETSC ERROR: Invalid argument
>>>>>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2
>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
>>>>>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021
>>>>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
>>>>>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c
>>>>>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c
>>>>>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c
>>>>>>
>>>>>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter?
>>>>>>
>>>>>> I hope I made myself clear.
>>>>>> Best,
>>>>>> Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev Mon May 31 04:14:14 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 31 May 2021 04:14:14 -0500
Subject: [petsc-users] Collect Trajectories components
In-Reply-To: 
References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev>
Message-ID: 

> On May 31, 2021, at 3:25 AM, Francesco Brarda wrote:
>
> Thank you very much for the insights. I don't believe I was clear enough in my previous message, I am sorry.
> The final vector I am trying to build should contain only 1 component of the trajectory (the second one, U[1]) at different times. For instance:
>
> Rank 0:
>   Second component(t=0)
>   Second component(t=1)
>   Second component(t=2)
> Rank 1:
>   Second component(t=3)
>   Second component(t=4)
>   Second component(t=5)
> Rank 2:
>   Second component(t=6)
>   Second component(t=7)
>
> And so on.
> Do you think it is possible?

The partial vector will only have values on rank 0 and have the values you want. If you want them in parallel then you would use VecScatter to get them spread out into a new parallel vector after you have collected them all.

PetscInt n;
VecGetSize(partial,&n);   /* the global size, so every rank gets the same n */
VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,n,&result);
VecScatter vecscatter;
VecScatterCreate(partial,NULL,result,NULL,&vecscatter);
VecScatterBegin(vecscatter,partial,result,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(vecscatter,partial,result,INSERT_VALUES,SCATTER_FORWARD);
PetscInt nlocal;
VecGetLocalSize(result,&nlocal);

result contains the required values and on each rank it has nlocal of them. Note that nlocal may differ by 1 between ranks depending on the exact value of n.

If you want to control how many values are on each process then set nlocal (so that sum_ranks nlocal = n) and have instead

PetscInt n;
VecGetSize(partial,&n);
PetscInt nlocal; /* compute this for each rank, the way you want */
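/* for instance, an even split across the ranks (a sketch; any choice
   whose nlocal values sum to n over the ranks works) */
PetscMPIInt rank,size;
MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
MPI_Comm_size(PETSC_COMM_WORLD,&size);
nlocal = n/size + (rank < n % size ? 1 : 0);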
VecCreateMPI(PETSC_COMM_WORLD,nlocal,PETSC_DETERMINE,&result);
VecScatter vecscatter;
VecScatterCreate(partial,NULL,result,NULL,&vecscatter);
VecScatterBegin(vecscatter,partial,result,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(vecscatter,partial,result,INSERT_VALUES,SCATTER_FORWARD);

Note it makes sense to send the values around to all the ranks when you have hit the final timestep. It would be less efficient to send them along one at a time at each timestep. Unless you need that. If you need that then you can do an MPI_Send() on rank 0 of the latest value you obtained and an MPI_Irecv() on whichever rank is destined to store the value (which will be rank 0 for the first set of time steps, then rank 1, then rank 2, etc.)

Barry

> Does the vector U need to have specific requirements in terms of dofs or stencil width?
> 
> Francesco
> 
>> Il giorno 28 mag 2021, alle ore 18:04, Barry Smith ha scritto:
>> 
>> What does "not working as I would like" mean? It should be retrieving the trajectory at the times 1.0, 2.0, 3.0 ... 40.0 and setting into the vector partial the values of the second component of Uloc (which, depending on the DMDA having a stencil width of 1 and a w of 1, is the first component of U).
>> 
>> You can move the VecGet/RestoreArray(partial,&partlocal);CHKERRQ(ierr); outside of the loop.
>> 
>> If you want the first component of U on process 0 you don't need the Uloc or the GlobalToLocalBegin/End, just use DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>> 
>> You only provide 14 locations in partial distributed over the MPI ranks, but likely you want 40 on the first rank and none on the other ranks.
>> 
>> You are assigning partlocal[i] on all ranks, but you said you only want it on rank 0, so here is code that may work
>> 
>> if (rank == 0) {
>>   ierr = VecCreateMPI(PETSC_COMM_WORLD,40,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 40 local values */
>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>> } else {
>>   ierr = VecCreateMPI(PETSC_COMM_WORLD,0,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 0 local values */
>> }
>> for (i=0; i<40; i++) {
>>   PetscReal ttime = i+1;
>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>   if (rank == 0) {
>>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>     partlocal[i] = Ui[0];
>>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>   }
>> }
>> if (rank == 0) {
>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>> }
>> 
>> Note that this entire block of code needs to run on all MPI ranks. But the actual selection of the wanted value only occurs on rank 0.
>> 
>> When the loop is done rank 0 will have a parallel vector whose components are what you want and all the other ranks will have a parallel vector with no components on those ranks. Note that you don't need to make partial be a parallel vector, you can just make it live on rank 0 because that is the only place you access it.
>> Then the code would be simpler
>> 
>> if (rank == 0) {
>>   ierr = VecCreateSeq(PETSC_COMM_SELF,40,&partial);CHKERRQ(ierr);
>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>> }
>> 
>> for (i=0; i<40; i++) {
>>   PetscReal ttime = i+1;
>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>   if (rank == 0) {
>>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>     partlocal[i] = Ui[0];
>>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>   }
>> }
>> if (rank == 0) {
>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>> }
>> 
>> Barry
>> 
>>> On May 27, 2021, at 2:42 AM, Francesco Brarda wrote:
>>> 
>>> I created a for loop where I call TSTrajectoryGetVecs, but only rank 0 seems to enter the loop and I do not know why.
>>> I thought the following might be a solution, but it is not working as I would like: the final vector has identical local parts, each a copy of the values obtained on rank 0. How should I change this, please?
>>> 
>>> Vec U, partial, Uloc;
>>> PetscScalar *Ui, *partlocal;
>>> PetscInt i;
>>> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr);
>>> for (i=0; i<40; i++) {
>>>   PetscReal ttime = i+1;
>>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>>   ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>>>   ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>>   ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>>   ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr);
>>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>>>   partlocal[i] = Ui[1];
>>>   ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>>>   ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>>> }
>>> 
>>>> Il giorno 27 mag 2021, alle ore 01:15, Barry Smith ha scritto:
>>>> 
>>>>> On May 26, 2021, at 10:39 AM, Francesco Brarda wrote:
>>>>> 
>>>>> Thank you very much.
>>>>>> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do?
>>>>> Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length).
>>>>> The function is called only by the first proc (rank=0) and not by the others. Is there a way to force the other ranks to call that routine as well?
>>>> 
>>>> Yes, just have all ranks call it and ignore the result on the other ranks.
>>>> 
>>>>> Should I build everything into an external function outside the main?
>>>> 
>>>> It can be called in main, does not need to be in a different function.
>>>> 
>>>>> Francesco
>>>>> 
>>>>>> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto:
>>>>>> 
>>>>>> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS.
>>>>>> 
>>>>>> You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time, but then you can have only the rank you care about select the entries from the resulting vectors while the other ranks for that time just ignore the vectors, since they do not need the values from them.
>>>>>> 
>>>>>> Barry
>>>>>> 
>>>>>>> On May 26, 2021, at 5:20 AM, Francesco Brarda wrote:
>>>>>>> 
>>>>>>> Hi!
>>>>>>> 
>>>>>>> I solved an ODE system with TS. Now I would like to save one of the trajectories at specific times. To do so, I used TSTrajectoryGetVecs.
>>>>>>> The values of the variable I am interested in are on one processor. I want to collect these values in a parallel vector, but I had the error:
>>>>>>> 
>>>>>>> [0]PETSC ERROR: Invalid argument
>>>>>>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2
>>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
>>>>>>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021
>>>>>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
>>>>>>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c
>>>>>>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c
>>>>>>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c
>>>>>>> 
>>>>>>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter?
>>>>>>> 
>>>>>>> I hope I made myself clear.
>>>>>>> Best,
>>>>>>> Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From michael.wick.1980 at gmail.com  Mon May 31 08:20:02 2021
From: michael.wick.1980 at gmail.com (Michael Wick)
Date: Mon, 31 May 2021 21:20:02 +0800
Subject: [petsc-users] Performing a coordinate system rotation for the stiffness matrix
Message-ID: 

Hi PETSc team:

I am considering implementing a skew roller boundary condition for my elasticity problem. The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf

Or you may find the method in the attached Bathe's slides, pages 9-10.

Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In doing so, one can enforce a roller boundary condition on a slanted surface.

I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function. Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).

I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?

Thanks,

Mike
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MITRES2_002S10_lec03.pdf
Type: application/pdf
Size: 286742 bytes
Desc: not available
URL: 
From mfadams at lbl.gov  Mon May 31 08:52:39 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 31 May 2021 09:52:39 -0400
Subject: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix
In-Reply-To: 
References: 
Message-ID: 

On Mon, May 31, 2021 at 9:20 AM Michael Wick wrote:

> Hi PETSc team:
> 
> I am considering implementing a skew roller boundary condition for my elasticity problem.
> The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
> 
> Or you may find the method in the attached Bathe's slides, pages 9-10.
> 
> Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In doing so, one can enforce a roller boundary condition on a slanted surface.
> 
> I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function. Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).

Yes, and no. It is motivated and optimized for a Galerkin coarse grid operator for AMG solvers, but it is a projection and it should be fine. If not, we will fix it.

We try to test our methods with "empty" operators, but I don't know if MatPtAP has ever been tested for a super sparse P. Give it a shot and see what happens.

Mark

> I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?
> 
> Thanks,
> 
> Mike

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From hongzhang at anl.gov  Mon May 31 09:42:20 2021
From: hongzhang at anl.gov (Zhang, Hong)
Date: Mon, 31 May 2021 14:42:20 +0000
Subject: [petsc-users] Collect Trajectories components
In-Reply-To: 
References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev>
Message-ID: <447F9939-7063-4E01-96DA-5D29DEBF7D1B@anl.gov>

On May 31, 2021, at 3:25 AM, Francesco Brarda wrote:

Thank you very much for the insights. I don't believe I was clear enough in my previous message, I am sorry.
The final vector I am trying to build should contain only 1 component of the trajectory (the second one, U[1]) at different times. For instance:

Rank 0:
Second component(t=0)
Second component(t=1)
Second component(t=2)
Rank 1:
Second component(t=3)
Second component(t=4)
Second component(t=5)
Rank 2:
Second component(t=6)
Second component(t=7)

And so on.
Do you think it is possible? Does the vector U need to have specific requirements in terms of dofs or stencil width?

It is doable. But it would likely be a bad idea if you care about performance. Are you trying to prototype some algorithm in a parallel-in-time style? It would be more helpful if you could provide some background about what you want to achieve.

Hong (Mr.)

Francesco

Il giorno 28 mag 2021, alle ore 18:04, Barry Smith ha scritto:

What does "not working as I would like" mean? It should be retrieving the trajectory at the times 1.0, 2.0, 3.0 ... 40.0 and setting into the vector partial the values of the second component of Uloc (which, depending on the DMDA having a stencil width of 1 and a w of 1, is the first component of U).

You can move the VecGet/RestoreArray(partial,&partlocal);CHKERRQ(ierr); outside of the loop.

If you want the first component of U on process 0 you don't need the Uloc or the GlobalToLocalBegin/End,
just use DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);

You only provide 14 locations in partial distributed over the MPI ranks, but likely you want 40 on the first rank and none on the other ranks.

You are assigning partlocal[i] on all ranks, but you said you only want it on rank 0, so here is code that may work

if (rank == 0) {
  ierr = VecCreateMPI(PETSC_COMM_WORLD,40,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 40 local values */
  ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
} else {
  ierr = VecCreateMPI(PETSC_COMM_WORLD,0,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 0 local values */
}
for (i=0; i<40; i++) {
  PetscReal ttime = i+1;
  ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
  if (rank == 0) {
    ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
    partlocal[i] = Ui[0];
    ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
  }
}
if (rank == 0) {
  ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
}

Note that this entire block of code needs to run on all MPI ranks. But the actual selection of the wanted value only occurs on rank 0.

When the loop is done rank 0 will have a parallel vector whose components are what you want and all the other ranks will have a parallel vector with no components on those ranks. Note that you don't need to make partial be a parallel vector, you can just make it live on rank 0 because that is the only place you access it. Then the code would be simpler

if (rank == 0) {
  ierr = VecCreateSeq(PETSC_COMM_SELF,40,&partial);CHKERRQ(ierr);
  ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
}

for (i=0; i<40; i++) {
  PetscReal ttime = i+1;
  ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
  if (rank == 0) {
    ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
    partlocal[i] = Ui[0];
    ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
  }
}
if (rank == 0) {
  ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
}

Barry

On May 27, 2021, at 2:42 AM, Francesco Brarda wrote:

I created a for loop where I call TSTrajectoryGetVecs, but only rank 0 seems to enter the loop and I do not know why.
I thought the following might be a solution, but it is not working as I would like: the final vector has identical local parts, each a copy of the values obtained on rank 0. How should I change this, please?

Vec U, partial, Uloc;
PetscScalar *Ui, *partlocal;
PetscInt i;
ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr);
for (i=0; i<40; i++) {
  PetscReal ttime = i+1;
  ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
  ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr);
  ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
  partlocal[i] = Ui[1];
  ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
  ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
}

Il giorno 27 mag 2021, alle ore 01:15, Barry Smith ha scritto:

On May 26, 2021, at 10:39 AM, Francesco Brarda wrote:

Thank you very much.
Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do?
Yes.
I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length).
The function is called only by the first proc (rank=0) and not by the others. Is there a way to force the other ranks to call that routine as well?

Yes, just have all ranks call it and ignore the result on the other ranks.

Should I build everything into an external function outside the main?

It can be called in main, does not need to be in a different function.

Francesco

Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto:

TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS.

You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time, but then you can have only the rank you care about select the entries from the resulting vectors while the other ranks for that time just ignore the vectors, since they do not need the values from them.

Barry

On May 26, 2021, at 5:20 AM, Francesco Brarda wrote:

Hi!

I solved an ODE system with TS. Now I would like to save one of the trajectories at specific times. To do so, I used TSTrajectoryGetVecs.
The values of the variable I am interested in are on one processor. I want to collect these values in a parallel vector, but I had the error:

[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Real value must be same on all processes, argument # 2
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
[0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021
[0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
[0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c
[0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c
[0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c

Is there any specific routine I can use to overcome this issue? Should I use VecScatter?

I hope I made myself clear.
Best,
Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From stefano.zampini at gmail.com  Mon May 31 10:12:13 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Mon, 31 May 2021 18:12:13 +0300
Subject: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix
In-Reply-To: 
References: 
Message-ID: 

Mike

as long as P is a sparse matrix with compatible rows and cols (i.e. rows(P) = cols(A) = rows(A)), MatPtAP will compute the result.

Il giorno lun 31 mag 2021 alle ore 16:52 Mark Adams ha scritto:

> On Mon, May 31, 2021 at 9:20 AM Michael Wick wrote:
> 
>> Hi PETSc team:
>> 
>> I am considering implementing a skew roller boundary condition for my elasticity problem. The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
>> 
>> Or you may find the method in the attached Bathe's slides, pages 9-10.
>> 
>> Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F.
>> In doing so, one can enforce a roller boundary condition on a slanted surface.
>> 
>> I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function. Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).
> 
> Yes, and no. It is motivated and optimized for a Galerkin coarse grid operator for AMG solvers, but it is a projection and it should be fine. If not, we will fix it.
> 
> We try to test our methods with "empty" operators, but I don't know if MatPtAP has ever been tested for a super sparse P. Give it a shot and see what happens.
> 
> Mark
> 
>> I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?
>> 
>> Thanks,
>> 
>> Mike

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From knepley at gmail.com  Mon May 31 11:33:48 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 31 May 2021 12:33:48 -0400
Subject: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix
In-Reply-To: 
References: 
Message-ID: 

On Mon, May 31, 2021 at 11:12 AM Stefano Zampini wrote:

> Mike
> 
> as long as P is a sparse matrix with compatible rows and cols (i.e. rows(P) = cols(A) = rows(A)), MatPtAP will compute the result.

Stefano and Mark are correct. This will work.

I implemented the same thing in my code in a different way. I put this transformation into the mapping between local and global vector spaces. The global degrees of freedom are the ones you want for boundary conditions (normal and tangential to the boundary), and I eliminate the ones that are constrained. The local degrees of freedom are the normal Cartesian ones, and these are used for assembly. The map is used when I execute DMGlobalToLocal() and DMLocalToGlobal(). There is an example of me doing this in SNES ex71, Poiseuille flow in a tilted channel.

Thanks,

Matt

> Il giorno lun 31 mag 2021 alle ore 16:52 Mark Adams ha scritto:
> 
>> On Mon, May 31, 2021 at 9:20 AM Michael Wick wrote:
>> 
>>> Hi PETSc team:
>>> 
>>> I am considering implementing a skew roller boundary condition for my elasticity problem. The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
>>> 
>>> Or you may find the method in the attached Bathe's slides, pages 9-10.
>>> 
>>> Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In doing so, one can enforce a roller boundary condition on a slanted surface.
>>> 
>>> I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function. Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).
>> 
>> Yes, and no. It is motivated and optimized for a Galerkin coarse grid operator for AMG solvers, but it is a projection and it should be fine. If not, we will fix it.
>> We try to test our methods with "empty" operators, but I don't know if MatPtAP has ever been tested for a super sparse P. Give it a shot and see what happens.
>> 
>> Mark
>> 
>>> I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?
>>> 
>>> Thanks,
>>> 
>>> Mike
> 
> -- 
> Stefano

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at petsc.dev  Mon May 31 11:57:10 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 31 May 2021 11:57:10 -0500
Subject: [petsc-users] Collect Trajectories components
In-Reply-To: <447F9939-7063-4E01-96DA-5D29DEBF7D1B@anl.gov>
References: <8F422055-6AB2-4BE9-8845-63955BC0D3B9@petsc.dev> <7D4A509B-2D00-4F96-AE3F-0705D1038CFA@petsc.dev> <65CD020A-D3CD-4018-9FD2-A07676112499@petsc.dev> <447F9939-7063-4E01-96DA-5D29DEBF7D1B@anl.gov>
Message-ID: 

So long as you are building the parallel version of the vector (that contains the second component at all times, as I explained with VecScatter) AFTER the TSSolve() is done, then making it parallel (with VecScatter) is not really a performance issue. What you do with this parallel vector after you have made it could be a performance issue.

Barry

> On May 31, 2021, at 9:42 AM, Zhang, Hong wrote:
> 
>> On May 31, 2021, at 3:25 AM, Francesco Brarda wrote:
>> 
>> Thank you very much for the insights. I don't believe I was clear enough in my previous message, I am sorry.
>> The final vector I am trying to build should contain only 1 component of the trajectory (the second one, U[1]) at different times. For instance:
>> 
>> Rank 0:
>> Second component(t=0)
>> Second component(t=1)
>> Second component(t=2)
>> Rank 1:
>> Second component(t=3)
>> Second component(t=4)
>> Second component(t=5)
>> Rank 2:
>> Second component(t=6)
>> Second component(t=7)
>> 
>> And so on.
>> Do you think it is possible? Does the vector U need to have specific requirements in terms of dofs or stencil width?
> 
> It is doable. But it would likely be a bad idea if you care about performance. Are you trying to prototype some algorithm in a parallel-in-time style? It would be more helpful if you could provide some background about what you want to achieve.
> 
> Hong (Mr.)
> 
>>> Il giorno 28 mag 2021, alle ore 18:04, Barry Smith ha scritto:
>>> 
>>> What does "not working as I would like" mean? It should be retrieving the trajectory at the times 1.0, 2.0, 3.0 ... 40.0 and setting into the vector partial the values of the second component of Uloc (which, depending on the DMDA having a stencil width of 1 and a w of 1, is the first component of U).
>>> 
>>> You can move the VecGet/RestoreArray(partial,&partlocal);CHKERRQ(ierr); outside of the loop.
>>> 
>>> If you want the first component of U on process 0 you don't need the Uloc or the GlobalToLocalBegin/End,
>>> just use DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>> 
>>> You only provide 14 locations in partial distributed over the MPI ranks, but likely you want 40 on the first rank and none on the other ranks.
>>> 
>>> You are assigning partlocal[i] on all ranks, but you said you only want it on rank 0, so here is code that may work
>>> 
>>> if (rank == 0) {
>>>   ierr = VecCreateMPI(PETSC_COMM_WORLD,40,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 40 local values */
>>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>>> } else {
>>>   ierr = VecCreateMPI(PETSC_COMM_WORLD,0,PETSC_DETERMINE,&partial);CHKERRQ(ierr); /* 0 local values */
>>> }
>>> for (i=0; i<40; i++) {
>>>   PetscReal ttime = i+1;
>>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>>   if (rank == 0) {
>>>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>     partlocal[i] = Ui[0];
>>>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>   }
>>> }
>>> if (rank == 0) {
>>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>>> }
>>> 
>>> Note that this entire block of code needs to run on all MPI ranks. But the actual selection of the wanted value only occurs on rank 0.
>>> 
>>> When the loop is done rank 0 will have a parallel vector whose components are what you want and all the other ranks will have a parallel vector with no components on those ranks. Note that you don't need to make partial be a parallel vector, you can just make it live on rank 0 because that is the only place you access it. Then the code would be simpler
>>> 
>>> if (rank == 0) {
>>>   ierr = VecCreateSeq(PETSC_COMM_SELF,40,&partial);CHKERRQ(ierr);
>>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>>> }
>>> 
>>> for (i=0; i<40; i++) {
>>>   PetscReal ttime = i+1;
>>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>>   if (rank == 0) {
>>>     ierr = DMDAVecGetArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>     partlocal[i] = Ui[0];
>>>     ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>   }
>>> }
>>> if (rank == 0) {
>>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>>> }
>>> 
>>> Barry
>>> 
>>>> On May 27, 2021, at 2:42 AM, Francesco Brarda wrote:
>>>> 
>>>> I created a for loop where I call TSTrajectoryGetVecs, but only rank 0 seems to enter the loop and I do not know why.
>>>> I thought the following might be a solution, but it is not working as I would like: the final vector has identical local parts, each a copy of the values obtained on rank 0. How should I change this, please?
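>>>> /* [Editor's note: as the replies above explain, the two problems in this loop are that partial is created with global size 14 while the loop stores 40 entries, and that partlocal[i] is assigned on every rank, so each rank fills its own local block with the same values.] */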
>>>> 
>>>> Vec U, partial, Uloc;
>>>> PetscScalar *Ui, *partlocal;
>>>> PetscInt i;
>>>> ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,14,&partial);CHKERRQ(ierr);
>>>> for (i=0; i<40; i++) {
>>>>   PetscReal ttime = i+1;
>>>>   ierr = TSTrajectoryGetVecs(appctx.tj,appctx.ts,PETSC_DECIDE,&ttime,U,NULL);CHKERRQ(ierr);
>>>>   ierr = DMGetLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>>>>   ierr = DMGlobalToLocalBegin(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>>>   ierr = DMGlobalToLocalEnd(appctx.da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
>>>>   ierr = DMDAVecGetArray(appctx.da,Uloc,&Ui);CHKERRQ(ierr);
>>>>   ierr = VecGetArray(partial,&partlocal);CHKERRQ(ierr);
>>>>   partlocal[i] = Ui[1];
>>>>   ierr = DMDAVecRestoreArray(appctx.da,U,&Ui);CHKERRQ(ierr);
>>>>   ierr = VecRestoreArray(partial,&partlocal);CHKERRQ(ierr);
>>>>   ierr = DMRestoreLocalVector(appctx.da,&Uloc);CHKERRQ(ierr);
>>>> }
>>>> 
>>>>> Il giorno 27 mag 2021, alle ore 01:15, Barry Smith ha scritto:
>>>>> 
>>>>>> On May 26, 2021, at 10:39 AM, Francesco Brarda wrote:
>>>>>> 
>>>>>> Thank you very much.
>>>>>>> Based on the error message it appears that your code is requesting different times on different MPI ranks. Is that what you intend to do?
>>>>>> Yes. I want to save different times across a vector built with multiple MPI ranks (PETSC_DECIDE for the local length).
>>>>>> The function is called only by the first proc (rank=0) and not by the others. Is there a way to force the other ranks to call that routine as well?
>>>>> 
>>>>> Yes, just have all ranks call it and ignore the result on the other ranks.
>>>>> 
>>>>>> Should I build everything into an external function outside the main?
>>>>> 
>>>>> It can be called in main, does not need to be in a different function.
>>>>> 
>>>>>> Francesco
>>>>>> 
>>>>>>> Il giorno 26 mag 2021, alle ore 16:20, Barry Smith ha scritto:
>>>>>>> 
>>>>>>> TSTrajectoryGetVecs() is listed as Collective on TS. This means all ranks must call it with the same times in the same order of operations on all ranks that share the TS.
>>>>>>> 
>>>>>>> You do not need to use VecScatter. Each process must call TSTrajectoryGetVecs with the same time, but then you can have only the rank you care about select the entries from the resulting vectors while the other ranks for that time just ignore the vectors, since they do not need the values from them.
>>>>>>> 
>>>>>>> Barry
>>>>>>> 
>>>>>>>> On May 26, 2021, at 5:20 AM, Francesco Brarda wrote:
>>>>>>>> 
>>>>>>>> Hi!
>>>>>>>> 
>>>>>>>> I solved an ODE system with TS. Now I would like to save one of the trajectories at specific times. To do so, I used TSTrajectoryGetVecs.
>>>>>>>> The values of the variable I am interested in are on one processor. I want to collect these values in a parallel vector, but I had the error:
>>>>>>>> 
>>>>>>>> [0]PETSC ERROR: Invalid argument
>>>>>>>> [0]PETSC ERROR: Real value must be same on all processes, argument # 2
>>>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
>>>>>>>> [0]PETSC ERROR: ./petsc_sir on a arch-debug named srvulx13 by fbrarda Wed May 26 12:00:42 2021
>>>>>>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc --download-mpich PETSC_ARCH=arch-debug
>>>>>>>> [0]PETSC ERROR: #1 TSHistoryGetLocFromTime() line 134 in /home/fbrarda/petsc/src/ts/interface/tshistory.c
>>>>>>>> [0]PETSC ERROR: #2 TSTrajectoryReconstruct_Private() line 55 in /home/fbrarda/petsc/src/ts/trajectory/utils/reconstruct.c
>>>>>>>> [0]PETSC ERROR: #3 TSTrajectoryGetVecs() line 239 in /home/fbrarda/petsc/src/ts/trajectory/interface/traj.c
>>>>>>>> 
>>>>>>>> Is there any specific routine I can use to overcome this issue? Should I use VecScatter?
>>>>>>>> 
>>>>>>>> I hope I made myself clear.
>>>>>>>> Best,
>>>>>>>> Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From s_g at berkeley.edu  Mon May 31 13:00:55 2021
From: s_g at berkeley.edu (Sanjay Govindjee)
Date: Mon, 31 May 2021 11:00:55 -0700
Subject: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix
In-Reply-To: 
References: 
Message-ID: 

In our FEA code we perform the rotations at the local level, before assembly, so that it is easy to apply the boundary conditions, then unrotate locally after solution to get the usual Cartesian components. Somehow this seems more efficient than doing this globally, but perhaps I am missing something.
-sanjay

On 5/31/21 9:33 AM, Matthew Knepley wrote:
> On Mon, May 31, 2021 at 11:12 AM Stefano Zampini wrote:
> 
> Mike
> 
> as long as P is a sparse matrix with compatible rows and cols (i.e. rows(P) = cols(A) = rows(A)), MatPtAP will compute the result.
> 
> Stefano and Mark are correct. This will work.
> 
> I implemented the same thing in my code in a different way. I put this transformation into the mapping between local and global vector spaces. The global degrees of freedom are the ones you want for boundary conditions (normal and tangential to the boundary), and I eliminate the ones that are constrained. The local degrees of freedom are the normal Cartesian ones, and these are used for assembly. The map is used when I execute DMGlobalToLocal() and DMLocalToGlobal(). There is an example of me doing this in SNES ex71, Poiseuille flow in a tilted channel.
> 
> Thanks,
> 
> Matt
> 
> Il giorno lun 31 mag 2021 alle ore 16:52 Mark Adams ha scritto:
> 
> On Mon, May 31, 2021 at 9:20 AM Michael Wick wrote:
> 
> Hi PETSc team:
> 
> I am considering implementing a skew roller boundary condition for my elasticity problem. The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
> 
> Or you may find the method in the attached Bathe's slides, pages 9-10.
> 
> Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In doing so, one can enforce a roller boundary condition on a slanted surface.
> 
> I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function.
> Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).
> 
> Yes, and no. It is motivated and optimized for a Galerkin coarse grid operator for AMG solvers, but it is a projection and it should be fine. If not, we will fix it.
> 
> We try to test our methods with "empty" operators, but I don't know if MatPtAP has ever been tested for a super sparse P. Give it a shot and see what happens.
> 
> Mark
> 
> I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?
> 
> Thanks,
> 
> Mike
> 
> -- 
> Stefano

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at petsc.dev  Mon May 31 13:08:56 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 31 May 2021 13:08:56 -0500
Subject: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix
In-Reply-To: 
References: 
Message-ID: <2642E270-FCBB-4258-AB50-36DE31D26919@petsc.dev>

Everybody is correct. Using the extremely sparse, nearly diagonal MPI matrix will have a bit more overhead than doing the rotations on the elements, due to the needed MPI check and the general use of a sparse-sparse-sparse matrix product. But if a code does not have support for the local transformations, the extra cost of the sparse matrix operation will be trivial relative to the rest of the computations and so is fine to use.

Barry

> On May 31, 2021, at 1:00 PM, Sanjay Govindjee wrote:
> 
> In our FEA code we perform the rotations at the local level, before assembly, so that it is easy to apply the boundary conditions, then unrotate locally after solution to get the usual Cartesian components. Somehow this seems more efficient than doing this globally, but perhaps I am missing something.
> -sanjay
> 
> On 5/31/21 9:33 AM, Matthew Knepley wrote:
>> On Mon, May 31, 2021 at 11:12 AM Stefano Zampini wrote:
>> Mike
>> 
>> as long as P is a sparse matrix with compatible rows and cols (i.e. rows(P) = cols(A) = rows(A)), MatPtAP will compute the result.
>> 
>> Stefano and Mark are correct. This will work.
>> 
>> I implemented the same thing in my code in a different way. I put this transformation into the mapping between local and global vector spaces. The global degrees of freedom are the ones you want for boundary conditions (normal and tangential to the boundary), and I eliminate the ones that are constrained. The local degrees of freedom are the normal Cartesian ones, and these are used for assembly. The map is used when I execute DMGlobalToLocal() and DMLocalToGlobal(). There is an example of me doing this in SNES ex71, Poiseuille flow in a tilted channel.
>> 
>> Thanks,
>> 
>> Matt
>> 
>> Il giorno lun 31 mag 2021 alle ore 16:52 Mark Adams ha scritto:
>> 
>> On Mon, May 31, 2021 at 9:20 AM Michael Wick wrote:
>> Hi PETSc team:
>> 
>> I am considering implementing a skew roller boundary condition for my elasticity problem. The method is based on this journal paper: http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
>> 
>> Or you may find the method in the attached Bathe's slides, pages 9-10.
>> Roughly speaking, a (very) sparse matrix T will be created which takes the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In doing so, one can enforce a roller boundary condition on a slanted surface.
>> 
>> I think it can be an easy option if I can generate the T matrix and do two matrix multiplications to get T^t K T. I noticed that there is a MatPtAP function. Yet, after reading a previous discussion, it seems that this function is not designed for this purpose (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).
>> 
>> Yes, and no. It is motivated and optimized for a Galerkin coarse grid operator for AMG solvers, but it is a projection and it should be fine. If not, we will fix it.
>> 
>> We try to test our methods with "empty" operators, but I don't know if MatPtAP has ever been tested for a super sparse P. Give it a shot and see what happens.
>> 
>> Mark
>> 
>> I assume I can only call MatMatMult & MatTransposeMatMult to do this job, correct? Is there any existing PETSc function to do T^t K T in one call?
>> 
>> Thanks,
>> 
>> Mike
>> 
>> -- 
>> Stefano
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
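[Editor's sketch, not part of the thread: a minimal, self-contained illustration of the one-call T^t K T product discussed in the rotation thread above, using MatPtAP() with T in the role of P. The global size n, the tridiagonal stand-in for K, and the placement of the 3x3 rotation block in the last rows of T are assumptions made for this example; in the application, K would be the assembled stiffness matrix and the rotation rows would correspond to the dofs on the skewed boundary.]

#include <petscmat.h>

int main(int argc,char **argv)
{
  Mat            K,T,Kt;
  PetscInt       i,rstart,rend,n = 12;  /* global size: an assumption for this sketch */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

  /* K: a tridiagonal stand-in for the stiffness matrix, just to have entries */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,n,n,3,NULL,1,NULL,&K);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(K,&rstart,&rend);CHKERRQ(ierr);
  for (i=rstart; i<rend; i++) {
    ierr = MatSetValue(K,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)   {ierr = MatSetValue(K,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(K,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* T = [ I, O; O, R ]: identity except for a 3x3 rotation block R placed in the
     last three rows, standing in for the dofs on the skewed boundary */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,n,n,3,NULL,3,NULL,&T);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(T,&rstart,&rend);CHKERRQ(ierr);
  for (i=rstart; i<rend; i++) {
    if (i < n-3) {ierr = MatSetValue(T,i,i,1.0,INSERT_VALUES);CHKERRQ(ierr);}
  }
  if (!rstart) { /* rank 0 sets the rotation block; MatSetValues() handles off-process rows */
    PetscReal   a      = 0.3; /* rotation angle about the z axis, arbitrary here */
    PetscInt    idx[3] = {n-3,n-2,n-1};
    PetscScalar R[9]   = {PetscCosReal(a),-PetscSinReal(a),0.0,
                          PetscSinReal(a), PetscCosReal(a),0.0,
                          0.0,             0.0,            1.0};
    ierr = MatSetValues(T,3,idx,3,idx,R,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(T,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(T,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* One call computes Kt = T^t K T, with T playing the role of P in MatPtAP() */
  ierr = MatPtAP(K,T,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&Kt);CHKERRQ(ierr);
  ierr = MatView(Kt,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&Kt);CHKERRQ(ierr);
  ierr = MatDestroy(&T);CHKERRQ(ierr);
  ierr = MatDestroy(&K);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

[The same Kt could of course be formed with MatTransposeMatMult() followed by MatMatMult(), but MatPtAP() does it in one call, as Stefano and Mark note above.]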