From C.Klaij at marin.nl Tue Jul 1 03:16:17 2025
From: C.Klaij at marin.nl (Klaij, Christiaan)
Date: Tue, 1 Jul 2025 08:16:17 +0000
Subject: [petsc-users] problem with nested logging

It's been a while; in the meantime we did upgrade the OS and the compilers, but the problem still persists.

Can it be that one must call PetscLogEventRegister and PetscLogEventBegin/End on all procs? Currently we don't do so; think of some small system being built and solved. This event is registered on all procs, but may take place on a single proc or subset, with the begin/end only on that proc or subset.

Chris

________________________________________
From: Klaij, Christiaan
Sent: Thursday, May 1, 2025 3:06 PM
To: Stefano Zampini
Cc: Randall Mackie; PETSc users list; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

If I deactivate all the calls to PetscLogEventBegin/End in the simulation code, the error does not show up.

But since there are more than 2500 events, it's impossible to do them one by one, especially since the error shows up at random and requires a number of cases and repetitions.

Unfortunately, I'm running out of time and budget. I will retry once we get Rocky Linux 9 and the latest Intel compilers.

Chris

________________________________________
From: Stefano Zampini
Sent: Thursday, May 1, 2025 10:57 AM
To: Klaij, Christiaan
Cc: Randall Mackie; PETSc users list; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Il giorno gio 1 mag 2025 alle ore 11:38 Klaij, Christiaan ha scritto:
The checks seem to be in place: I do get a PETSC ERROR when I add a log event on rank 0 as you suggested.

Ok, the broken logic may be in LogView then. You can try deactivating some logging by classes and see how the error evolves, maybe using PetscLogClassSetActiveAll.
Or, if feasible, commenting out some part of the simulation code.

Another thought: in between the log event pairs, I also have calls to PetscLogFlops; perhaps that plays a role somehow.

It shouldn't.

Chris

________________________________________
From: Klaij, Christiaan
Sent: Thursday, May 1, 2025 10:23 AM
To: Stefano Zampini
Cc: Randall Mackie; PETSc users list; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Was the rewriting by Toby done somewhere between petsc 3.19.4 (no problem) and 3.23.4 (problem)?

Chris

________________________________________
From: Stefano Zampini
Sent: Thursday, May 1, 2025 9:12 AM
To: Klaij, Christiaan
Cc: Randall Mackie; PETSc users list; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

If I look at the code for PetscLogHandlerEventBegin_Default, there are checks in place to see if the event is collectively called (see below).
Can you make sure the nested logging system has the same checks?
It should, but double check, since the code has been largely rewritten by Toby some time ago; checking should be as easy as writing a code that calls a collective event on a single process, and a debug version of petsc should complain:

  if (rank == 0)
    LogEventBegin()  <- this should call MPIU_Allreduce, but other processes will not, thus hang

If it does not complain, then the error must come from how the logic in LogView works, and from how it traverses the various events (my guess: processed in a different order on different processes).
Without a reproducer, it is hard to understand what's going on.

static PetscErrorCode PetscLogHandlerEventBegin_Default(PetscLogHandler h, PetscLogEvent event, PetscObject o1, PetscObject o2, PetscObject o3, PetscObject o4)
{
  PetscLogHandler_Default def             = (PetscLogHandler_Default)h->data;
  PetscEventPerfInfo     *event_perf_info = NULL;
  PetscLogEventInfo       event_info;
  PetscLogDouble          time;
  PetscLogState           state;
  PetscLogStage           stage;

  PetscFunctionBegin;
  PetscCall(PetscLogHandlerGetState(h, &state));
  if (PetscDefined(USE_DEBUG)) {
    PetscCall(PetscLogStateEventGetInfo(state, event, &event_info));
    if (PetscUnlikely(o1)) PetscValidHeader(o1, 3);
    if (PetscUnlikely(o2)) PetscValidHeader(o2, 4);
    if (PetscUnlikely(o3)) PetscValidHeader(o3, 5);
    if (PetscUnlikely(o4)) PetscValidHeader(o4, 6);
    if (event_info.collective && o1) {
      PetscInt64 b1[2], b2[2];

      b1[0] = -o1->cidx;
      b1[1] = o1->cidx;
      PetscCallMPI(MPIU_Allreduce(b1, b2, 2, MPIU_INT64, MPI_MAX, PetscObjectComm(o1)));
      PetscCheck(-b2[0] == b2[1], PETSC_COMM_SELF, PETSC_ERR_PLIB, "Collective event %s not called collectively %" PetscInt64_FMT " != %" PetscInt64_FMT, event_info.name, -b2[0], b2[1]);
    }
  }
  /* Synchronization */
  PetscCall(PetscLogHandlerEventSync_Default(h, event, PetscObjectComm(o1)));

Il giorno gio 1 mag 2025 alle ore 09:56 Klaij, Christiaan ha scritto:
I've tried with -log_sync, no complaints whatsoever, but the error is still there...

Chris

________________________________________
From: Stefano Zampini
Sent: Tuesday, April 29, 2025 6:12 PM
To: Randall Mackie
Cc: Klaij, Christiaan; PETSc users list; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Can you try using -log_sync? This should check every entry/exit point of logged events and complain if something is not collectively called.

Stefano

On Tue, Apr 29, 2025, 18:21 Randall Mackie wrote:
Ah okay, I missed that this was found using openmpi. Then it's probably not the same issue we had.
I can't remember in which version it was fixed (I'm away from my work computer). I do know that in our case openmpi and the latest Intel One API work fine.

Randy

On Apr 29, 2025, at 8:58 AM, Klaij, Christiaan wrote:

Well, the error below only shows up thanks to openmpi and gnu compilers. With the intel mpi and compilers it just hangs (tried oneapi 2023.1.0). In which version was that bug fixed?

Chris

________________________________________
dr. ir. Christiaan Klaij | senior researcher | Research & Development | CFD Development
T +31 317 49 33 44 | C.Klaij at marin.nl | http://www.marin.nl

From: Randall Mackie
Sent: Tuesday, April 29, 2025 3:33 PM
To: Klaij, Christiaan
Cc: Matthew Knepley; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

We had a similar issue last year that we eventually tracked down to a bug in Intel MPI AllReduce, which was around the same version you are using.

Can you try a different MPI or the latest Intel One API and see if your error clears?

Randy

On Tue, Apr 29, 2025 at 8:17 AM Klaij, Christiaan via petsc-users wrote:
I don't think so, we have tracing in place to detect mismatches. But as soon as I switch the tracing on, the error disappears... Same if I add a counter or print statements before and after EventBegin/End. Looks like a memory corruption problem, maybe nothing to do with petsc despite the error message.
Chris

________________________________________
From: Matthew Knepley
Sent: Tuesday, April 29, 2025 1:50 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

On Tue, Apr 29, 2025 at 6:50 AM Klaij, Christiaan wrote:
Here's a slightly better error message, obtained with --with-debugging=1.

Is it possible that you have a mismatched EventBegin()/EventEnd() in your code? That could be why we cannot reproduce it here.

Thanks,

Matt

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: MPIU_Allreduce() called in different locations (code lines) on different processors
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[0]PETSC ERROR: ./refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by cklaij Tue Apr 29 12:43:54 2025
[0]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=1 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-libs/superbuild --with-ssl=0 --with-shared-libraries=1
[0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
[0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:379
[0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #5 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[0]PETSC ERROR: #6 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[0]PETSC ERROR: #7 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[0]PETSC ERROR: #8 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[0]PETSC ERROR: #9 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/plog.c:2040
[0]PETSC ERROR: #10 /home/cklaij/ReFRESCO/trunk/Code/src/petsc_include_impl.F90:130

________________________________________
From: Klaij, Christiaan
Sent: Monday, April 28, 2025 3:53 PM
To: Matthew Knepley
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Bisecting would be quite hard; it's not just the petsc version that changed, but also other libs, compilers, even OS components.

Chris

________________________________________
From: Matthew Knepley
Sent: Monday, April 28, 2025 3:06 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging
On Mon, Apr 28, 2025 at 8:45 AM Klaij, Christiaan via petsc-users wrote:
I've tried adding a nested log viewer to src/snes/tutorials/ex70.c, but it does not replicate the problem and works fine.

Perhaps it is related to fortran, since the manual page of PetscLogNestedBegin says "No fortran support" (why? we've been using it in fortran ever since).

Therefore I've tried adding it to src/snes/ex5f90.F90 and that also works fine. It seems I cannot replicate the problem in a small example, unfortunately.

We cannot replicate it here. Is there a chance you could bisect to see what change is responsible?

Thanks,

Matt

Chris

________________________________________
From: Junchao Zhang
Sent: Saturday, April 26, 2025 3:51 PM
To: Klaij, Christiaan
Cc: petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Toby (Cc'ed) might know it. Or could you provide an example?

--Junchao Zhang

On Fri, Apr 25, 2025 at 3:31 AM Klaij, Christiaan via petsc-users wrote:
We recently upgraded from 3.19.4 to 3.22.4 but face the problem below with the nested logging. Any ideas?

Chris

[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[1]PETSC ERROR: refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by jwindt Fri Apr 25 08:52:30 2025
[1]PETSC ERROR: Configure options: --prefix=/home/jwindt/cmake_builds/refresco/install-libs-gnu --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
[1]PETSC ERROR: #1 PetscLogNestedTreePrint() at
/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:330
[1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[1]PETSC ERROR: #7 PetscLogHandlerView() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[1]PETSC ERROR: #8 PetscLogView() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/plog.c:2040
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 98.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/

--
Stefano

From bsmith at petsc.dev Tue Jul 1 09:54:16 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 1 Jul 2025 10:54:16 -0400
Subject: [petsc-users] problem with nested logging
Message-ID: <589D9E7C-5632-46EB-A976-7DB1E69F497A@petsc.dev>

It's probably already been done, but if not, you should run under Valgrind. "Random" crashes are usually an indication of memory corruption.

> On Jul 1, 2025, at 4:16 AM, Klaij, Christiaan via petsc-users wrote:
Christiaan Klaij
> | senior researcher | Research & Development | CFD Development
> T +31 317 49 33 44 | C.Klaij at marin.nl | http://www.marin.nl
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
> --
> Stefano

From Olivier.JAMOND at cea.fr Tue Jul 1 11:07:57 2025
From: Olivier.JAMOND at cea.fr (JAMOND Olivier)
Date: Tue, 1 Jul 2025 16:07:57 +0000
Subject: [petsc-users] Efficient handling of missing diagonal entities
In-Reply-To: <3D2FB177-9C1F-4EDE-97B4-C6F6B92C0D52@joliv.et>
References: <3FE25D55-AA1E-4CBF-B122-A98072D354C4@petsc.dev>, <3D2FB177-9C1F-4EDE-97B4-C6F6B92C0D52@joliv.et>
Message-ID:

Hi Pierre and Barry,

Thanks for your answers and sorry for my slow reaction time (a bit overwhelmed...). Yes, I confirm that I already tried `MAT_FORCE_DIAGONAL_ENTRIES` following Pierre's suggestion, but it did not help... I understand from your messages that this option indeed needs a fix. Do you think that such a fix could be envisaged in the near future on petsc's roadmap?
Many thanks,
_________________________________________
Olivier Jamond
Research Engineer
French Atomic Energy and Alternative Energies Commission
DES/ISAS/DM2S/SEMT/DYN
91191 Gif sur Yvette, Cedex, France
Email: olivier.jamond at cea.fr
Phone: +336.78.18.18.25

________________________________
From: petsc-users on behalf of Pierre Jolivet
Sent: Friday, June 27, 2025 3:50:29 PM
To: Barry Smith
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Efficient handling of missing diagonal entities

On 27 Jun 2025, at 3:21 PM, Barry Smith wrote:

Because I completely forgot that this option existed, and the LLM didn't save me from embarrassing myself. I see that this option sets mat->force_diagonals, but this variable is never used in the mat assembly routines, meaning it will not help in this situation. Presumably, MatAssemblyXXX_YYY() could/should be fixed to respect this flag?

Yes, Olivier asked me the same question previously. I told him that this option should probably be revamped because it's there but I don't think it's doing its job.

Thanks,
Pierre

Then it would help Olivier.

Barry

On Jun 27, 2025, at 7:49 AM, Pierre Jolivet wrote:

On 27 Jun 2025, at 1:33 PM, Barry Smith wrote:

Handling empty diagonal entries on matrices is often problematic, just as you describe. I suggest placing explicit zeros on the diagonal first before providing the other entries, which might be the cleanest and most efficient approach. So have each MPI rank loop over its local rows and call MatSetValue() for each diagonal entry and then continue with your other MatSetValues(). Do not call MatAssemblyBegin/End() after you have provided the zeros on the diagonal; just chug straight into setting the other values.

Barry

As you observed, trying to add the zero entries in the matrix after it is assembled is terribly inefficient and not the way to go. I've considered adding a matrix option to force zero entries on the diagonal, but I never completed my consideration.
For example, MatSetOption(A, MAT_NONEMPTY_DIAGONAL, PETSC_TRUE);

Why would you need another option when there is already MAT_FORCE_DIAGONAL_ENTRIES?

Thanks,
Pierre

and when this option is set, MatAssemblyBegin fills up any empty diagonal entries automatically.

On Jun 27, 2025, at 6:26 AM, JAMOND Olivier wrote:

Hello,

I am working on a PDE solver which uses petsc to solve its sparse distributed linear systems. I am mainly dealing with MPIAIJ matrices. In some situations, it may happen that the matrices considered do not have a non-zero term on the diagonal. For instance, I work on a case which has a Stokes-like saddle-point structure (in a MPIAIJ, not a MATNEST):

[A Bt][U] = [F]
[B 0 ][L]   [0]

I do not insert null terms in the zero block. In some cases, I use the function `MatZeroRowsColumns` to handle "Dirichlet" boundary conditions. In this particular case, I apply Dirichlet BCs only on dofs of "U". But I get an error `Matrix is missing diagonal entry in row X` from the function `MatZeroRowsColumns`, where X is a row related to "L".

My first question is: is it normal that I get a missing-diagonal-entry error from `MatZeroRowsColumns` for a dof that is not in the list of dofs that I pass to `MatZeroRowsColumns`?

I then tried to make my code detect that there are some missing diagonal entries, and add an explicit zero to them. My code which adds the missing diagonal entries looks like what follows. This is certainly not the best way to do that, as in my test case ~80% of the total computation time is spent in this piece of code (more precisely in `MatSetValue(D, k, k, 0., ADD_VALUES)`).

So my second question is: what would be the most efficient way to detect the missing diagonal entries, and add explicit zeros on the diagonal at these places?

Many thanks,
Olivier

...
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

Mat D;
MatGetDiagonalBlock(A, &D);
PetscBool missing;
MatMissingDiagonal(D, &missing, NULL);
if (missing) {
  IS missingDiagEntryRows;
  MatFindZeroDiagonals(D, &missingDiagEntryRows);
  PetscInt size;
  ISGetLocalSize(missingDiagEntryRows, &size);
  const PetscInt *ptr;
  ISGetIndices(missingDiagEntryRows, &ptr);
  for (PetscInt i = 0; i < size; ++i) {
    PetscInt k = ptr[i];
    MatSetValue(D, k, k, 0., ADD_VALUES);
  }
  MatAssemblyBegin(D, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(D, MAT_FINAL_ASSEMBLY);
  ISRestoreIndices(missingDiagEntryRows, &ptr);
}

_________________________________________
Olivier Jamond
Research Engineer
French Atomic Energy and Alternative Energies Commission
DES/ISAS/DM2S/SEMT/DYN
91191 Gif sur Yvette, Cedex, France
Email: olivier.jamond at cea.fr
Phone: +336.78.18.18.25

From nicolas.tardieu at edf.fr Tue Jul 1 11:30:45 2025
From: nicolas.tardieu at edf.fr (TARDIEU Nicolas)
Date: Tue, 1 Jul 2025 16:30:45 +0000
Subject: [petsc-users] New Fortran API to PCFieldSplitGetSubKSP
Message-ID:

Dear PETSc team,

I am having a hard time upgrading our code's interface to the new Fortran API... For instance, I lack a Fortran example of the use of PCFieldSplitGetSubKSP. The doc says "You must pass in a KSP array that is large enough to contain all the KSPs" but the interface file is:

subroutine PCFieldSplitGetSubKSP(a,b,c, z)
  import tPC,tKSP
  PC :: a
  PetscInt :: b
  KSP, pointer :: c(:)
  PetscErrorCode z
end subroutine
end interface

Does it mean that I have to allocate an array of KSP before calling the routine? It seems to me that the other new interfaces take care of allocating the output object. PCBJacobiGetSubKSP does it.

Thanks in advance,

Nicolas
--
Nicolas Tardieu
Ing PhD Computational Mechanics
EDF - R&D Dpt ERMES
PARIS-SACLAY, FRANCE

Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à
l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message. Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free.

From pierre at joliv.et Tue Jul 1 11:58:58 2025
From: pierre at joliv.et (Pierre Jolivet)
Date: Tue, 1 Jul 2025 18:58:58 +0200
Subject: [petsc-users] New Fortran API to PCFieldSplitGetSubKSP
In-Reply-To:
References:
Message-ID: <4EF1D509-6268-41C2-B5A2-4A455BED1B9A@joliv.et>

Dear Nicolas,

> On 1 Jul 2025, at 6:31 PM, TARDIEU Nicolas via petsc-users wrote:
> Dear PETSc team,
>
> I am having a hard time upgrading our code's interface to the new Fortran API...
> For instance, I lack a Fortran example of the use of PCFieldSplitGetSubKSP. The doc says "You must pass in a KSP array that is large enough to contain all the KSPs" but the interface file is:
>
> subroutine PCFieldSplitGetSubKSP(a,b,c, z)
>   import tPC,tKSP
>   PC :: a
>   PetscInt :: b
>   KSP, pointer :: c(:)
>   PetscErrorCode z
> end subroutine
> end interface
>
> Does it mean that I have to allocate an array of KSP before calling the routine?

Yes.

> It seems to me that the other new interfaces take care of allocating the output object. PCBJacobiGetSubKSP does it.

They do not work the same behind the scenes (because for fieldsplit, in the internal structure, the sub KSPs are not necessarily stored in a contiguous array), and so they are not strictly equivalent.

Thanks,
Pierre

> Thanks in advance,
>
> Nicolas
> --
> Nicolas Tardieu
> Ing PhD Computational Mechanics
> EDF - R&D Dpt ERMES
> PARIS-SACLAY, FRANCE
From nicolas.tardieu at edf.fr Tue Jul 1 12:11:28 2025
From: nicolas.tardieu at edf.fr (TARDIEU Nicolas)
Date: Tue, 1 Jul 2025 17:11:28 +0000
Subject: [petsc-users] New Fortran API to PCFieldSplitGetSubKSP
In-Reply-To: <4EF1D509-6268-41C2-B5A2-4A455BED1B9A@joliv.et>
References: <4EF1D509-6268-41C2-B5A2-4A455BED1B9A@joliv.et>
Message-ID:

Thank you so much for your ultra-fast answer, Pierre!

Nicolas

From junchao.zhang at gmail.com Thu Jul 3 17:52:20 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Thu, 3 Jul 2025 17:52:20 -0500
Subject: [petsc-users] Introducing https://mpi-debug.org
Message-ID:

Dear petsc users,

I apologize that this email is not about petsc, but it is relevant. Supported by the Best Scientific Software (BSSw) program, I've set up a website, https://mpi-debug.org, for the HPC community to share MPI debugging tips (in part because I have debugged a lot of petsc codes). It is still in the initial stage, but I'd like to hear your input on topics and, in general, anything you would like to see on the website. You can submit them by creating a discussion at https://github.com/mpi-debug/mpi-debug.github.io/discussions. Also, you are welcome to do a 5-second survey at https://github.com/mpi-debug/mpi-debug.github.io/discussions/4 and https://github.com/mpi-debug/mpi-debug.github.io/discussions/5. It is true, 5 seconds. Replying to this email is also fine. Thank you very much!

--Junchao Zhang

-------------- next part --------------
An HTML attachment was scrubbed...
From yangzongze at gmail.com Thu Jul 3 19:07:56 2025
From: yangzongze at gmail.com (Zongze Yang)
Date: Fri, 4 Jul 2025 00:07:56 +0000
Subject: [petsc-users] Suggestion: Support optional extras for Firedrake (or other python package) when installed via PETSc
Message-ID:

Hi all,

As PETSc can install Firedrake along with it, would it be possible to support optional extras for Firedrake, such as [vtk] and [netgen], during the installation? Is this something that could be considered?

Best regards,
Zongze

From C.Klaij at marin.nl Fri Jul 4 05:55:29 2025
From: C.Klaij at marin.nl (Klaij, Christiaan)
Date: Fri, 4 Jul 2025 10:55:29 +0000
Subject: [petsc-users] problem with nested logging
In-Reply-To: <589D9E7C-5632-46EB-A976-7DB1E69F497A@petsc.dev>
References: <589D9E7C-5632-46EB-A976-7DB1E69F497A@petsc.dev>
Message-ID:

(yes, valgrind was attempted but did not show any obvious problems.)

However, we are making some progress. The random segmentation faults are observed with intelmpi 2012.13 and openmpi 4.1.5; with openmpi 5.0.6, the randomness is gone and we get systematic hanging of the code during PetscLogView. In fact, we can now make the code hang by adding a single collective PetscLogEvent. The next step is to replicate this behaviour in a standalone example for you to investigate.

Chris

_____
dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl
___________________________________

From: Barry Smith
Sent: Tuesday, July 1, 2025 4:54 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging

It's probably already been done, but if not, you should run under Valgrind. "Random" crashes are usually an indication of memory corruption.
> On Jul 1, 2025, at 4:16?AM, Klaij, Christiaan via petsc-users wrote: > > It's been a while, in the meantime we did upgrade the OS and the > compilers but the problem still persists. > > Can it be that one must call the PetscLogEventRegister and the > PetscLogEventBegin/End on all procs? Currently we don't do so, > think of some small system being build and solved. This event is > registered on all procs, but may take place on a single proc or > subset with the begin/end only on that proc or subset. > > Chris > > ________________________________________ > From: Klaij, Christiaan > Sent: Thursday, May 1, 2025 3:06 PM > To: Stefano Zampini > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > If I deactivate all the calls to PetscLogEventBegin/End in the > simulation code, the error does not show-up. > > But since there are more than 2500 events, it's impossible to do > them one-by-one, especially since the error shows-up at random > and requires a number of cases and repetitions. > > Unfortunately, I'm running out of time and budget. I will retry > once we get Rocky Linux 9 and the latest Intel compilers. > > Chris > > ________________________________________ > From: Stefano Zampini > Sent: Thursday, May 1, 2025 10:57 AM > To: Klaij, Christiaan > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > > > Il giorno gio 1 mag 2025 alle ore 11:38 Klaij, Christiaan > ha scritto: > The checks seem to be in place: I do get a PETSC ERROR when I add a log event on rank 0 as you suggested. > > > Ok, the broken logic may be in LogView then. You can try deactivating some logging by classes and see how the error evolves, maybe using PetscLogClassSetActiveAll. Or, if feasible, commenting out some part of the simulation code > > Another thought: in between the log events pairs, I also have calls to PetscLogFlops, perhaps that plays a role somehow. 
> > It shouldn't > > Chris > > ________________________________________ > From: Klaij, Christiaan > > Sent: Thursday, May 1, 2025 10:23 AM > To: Stefano Zampini > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > Was the rewritting by Toby done somewhere between petsc 3.19.4 (no problem) and 3.23.4 (problem)? > > Chris > > ________________________________________ > From: Stefano Zampini > > Sent: Thursday, May 1, 2025 9:12 AM > To: Klaij, Christiaan > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > If I look at the code for PetscLogHandlerEventBegin_Default, there are checks in place to see if the event is collectively called (see below) > Can you make sure the Nested logging system has the same checks? > It should, but double check since the code has been largely rewritten by Toby some time ago; to check it should be as easy as writing a code that calls a collective event on a single process and a debug version of petsc should complain > > if (rank ==0) > LogEventBegin() <-this should call MPIU_Allreduce, but other processes will not, thus hang > > > If it does not complain, then the error must come from how the logic in LogView works, and from how it traverses the various events (my guess: processed in a different order from different processes). 
Without a reproducer, it is hard to understand what's going on
>
> static PetscErrorCode PetscLogHandlerEventBegin_Default(PetscLogHandler h, PetscLogEvent event, PetscObject o1, PetscObject o2, PetscObject o3, PetscObject o4)
> {
>   PetscLogHandler_Default def = (PetscLogHandler_Default)h->data;
>   PetscEventPerfInfo *event_perf_info = NULL;
>   PetscLogEventInfo   event_info;
>   PetscLogDouble      time;
>   PetscLogState       state;
>   PetscLogStage       stage;
>
>   PetscFunctionBegin;
>   PetscCall(PetscLogHandlerGetState(h, &state));
>   if (PetscDefined(USE_DEBUG)) {
>     PetscCall(PetscLogStateEventGetInfo(state, event, &event_info));
>     if (PetscUnlikely(o1)) PetscValidHeader(o1, 3);
>     if (PetscUnlikely(o2)) PetscValidHeader(o2, 4);
>     if (PetscUnlikely(o3)) PetscValidHeader(o3, 5);
>     if (PetscUnlikely(o4)) PetscValidHeader(o4, 6);
>     if (event_info.collective && o1) {
>       PetscInt64 b1[2], b2[2];
>
>       b1[0] = -o1->cidx;
>       b1[1] = o1->cidx;
>       PetscCallMPI(MPIU_Allreduce(b1, b2, 2, MPIU_INT64, MPI_MAX, PetscObjectComm(o1)));
>       PetscCheck(-b2[0] == b2[1], PETSC_COMM_SELF, PETSC_ERR_PLIB, "Collective event %s not called collectively %" PetscInt64_FMT " != %" PetscInt64_FMT, event_info.name, -b2[0], b2[1]);
>     }
>   }
>   /* Synchronization */
>   PetscCall(PetscLogHandlerEventSync_Default(h, event, PetscObjectComm(o1)));
>
> Il giorno gio 1 mag 2025 alle ore 09:56 Klaij, Christiaan ha scritto:
> I've tried with -log_sync, no complaints whatsoever, but the error is still there...
>
> Chris
>
> ________________________________________
> From: Stefano Zampini
> Sent: Tuesday, April 29, 2025 6:12 PM
> To: Randall Mackie
> Cc: Klaij, Christiaan; PETSc users list; Isaac, Toby
> Subject: Re: [petsc-users] problem with nested logging
>
> Can you try using -log_sync? This should check every entry/exit point of logged events and complain if something is not collectively called
>
> Stefano
>
> On Tue, Apr 29, 2025, 18:21 Randall Mackie wrote:
> ah okay, I missed that this was found using openmpi.
> > then it's probably not the same issue we had.
> >
> > I can't remember in which version it was fixed (I'm away from my work computer). I do know in our case openmpi and the latest Intel oneAPI work fine.
> >
> > Randy
> >
> > On Apr 29, 2025, at 8:58 AM, Klaij, Christiaan wrote:
> >
> > Well, the error below only shows up thanks to openmpi and the gnu compilers.
> > With the intel mpi and compilers it just hangs (tried oneapi 2023.1.0). In which version was that bug fixed?
> >
> > Chris
> >
> > ________________________________________
> >
> > dr. ir. Christiaan Klaij | senior researcher | Research & Development | CFD Development
> > T +31 317 49 33 44 | C.Klaij at marin.nl | http://www.marin.nl
> >
> > From: Randall Mackie
> > Sent: Tuesday, April 29, 2025 3:33 PM
> > To: Klaij, Christiaan
> > Cc: Matthew Knepley; petsc-users at mcs.anl.gov; Isaac, Toby
> > Subject: Re: [petsc-users] problem with nested logging
> >
> > We had a similar issue last year that we eventually tracked down to a bug in Intel MPI AllReduce, which was around the same version you are using.
> >
> > Can you try a different MPI or the latest Intel oneAPI and see if your error clears?
> >
> > Randy
> >
> > On Tue, Apr 29, 2025 at 8:17 AM Klaij, Christiaan via petsc-users wrote:
> > I don't think so, we have tracing in place to detect mismatches. But as soon as I switch the tracing on, the error disappears... Same if I add a counter or print statements before and after EventBegin/End. Looks like a memory corruption problem, maybe nothing to do with petsc despite the error message.
Chris

________________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, April 29, 2025 1:50 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

On Tue, Apr 29, 2025 at 6:50 AM Klaij, Christiaan <C.Klaij at marin.nl> wrote:

Here's a slightly better error message, obtained with --with-debugging=1

Is it possible that you have a mismatched EventBegin()/EventEnd() in your code? That could be why we cannot reproduce it here.

Thanks,

Matt

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: MPIU_Allreduce() called in different locations (code lines) on different processors
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[0]PETSC ERROR: ./refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by cklaij Tue Apr 29 12:43:54 2025
[0]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=1 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-libs/superbuild --with-ssl=0 --with-shared-libraries=1
[0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
[0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:379
[0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #5 PetscLogNestedTreePrintTop() at
/home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[0]PETSC ERROR: #6 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[0]PETSC ERROR: #7 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[0]PETSC ERROR: #8 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[0]PETSC ERROR: #9 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/plog.c:2040
[0]PETSC ERROR: #10 /home/cklaij/ReFRESCO/trunk/Code/src/petsc_include_impl.F90:130

________________________________________
dr. ir. Christiaan Klaij | senior researcher | Research & Development | CFD Development
T +31 317 49 33 44 | C.Klaij at marin.nl | http://www.marin.nl

________________________________________
From: Klaij, Christiaan <C.Klaij at marin.nl>
Sent: Monday, April 28, 2025 3:53 PM
To: Matthew Knepley
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Bisecting would be quite hard, it's not just the petsc version that changed, also other libs, compilers, even os components.

Chris

________________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Monday, April 28, 2025 3:06 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging
On Mon, Apr 28, 2025 at 8:45 AM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:

I've tried adding a nested log viewer to src/snes/tutorials/ex70.c,
but it does not replicate the problem and works fine.

Perhaps it is related to fortran, since the manual page of
PetscLogNestedBegin says "No fortran support" (why? we've been
using it in fortran ever since).

Therefore I've tried adding it to src/snes/ex5f90.F90 and that
also works fine. It seems I cannot replicate the problem in a
small example, unfortunately.

We cannot replicate it here. Is there a chance you could bisect to see what change is responsible?

Thanks,

Matt

Chris

________________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Saturday, April 26, 2025 3:51 PM
To: Klaij, Christiaan
Cc: petsc-users at mcs.anl.gov; Isaac, Toby
Subject: Re: [petsc-users] problem with nested logging

Toby (Cc'ed) might know it. Or could you provide an example?

--Junchao Zhang

On Fri, Apr 25, 2025 at 3:31 AM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:

We recently upgraded from 3.19.4 to 3.22.4 but face the problem below with the nested logging. Any ideas?

Chris

[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[1]PETSC ERROR: refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by jwindt Fri Apr 25 08:52:30 2025
[1]PETSC ERROR: Configure options: --prefix=/home/jwindt/cmake_builds/refresco/install-libs-gnu --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
[1]PETSC ERROR: #1 PetscLogNestedTreePrint() at
/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:330
[1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
[1]PETSC ERROR: #7 PetscLogHandlerView() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
[1]PETSC ERROR: #8 PetscLogView() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/plog.c:2040
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 98.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

dr. ir. Christiaan Klaij | senior researcher | Research & Development | CFD Development
T +31 317 49 33 44 | C.Klaij at marin.nl | http://www.marin.nl

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

--
Stefano
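[Editorial aside] The MPIU_Allreduce collectivity check quoted earlier in the thread uses a min/max trick: every rank contributes (-cidx, cidx) of the object it passed to an elementwise MPI_MAX reduction, so the two results agree in magnitude only if all ranks passed an object with the same creation index. A minimal Python sketch of that logic, simulating the reduction over a list of per-rank values so no MPI is needed (the function names are illustrative, not PETSc API):

```python
def allreduce_max(contribs):
    # Simulate MPIU_Allreduce(..., MPI_MAX, ...): elementwise max over ranks.
    return [max(c[i] for c in contribs) for i in range(2)]

def collectively_called(cidx_per_rank):
    # Each rank contributes (-cidx, cidx). After the MAX reduction,
    # b2[0] == -min(cidx) and b2[1] == max(cidx), so -b2[0] == b2[1]
    # exactly when every rank saw the same object creation index.
    b2 = allreduce_max([(-c, c) for c in cidx_per_rank])
    return -b2[0] == b2[1]

print(collectively_called([7, 7, 7]))  # True: same object on every rank
print(collectively_called([7, 7, 9]))  # False: mismatched objects
```

This also shows why the check only fires when an object is attached to the event (`o1` non-null in the PETSc code): with no object there is no cidx to compare.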
From rlmackie862 at gmail.com Fri Jul 4 09:32:41 2025
From: rlmackie862 at gmail.com (Randall Mackie)
Date: Fri, 4 Jul 2025 07:32:41 -0700
Subject: [petsc-users] MatNest error with null matrix blocks
Message-ID: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com>

In the process of upgrading our code to version 3.23, we have run into another issue, this time with nest matrices.

In previous versions (well, to at least 3.21) it was perfectly fine to pass in null matrix blocks to create a nest matrix. There are situations where we have, for example, only diagonal blocks. That should be allowed, and it was, and it worked fine.

In petsc 3.23 that is no longer true, or perhaps we do not know the proper way to pass in a null matrix. The attached example reproduces this with the following error:

[0]PETSC ERROR: Configure options: --force --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1 --with-cxx=0
[0]PETSC ERROR: #1 PetscObjectReference() at /home/rmackie/PETSc/petsc/src/sys/objects/inherit.c:620
[0]PETSC ERROR: #2 MatNestSetSubMats_Nest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1407
[0]PETSC ERROR: #3 MatNestSetSubMats() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1518
[0]PETSC ERROR: #4 MatCreateNest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1800
[0]PETSC ERROR: #5 matnest_bug_reproducer.F90:48

Thanks for helping with this, and thanks for all the work on the Fortran interfaces - I've been playing around with your new branch and am excited by all the work!

Randy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: matnest_bug_reproducer.F90
Type: application/octet-stream
Size: 2062 bytes
Desc: not available
URL:
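[Editorial aside] For intuition about the null-block semantics Randy describes: in a nest (block) matrix, an absent off-diagonal block stands for a zero block, so a block matvec can simply skip it. A small Python sketch of that convention, with plain nested lists standing in for Mat objects and None standing in for a null block (an illustration of the concept, not PETSc's implementation):

```python
def block_matvec(blocks, xs):
    """Multiply a block matrix by a block vector.

    blocks[i][j] is a dense block (list of rows) or None, meaning a zero
    block; xs[j] is the j-th sub-vector. Assumes each block row has at
    least one non-null block (to determine its row count).
    """
    ys = []
    for row in blocks:
        n = next(len(b) for b in row if b is not None)  # rows in this block row
        y = [0.0] * n
        for block, x in zip(row, xs):
            if block is None:  # null block: contributes nothing
                continue
            for i, r in enumerate(block):
                y[i] += sum(a * v for a, v in zip(r, x))
        ys.append(y)
    return ys

# Block-diagonal matrix: off-diagonal blocks passed as None.
A = [[[[2.0]], None],
     [None, [[3.0]]]]
print(block_matvec(A, [[1.0], [1.0]]))  # [[2.0], [3.0]]
```

This is why "only diagonal blocks" is a perfectly sensible nest matrix: the null entries never need to be touched.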
From junchao.zhang at gmail.com Fri Jul 4 20:02:55 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Fri, 4 Jul 2025 20:02:55 -0500
Subject: [petsc-users] problem with nested logging
In-Reply-To:
References: <589D9E7C-5632-46EB-A976-7DB1E69F497A@petsc.dev>
Message-ID:

How many MPI processes did you use? When they hang, maybe you can check their call stacks to see if there are mismatches.

--Junchao Zhang

On Fri, Jul 4, 2025 at 5:56 AM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:

(yes, valgrind was attempted but did not show any obvious problems.)

However, we are making some progress. The random segmentation faults are observed with intelmpi 2012.13 and openmpi 4.1.5. However, with openmpi 5.0.6, the randomness is gone and we get systematic hanging of the code during PetscLogView. In fact, we can now make the code hang by adding a single collective PetscLogEvent. Next step is to replicate this behaviour in a standalone example for you to investigate.

Chris

________________________________________
dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl

________________________________________
From: Barry Smith
Sent: Tuesday, July 1, 2025 4:54 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging

It's probably already been done, but if not, you should run under Valgrind. "Random" crashes are usually an indication of memory corruption.

On Jul 1, 2025, at 4:16 AM, Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:

It's been a while, in the meantime we did upgrade the OS and the compilers but the problem still persists.
> > > > Can it be that one must call the PetscLogEventRegister and the > > PetscLogEventBegin/End on all procs? Currently we don't do so, > > think of some small system being build and solved. This event is > > registered on all procs, but may take place on a single proc or > > subset with the begin/end only on that proc or subset. > > > > Chris > > > > ________________________________________ > > From: Klaij, Christiaan > > Sent: Thursday, May 1, 2025 3:06 PM > > To: Stefano Zampini > > Cc: Randall Mackie; PETSc users list; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > If I deactivate all the calls to PetscLogEventBegin/End in the > > simulation code, the error does not show-up. > > > > But since there are more than 2500 events, it's impossible to do > > them one-by-one, especially since the error shows-up at random > > and requires a number of cases and repetitions. > > > > Unfortunately, I'm running out of time and budget. I will retry > > once we get Rocky Linux 9 and the latest Intel compilers. > > > > Chris > > > > ________________________________________ > > From: Stefano Zampini > > Sent: Thursday, May 1, 2025 10:57 AM > > To: Klaij, Christiaan > > Cc: Randall Mackie; PETSc users list; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > > > > > Il giorno gio 1 mag 2025 alle ore 11:38 Klaij, Christiaan < > C.Klaij at marin.nl> ha scritto: > > The checks seem to be in place: I do get a PETSC ERROR when I add a log > event on rank 0 as you suggested. > > > > > > Ok, the broken logic may be in LogView then. You can try deactivating > some logging by classes and see how the error evolves, maybe using > PetscLogClassSetActiveAll. Or, if feasible, commenting out some part of the > simulation code > > > > Another thought: in between the log events pairs, I also have calls to > PetscLogFlops, perhaps that plays a role somehow. 
> > > > It shouldn't > > > > Chris > > > > ________________________________________ > > From: Klaij, Christiaan > > > Sent: Thursday, May 1, 2025 10:23 AM > > To: Stefano Zampini > > Cc: Randall Mackie; PETSc users list; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > Was the rewritting by Toby done somewhere between petsc 3.19.4 (no > problem) and 3.23.4 (problem)? > > > > Chris > > > > ________________________________________ > > From: Stefano Zampini stefano.zampini at gmail.com>> > > Sent: Thursday, May 1, 2025 9:12 AM > > To: Klaij, Christiaan > > Cc: Randall Mackie; PETSc users list; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > If I look at the code for PetscLogHandlerEventBegin_Default, there are > checks in place to see if the event is collectively called (see below) > > Can you make sure the Nested logging system has the same checks? > > It should, but double check since the code has been largely rewritten by > Toby some time ago; to check it should be as easy as writing a code that > calls a collective event on a single process and a debug version of petsc > should complain > > > > if (rank ==0) > > LogEventBegin() <-this should call MPIU_Allreduce, but other processes > will not, thus hang > > > > > > If it does not complain, then the error must come from how the logic in > LogView works, and from how it traverses the various events (my guess: > processed in a different order from different processes). 
Without a > reproducer, it is hard to understand what's going on > > > > static PetscErrorCode PetscLogHandlerEventBegin_Default(PetscLogHandler > h, PetscLogEvent event, PetscObject o1, PetscObject o2, PetscObject o3, > PetscObject o4) > > { > > PetscLogHandler_Default def = (PetscLogHandler_Default)h->data; > > PetscEventPerfInfo *event_perf_info = NULL; > > PetscLogEventInfo event_info; > > PetscLogDouble time; > > PetscLogState state; > > PetscLogStage stage; > > > > PetscFunctionBegin; > > PetscCall(PetscLogHandlerGetState(h, &state)); > > if (PetscDefined(USE_DEBUG)) { > > PetscCall(PetscLogStateEventGetInfo(state, event, &event_info)); > > if (PetscUnlikely(o1)) PetscValidHeader(o1, 3); > > if (PetscUnlikely(o2)) PetscValidHeader(o2, 4); > > if (PetscUnlikely(o3)) PetscValidHeader(o3, 5); > > if (PetscUnlikely(o4)) PetscValidHeader(o4, 6); > > if (event_info.collective && o1) { > > PetscInt64 b1[2], b2[2]; > > > > b1[0] = -o1->cidx; > > b1[1] = o1->cidx; > > PetscCallMPI(MPIU_Allreduce(b1, b2, 2, MPIU_INT64, MPI_MAX, > PetscObjectComm(o1))); > > PetscCheck(-b2[0] == b2[1], PETSC_COMM_SELF, PETSC_ERR_PLIB, "Collective > event %s not called collectively %" PetscInt64_FMT " != %" PetscInt64_FMT, > event_info.name< > https://urldefense.us/v3/__http://event_info.name__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopR10nyRr4$ > >< > https://urldefense.us/v3/__http://event_info.name__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopR10nyRr4$ > >, -b2[0], b2[1]); > > } > > } > > /* Synchronization */ > > PetscCall(PetscLogHandlerEventSync_Default(h, event, > PetscObjectComm(o1))); > > > > > > > > > > Il giorno gio 1 mag 2025 alle ore 09:56 Klaij, Christiaan < > C.Klaij at marin.nl C.Klaij at marin.nl>>> ha scritto: > > I've tried with -log_sync, no complaints whatsoever, but the error is > still there... 
> > > > Chris > > > > ________________________________________ > > From: Stefano Zampini stefano.zampini at gmail.com> stefano.zampini at gmail.com>>> > > Sent: Tuesday, April 29, 2025 6:12 PM > > To: Randall Mackie > > Cc: Klaij, Christiaan; PETSc users list; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > Can you try using -log_sync ? This should check every entry/exit points > of logged Events and complain if something is not collectively called > > > > Stefano > > > > On Tue, Apr 29, 2025, 18:21 Randall Mackie rlmackie862 at gmail.com>> rlmackie862 at gmail.com> rlmackie862 at gmail.com>>>> wrote: > > ah okay, I missed that this was found using openmpi. > > > > then it?s probably not the same issue we had. > > > > I can?t remember in which version it was fixed (I?m away from my work > computer)?.I do know in our case openmpi and the latest Intel One API work > fine. > > > > Randy > > > > On Apr 29, 2025, at 8:58?AM, Klaij, Christiaan C.Klaij at marin.nl> >> C.Klaij at marin.nl>>> wrote: > > > > Well, the error below only shows-up thanks to openmpi and gnu compilers. > > With the intel mpi and compilers it just hangs (tried oneapi 2023.1.0). > In which version was that bug fixed? > > > > Chris > > > > ________________________________________ > > > > dr. ir. 
Christiaan Klaij > > | senior researcher | Research & Development | CFD Development > > T +31 317 49 33 44 | C.Klaij at marin.nl > >> C.Klaij at marin.nl>> | > https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopRLSBIKfE$ > < > https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopRLSBIKfE$ > >< > https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopRLSBIKfE$ > >< > https://urldefense.us/v3/__https://www.marin.nl/__;!!G_uCfscf7eWS!bW2X4VgowGOLrAASgbFOR_Mh6HW4HtWqrdtpsvpnpiFrIwki34JOGyih-h-1bvgb-Bh4EdWRUoVqQW7s6Cx-ci9iNg$ > > > > < > https://urldefense.us/v3/__https://www.facebook.com/marin.wageningen__;!!G_uCfscf7eWS!bW2X4VgowGOLrAASgbFOR_Mh6HW4HtWqrdtpsvpnpiFrIwki34JOGyih-h-1bvgb-Bh4EdWRUoVqQW7s6Cxl-rBbfA$ > > > > < > https://urldefense.us/v3/__https://www.linkedin.com/company/marin__;!!G_uCfscf7eWS!bW2X4VgowGOLrAASgbFOR_Mh6HW4HtWqrdtpsvpnpiFrIwki34JOGyih-h-1bvgb-Bh4EdWRUoVqQW7s6Cw7FNelIQ$ > > > > < > https://urldefense.us/v3/__https://www.youtube.com/marinmultimedia__;!!G_uCfscf7eWS!bW2X4VgowGOLrAASgbFOR_Mh6HW4HtWqrdtpsvpnpiFrIwki34JOGyih-h-1bvgb-Bh4EdWRUoVqQW7s6CwmfHQRog$ > > > > > > From: Randall Mackie >> rlmackie862 at gmail.com rlmackie862 at gmail.com>>> > > Sent: Tuesday, April 29, 2025 3:33 PM > > To: Klaij, Christiaan > > Cc: Matthew Knepley; petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>>; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > You don't often get email from rlmackie862 at gmail.com rlmackie862 at gmail.com> rlmackie862 at gmail.com>> rlmackie862 at gmail.com> rlmackie862 at gmail.com>>>. 
Learn why this is important< > https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopRRPgiIXk$ > < > https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!bW2X4VgowGOLrAASgbFOR_Mh6HW4HtWqrdtpsvpnpiFrIwki34JOGyih-h-1bvgb-Bh4EdWRUoVqQW7s6CwgJDUS5w$ > >> > > We had a similar issue last year that we eventually tracked down to a > bug in Intel MPI AllReduce, which was around the same version you are using. > > > > Can you try a different MPI or the latest Intel One API and see if your > error clears? > > > > Randy > > > > On Tue, Apr 29, 2025 at 8:17?AM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov>>>> wrote: > > I don't think so, we have tracing in place to detect mismatches. But as > soon as I switch the tracing on, the error disappears... Same if I add a > counter or print statements before and after EventBegin/End. Looks like a > memory corruption problem, maybe nothing to do with petsc despite the error > message. 
> > > > Chris > > > > ________________________________________ > > From: Matthew Knepley >> knepley at gmail.com >> knepley at gmail.com> >> knepley at gmail.com>>>> > > Sent: Tuesday, April 29, 2025 1:50 PM > > To: Klaij, Christiaan > > Cc: Junchao Zhang; petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>>>; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > On Tue, Apr 29, 2025 at 6:50?AM Klaij, Christiaan >> C.Klaij at marin.nl>> >> C.Klaij at marin.nl>>> >> C.Klaij at marin.nl>> >> C.Klaij at marin.nl>>>>> wrote: > > Here's a slightly better error message, obtained --with-debugging=1 > > > > Is it possible that you have a mismatched EventBegin()/EventEnd() in > your code? That could be why we cannot reproduce it here. > > > > Thanks, > > > > Matt > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Petsc has generated inconsistent data > > [0]PETSC ERROR: MPIU_Allreduce() called in different locations (code > lines) on different processors > > [0]PETSC ERROR: See > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjggvQAzPU$ > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > > [0]PETSC ERROR: ./refresco with 2 MPI process(es) and PETSC_ARCH on > marclus3login2 by cklaij Tue Apr 29 12:43:54 2025 > > [0]PETSC ERROR: Configure options: > --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs > --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 > --with-debugging=1 --download-superlu_dist= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgVgVAJPM$ > --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 > --download-parmetis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgo2JWTO4$ > --download-metis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgX9ZMYJA$ > --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-libs/superbuild > --with-ssl=0 --with-shared-libraries=1 > > [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > > [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:379 > > [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > > [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > > [0]PETSC ERROR: #5 PetscLogNestedTreePrintTop() at > 
/home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > > [0]PETSC ERROR: #6 PetscLogHandlerView_Nested_XML() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > > [0]PETSC ERROR: #7 PetscLogHandlerView_Nested() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > > [0]PETSC ERROR: #8 PetscLogHandlerView() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > > [0]PETSC ERROR: #9 PetscLogView() at > /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > > [0]PETSC ERROR: #10 > /home/cklaij/ReFRESCO/trunk/Code/src/petsc_include_impl.F90:130 > > > > ________________________________________ > > [cid:ii_19681617e7812ff9cfc1] > > dr. ir. Christiaan Klaij > > | senior researcher | Research & Development | CFD Development > > T +31 317 49 33 44 | C.Klaij at marin.nl > >> C.Klaij at marin.nl>> >> C.Klaij at marin.nl>>> >> C.Klaij at marin.nl>> >> C.Klaij at marin.nl>>>> | > https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgJooxJhg$ > < > https://urldefense.us/v3/__https://www.marin.nl/__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgpOXjQSM$ > > > > [Facebook]< > https://urldefense.us/v3/__https://www.facebook.com/marin.wageningen__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjg_3Zt0Pw$ > > > > [LinkedIn]< > https://urldefense.us/v3/__https://www.linkedin.com/company/marin__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgcpgQdSE$ > > > > [YouTube]< > 
https://urldefense.us/v3/__https://www.youtube.com/marinmultimedia__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgn8qP6_I$ > > > > > > From: Klaij, Christiaan >> C.Klaij at marin.nl C.Klaij at marin.nl>>> >> C.Klaij at marin.nl C.Klaij at marin.nl>>>> >> C.Klaij at marin.nl C.Klaij at marin.nl>>> >> C.Klaij at marin.nl C.Klaij at marin.nl>>>>>> > > Sent: Monday, April 28, 2025 3:53 PM > > To: Matthew Knepley > > Cc: Junchao Zhang; petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>>>>>; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > Bisecting would be quite hard, it's not just the petsc version that > changed, also other libs, compilers, even os components. 
> > > > Chris > > > > ________________________________________ > > From: Matthew Knepley > > Sent: Monday, April 28, 2025 3:06 PM > > To: Klaij, Christiaan > > Cc: Junchao Zhang; petsc-users at mcs.anl.gov; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > On Mon, Apr 28, 2025 at 8:45 AM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I've tried adding a nested log viewer to src/snes/tutorials/ex70.c, > > but it does not replicate the problem and works fine. > > > > Perhaps it is related to fortran, since the manualpage of > > PetscLogNestedBegin says "No fortran support" (why? we've been > > using it in fortran ever since). > > > > Therefore I've tried adding it to src/snes/ex5f90.F90 and that > > also works fine. It seems I cannot replicate the problem in a > > small example, unfortunately. > > > > We cannot replicate it here. Is there a chance you could bisect to see > what change is responsible? 
> > > > Thanks, > > > > Matt > > > > Chris > > > > ________________________________________ > > From: Junchao Zhang > > Sent: Saturday, April 26, 2025 3:51 PM > > To: Klaij, Christiaan > > Cc: petsc-users at mcs.anl.gov; Isaac, Toby > > Subject: Re: [petsc-users] problem with nested logging > > > > Toby (Cc'ed) might know it. Or could you provide an example? > > > > --Junchao Zhang > > > > > > On Fri, Apr 25, 2025 at 3:31 AM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > We recently upgraded from 3.19.4 to 3.22.4 but face the problem below > with the nested 
logging. Any ideas? > > > > Chris > > > > > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: General MPI error > > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > > [1]PETSC ERROR: See > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gIT68pbk$ > < > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!biVlk6PKXoJvq5oVlmWdVJfW9tXv-JlwuWr3zg3jI5u1_jo8rvtZpEYnHO5RjdBqQEoqpqlJ3nusrFGM3UaqHzc$> > for trouble shooting. > > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > > [1]PETSC ERROR: refresco with 2 MPI process(es) and PETSC_ARCH on > marclus3login2 by jwindt Fri Apr 25 08:52:30 2025 > > [1]PETSC ERROR: Configure options: > --prefix=/home/jwindt/cmake_builds/refresco/install-libs-gnu > --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 > --with-debugging=0 --download-superlu_dist= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6grH5BbeU$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!biVlk6PKXoJvq5oVlmWdVJfW9tXv-JlwuWr3zg3jI5u1_jo8rvtZpEYnHO5RjdBqQEoqpqlJ3nusrFGM21-2D-o$> > --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 > --download-parmetis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gw4-tEtY$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!biVlk6PKXoJvq5oVlmWdVJfW9tXv-JlwuWr3zg3jI5u1_jo8rvtZpEYnHO5RjdBqQEoqpqlJ3nusrFGMW0lYHko$> > --download-metis= > 
https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gHq4uYiY$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!biVlk6PKXoJvq5oVlmWdVJfW9tXv-JlwuWr3zg3jI5u1_jo8rvtZpEYnHO5RjdBqQEoqpqlJ3nusrFGMbSrIiUg$> > --with-packages-build-dir=/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild > --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" > > [1]PETSC ERROR: #1 PetscLogNestedTreePrint() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:330 > > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at > 
/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > > [1]PETSC ERROR: #7 PetscLogHandlerView() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > > [1]PETSC ERROR: #8 PetscLogView() at > /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/plog.c:2040 > > > -------------------------------------------------------------------------- > > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD > > with errorcode 98. > > > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > > You may or may not see output from other processes, depending on > > exactly when Open MPI kills them. > > > -------------------------------------------------------------------------- > > dr. ir. Christiaan Klaij > > | senior researcher | Research & Development | CFD Development > > T +31 317 49 33 44 | C.Klaij at marin.nl | www.marin.nl > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> > -- Norbert Wiener > > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjg539kFLg$ > < > https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgMsu6hhA$ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjg539kFLg$ > < > https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgMsu6hhA$ > > > > > > > > > > -- > > Stefano > > > > > > -- > > Stefano > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image240518.png Type: image/png Size: 5004 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image614921.png Type: image/png Size: 487 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image484724.png Type: image/png Size: 504 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image987297.png Type: image/png Size: 482 bytes Desc: not available URL: From bsmith at petsc.dev Fri Jul 4 20:08:56 2025 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 4 Jul 2025 21:08:56 -0400 Subject: [petsc-users] Suggestion: Support optional extras for Firedrake (or other python package) when installed via PETSc In-Reply-To: References: Message-ID: <34816642-06CD-4D99-8DDF-A688A03565A4@petsc.dev> Yes, what specific ones are portable and fairly commonly needed? I would like a default installation to include everything a user would need. So let me know the list and I will add them. Barry > On Jul 3, 2025, at 8:07 PM, Zongze Yang wrote: > > Hi all, > > As PETSc can install Firedrake along with it, would it be possible to support optional extras for Firedrake, such as [vtk] and [netgen], during the installation? > > Is this something that could be considered? > > Best regards, > Zongze > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Jul 4 20:56:03 2025 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 4 Jul 2025 21:56:03 -0400 Subject: [petsc-users] MatNest error with null matrix blocks In-Reply-To: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com> References: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com> Message-ID: <141B7BA8-D995-415C-A79B-5066F7368999@petsc.dev> You need to pass PETSC_NULL_MAT in those locations, you cannot just pass a Mat that has never been created (that is a different kind of NULL matrix :-). I have added clearer error checking in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8526__;!!G_uCfscf7eWS!c7m-laUXAl00bSHNb-ohlO2tKeFMTPNbjzMWPv5thIWU9NSFKiyZzFTUJgVbOqSWRkgYoLYJVkqR6dtPIeyHqig$ Barry > On Jul 4, 2025, at 10:32 AM, Randall Mackie wrote: > > In the process of upgrading our code to version 3.23, we have run into another issue, this time with nest matrices. 
> > In previous versions (well to at least 3.21) it was perfectly fine to pass in null matrix blocks to create a nest matrix. There are situations where we have, for example, only diagonal blocks. That should be allowed and was and worked fine. > > In petsc 3.23, that no longer is true, or perhaps we do not know what is the proper way to pass in a null matrix. > > The attached example reproduces this with the following error: > > [0]PETSC ERROR: Configure options: --force --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1 --with-cxx=0 > [0]PETSC ERROR: #1 PetscObjectReference() at /home/rmackie/PETSc/petsc/src/sys/objects/inherit.c:620 > [0]PETSC ERROR: #2 MatNestSetSubMats_Nest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1407 > [0]PETSC ERROR: #3 MatNestSetSubMats() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1518 > [0]PETSC ERROR: #4 MatCreateNest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1800 > [0]PETSC ERROR: #5 matnest_bug_reproducer.F90:48 > > > Thanks for helping with this, and thanks for all the work on the Fortran interfaces - I've been playing around with your new branch and am excited by all the work! > > Randy > -------------- next part -------------- An HTML attachment was scrubbed... 
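For readers following along: in the C API the empty blocks of a MatNest are passed as NULL entries in the block array, and the Fortran analogue is PETSC_NULL_MAT (as Barry points out, not a Mat that was declared but never created). Below is a minimal C sketch of the diagonal-blocks-only case; it needs a PETSc build to compile, and A00/A11 are hypothetical matrices assumed to already exist:

```c
#include <petsc.h>

/* Sketch: create a 2x2 MatNest with only diagonal blocks. In C the empty
 * off-diagonal blocks are NULL entries in the row-major block array; in
 * Fortran they must be PETSC_NULL_MAT, not an uninitialized Mat. */
static PetscErrorCode BuildDiagonalNest(Mat A00, Mat A11, Mat *B)
{
  Mat blocks[4];

  PetscFunctionBeginUser;
  blocks[0] = A00;  blocks[1] = NULL; /* empty (0,1) block */
  blocks[2] = NULL; blocks[3] = A11;  /* empty (1,0) block */
  /* NULL for the IS arguments lets PETSc derive the row/column layouts */
  PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, B));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

The same layout in Fortran would fill the block array with PETSC_NULL_MAT in the empty positions before calling MatCreateNest.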
URL: From rlmackie862 at gmail.com Fri Jul 4 20:59:51 2025 From: rlmackie862 at gmail.com (Randall Mackie) Date: Fri, 4 Jul 2025 18:59:51 -0700 Subject: Re: [petsc-users] MatNest error with null matrix blocks In-Reply-To: <141B7BA8-D995-415C-A79B-5066F7368999@petsc.dev> References: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com> <141B7BA8-D995-415C-A79B-5066F7368999@petsc.dev> Message-ID: <6B268C9B-8C01-4E1D-8624-FDBD974C62AC@gmail.com> Thanks Barry We were confused because the online manual says: PETSc objects are always automatically initialized when declared so you do not need to (and should not) do type(tXXX) x = PETSC_NULL_XXX XXX x = PETSC_NULL_XXX Randy > On Jul 4, 2025, at 6:56 PM, Barry Smith wrote: > > > You need to pass PETSC_NULL_MAT in those locations, you cannot just pass a Mat that has never been created (that is a different kind of NULL matrix :-). I have added clearer error checking in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8526__;!!G_uCfscf7eWS!Y7C44Hdmd964boi6gEmciviR7GGeHph-RSlxAgB7oCuxJsPbYDW5gZpHgn5KIJMryLOmkFvMiYX3aDDpI5f6G0AQrg$ > > Barry > > >> On Jul 4, 2025, at 10:32 AM, Randall Mackie wrote: >> >> In the process of upgrading our code to version 3.23, we have run into another issue, this time with nest matrices. >> >> In previous versions (well to at least 3.21) it was perfectly fine to pass in null matrix blocks to create a nest matrix. There are situations where we have, for example, only diagonal blocks. That should be allowed and was and worked fine. >> >> In petsc 3.23, that no longer is true, or perhaps we do not know what is the proper way to pass in a null matrix. 
>> >> The attached example reproduces this with the following error: >> >> [0]PETSC ERROR: Configure options: --force --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1 --with-cxx=0 >> [0]PETSC ERROR: #1 PetscObjectReference() at /home/rmackie/PETSc/petsc/src/sys/objects/inherit.c:620 >> [0]PETSC ERROR: #2 MatNestSetSubMats_Nest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1407 >> [0]PETSC ERROR: #3 MatNestSetSubMats() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1518 >> [0]PETSC ERROR: #4 MatCreateNest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1800 >> [0]PETSC ERROR: #5 matnest_bug_reproducer.F90:48 >> >> >> Thanks for helping with this, and thanks for all the work on the Fortran interfaces - I've been playing around with your new branch and am excited by all the work! >> >> Randy >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Jul 5 20:14:05 2025 From: yangzongze at gmail.com (Zongze Yang) Date: Sun, 6 Jul 2025 01:14:05 +0000 Subject: [petsc-users] Suggestion: Support optional extras for Firedrake (or other python package) when installed via PETSc In-Reply-To: <34816642-06CD-4D99-8DDF-A688A03565A4@petsc.dev> References: <34816642-06CD-4D99-8DDF-A688A03565A4@petsc.dev> Message-ID: Thank you. I recommend including these packages: [check, netgen, slepc]. Since VTK does not have an ARM Linux package, it may not be suitable to include. Best wishes, Zongze From: Barry Smith Date: Saturday, July 5, 2025 at 09:09 To: Zongze Yang Cc: PETSc users list Subject: Re: [petsc-users] Suggestion: Support optional extras for Firedrake (or other python package) when installed via PETSc Yes, what specific ones are portable and fairly commonly needed? I would like a default installation to include everything a user would need. So let me know the list and I will add them. 
Barry On Jul 3, 2025, at 8:07 PM, Zongze Yang wrote: Hi all, As PETSc can install Firedrake along with it, would it be possible to support optional extras for Firedrake, such as [vtk] and [netgen], during the installation? Is this something that could be considered? Best regards, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Jul 6 11:59:00 2025 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 6 Jul 2025 12:59:00 -0400 Subject: [petsc-users] MatNest error with null matrix blocks In-Reply-To: <6B268C9B-8C01-4E1D-8624-FDBD974C62AC@gmail.com> References: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com> <141B7BA8-D995-415C-A79B-5066F7368999@petsc.dev> <6B268C9B-8C01-4E1D-8624-FDBD974C62AC@gmail.com> Message-ID: <894F0E23-CAD0-4D7D-8871-CF42E9A20379@petsc.dev> Yes, they are automatically initialized to something ((void*)-2 actually). They are NOT automatically initialized to PETSC_NULL_XXX #define PetscObjectIsNull(obj) (obj%v == 0 .or. obj%v == -2 .or. obj%v == -3) PETSc Fortran has multiple representations of NULL objects. When you want to pass in the meaning "I am a null" you pass PETSC_NULL_XXX. Not the other representations. Barry Why? Well, if variables were initialized to 0 and then got passed into a routine, the routine would see the 0 and think: Oh, the person does not want to set this variable (because passing x would be the same as passing PETSC_NULL_XXX). > On Jul 4, 2025, at 9:59 PM, Randall Mackie wrote: > > Thanks Barry > > We were confused because the online manual says: > > PETSc objects are always automatically initialized when declared so you do not need to (and should not) do > > type(tXXX) x = PETSC_NULL_XXX > XXX x = PETSC_NULL_XXX > > Randy > > >> On Jul 4, 2025, at 6:56 PM, Barry Smith wrote: >> >> >> You need to pass PETSC_NULL_MAT in those locations, you cannot just pass a Mat that has never been created (that is a different kind of NULL matrix :-). 
I have added clearer error checking in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8526__;!!G_uCfscf7eWS!dYn7mRbq7wqPwxO_KWtoU39_hRLXGhGFWIbqPlpeg6nUDdMMWKXwWo4LX9PczTjyXK_ayGhvWV8eEQ8Qo68moKE$ >> >> Barry >> >> >>> On Jul 4, 2025, at 10:32 AM, Randall Mackie wrote: >>> >>> In the process of upgrading our code to version 3.23, we have run into another issue, this time with nest matrices. >>> >>> In previous versions (well to at least 3.21) it was perfectly fine to pass in null matrix blocks to create a nest matrix. There are situations where we have, for example, only diagonal blocks. That should be allowed and was and worked fine. >>> >>> In petsc 3.23, that no longer is true, or perhaps we do not know what is the proper way to pass in a null matrix. >>> >>> The attached example reproduces this with the following error: >>> >>> [0]PETSC ERROR: Configure options: --force --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1 --with-cxx=0 >>> [0]PETSC ERROR: #1 PetscObjectReference() at /home/rmackie/PETSc/petsc/src/sys/objects/inherit.c:620 >>> [0]PETSC ERROR: #2 MatNestSetSubMats_Nest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1407 >>> [0]PETSC ERROR: #3 MatNestSetSubMats() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1518 >>> [0]PETSC ERROR: #4 MatCreateNest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1800 >>> [0]PETSC ERROR: #5 matnest_bug_reproducer.F90:48 >>> >>> >>> Thanks for helping with this, and thanks for all the work on the Fortran interfaces - I've been playing around with your new branch and am excited by all the work! >>> >>> Randy >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
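Barry's point about multiple null-like encodings can be illustrated without PETSc. The following plain-C fragment is purely hypothetical (Handle, handle_is_null and api_accepts_as_null are invented names for illustration, not PETSc API); it only mirrors the encoding trick behind PetscObjectIsNull: a sentinel-initialized handle counts as null-like, but an API that expects the explicit null value still rejects it.

```c
/* Invented stand-in for a PETSc-style Fortran handle: a struct wrapping a
 * pointer-sized value. Handles are assumed auto-initialized to the
 * sentinel (void*)-2, while the explicit "no object" value is 0. */
typedef struct { void *v; } Handle;

/* Analogue of PetscObjectIsNull: true for any null-like encoding
 * (explicit null 0, or the auto-initialization sentinels -2 / -3). */
static int handle_is_null(Handle h) {
  return h.v == (void *)0 || h.v == (void *)-2 || h.v == (void *)-3;
}

/* An API argument means "no object" only for the explicit null value,
 * mirroring why PETSC_NULL_MAT works where an uninitialized Mat does not. */
static int api_accepts_as_null(Handle h) {
  return h.v == (void *)0;
}
```

A handle initialized to the sentinel satisfies handle_is_null but not api_accepts_as_null, which is exactly the distinction between a declared-but-never-created Mat and PETSC_NULL_MAT in the thread above.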
URL: From rlmackie862 at gmail.com Sun Jul 6 12:10:10 2025 From: rlmackie862 at gmail.com (Randall Mackie) Date: Sun, 6 Jul 2025 10:10:10 -0700 Subject: [petsc-users] MatNest error with null matrix blocks In-Reply-To: <894F0E23-CAD0-4D7D-8871-CF42E9A20379@petsc.dev> References: <899A42F0-B199-451F-B179-ED77CCA91A0F@gmail.com> <141B7BA8-D995-415C-A79B-5066F7368999@petsc.dev> <6B268C9B-8C01-4E1D-8624-FDBD974C62AC@gmail.com> <894F0E23-CAD0-4D7D-8871-CF42E9A20379@petsc.dev> Message-ID: <5C3D7324-AC88-49A7-8993-51EC9E7FAEAB@gmail.com> Thanks Barry - this is a very helpful clarification. Randy > On Jul 6, 2025, at 9:59?AM, Barry Smith wrote: > > > Yes, they are automatically initialized to something ((void*)-2 actually). They are NOT automatically initialized to PETSC_NULL_XXX > > #define PetscObjectIsNull(obj) (obj%v == 0 .or. obj%v == -2 .or. obj%v == -3) > > PETSc Fortran has multiple representations of NULL objects. When you want to pass in the meaning "I am a null" you pass PETSC_NULL_XXX. Not the other representations. > > > Barry > > Why? Well, if variables were initialized to 0 and then got passed into a routine, the routine would see the 0 and think: Oh, the person does not want to set this variable (because passing x would be the same as passing PETSC_NULL_XXX). > > > > > > >> On Jul 4, 2025, at 9:59?PM, Randall Mackie wrote: >> >> Thanks Barry >> >> We were confused because the online manual says: >> >> PETSc objects are always automatically initialized when declared so you do not need to (and should not) do >> >> type(tXXX) x = PETSC_NULL_XXX >> XXX x = PETSC_NULL_XXX >> >> Randy >> >> >>> On Jul 4, 2025, at 6:56?PM, Barry Smith wrote: >>> >>> >>> You need to pass PETSC_NULL_MAT in those locations, you cannot just pass a Mat that has never been created (that is a different kind of NULL matrix :-). 
I have added clearer error checking in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8526__;!!G_uCfscf7eWS!cuhCwQPErEmmFpD3ik1xLziqtbh4SbXqQViT4PDY_DgSHIm0dkes0boQKPJnZkhSqaayUh78IUoqo7i2dMIPbibEsw$ >>> >>> Barry >>> >>> >>>> On Jul 4, 2025, at 10:32?AM, Randall Mackie wrote: >>>> >>>> In the process of upgrading our code to version 3.23, we have run into another issue, this time with nest matrices. >>>> >>>> In previous versions (well to at least 3.21) it was perfectly fine to pass in null matrix blocks to create a nest matrix. There are situations where we have, for example, only diagonal blocks. That should be allowed and was and worked fine. >>>> >>>> In petsc 3.23, that no longer is true, or perhaps we do not know what is the proper way to pass in a null matrix. >>>> >>>> The attached example reproduces this with the following error: >>>> >>>> [0]PETSC ERROR: Configure options: --force --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1 --with-cxx=0 >>>> [0]PETSC ERROR: #1 PetscObjectReference() at /home/rmackie/PETSc/petsc/src/sys/objects/inherit.c:620 >>>> [0]PETSC ERROR: #2 MatNestSetSubMats_Nest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1407 >>>> [0]PETSC ERROR: #3 MatNestSetSubMats() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1518 >>>> [0]PETSC ERROR: #4 MatCreateNest() at /home/rmackie/PETSc/petsc/src/mat/impls/nest/matnest.c:1800 >>>> [0]PETSC ERROR: #5 matnest_bug_reproducer.F90:48 >>>> >>>> >>>> Thanks for helping with this, and thanks for all the work on the Fortran interfaces - I?ve been playing around with your new branch and am excited by all the work! >>>> >>>> Randy >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
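Barry's point above distinguishes PETSC_NULL_MAT from a Mat that was simply never created. For readers using petsc4py rather than Fortran, here is a minimal sketch of the same idea; it is not code from the thread, and it assumes the petsc4py API, where Python's None (rather than an uninitialized handle) stands in for the null block:

```python
# Sketch (assumption: petsc4py API, where None is the null block).
# A MatNest with only diagonal blocks; the off-diagonal entries are None,
# the petsc4py analogue of passing PETSC_NULL_MAT in Fortran.
from petsc4py import PETSc

n = 4
A = PETSc.Mat().createAIJ([n, n])
A.setUp()
for i in range(n):
    A[i, i] = 2.0   # simple diagonal data, for illustration only
A.assemble()

B = PETSc.Mat().createAIJ([n, n])
B.setUp()
for i in range(n):
    B[i, i] = 3.0
B.assemble()

# Off-diagonal blocks absent: pass None, not a Mat that was never created.
N = PETSc.Mat().createNest([[A, None], [None, B]])
```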
URL: From C.Klaij at marin.nl Sun Jul 6 12:53:20 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Sun, 6 Jul 2025 17:53:20 +0000 Subject: [petsc-users] problem with nested logging In-Reply-To: References: <589D9E7C-5632-46EB-A976-7DB1E69F497A@petsc.dev> Message-ID: Any number above 1. There are no mismatches. I'm sending a standalone example next. Chris _____ dr. ir. Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!amZVGXlyUOzDuaLA3yogXw9MYfgtdztuEIkZPxTsxJtBDaa7NZmRB8mV7o5NQjz6HCSHRON-BuOUWOYPxMGZxNg$ ___________________________________ From: Junchao Zhang Sent: Saturday, July 5, 2025 3:02 AM To: Klaij, Christiaan Cc: Barry Smith; PETSc users list Subject: Re: [petsc-users] problem with nested logging How many MPI processes did you use? When they hang, maybe you can check their call stack to see if there are mismatches. --Junchao Zhang On Fri, Jul 4, 2025 at 5:56?AM Klaij, Christiaan via petsc-users > wrote: (yes, valgrind was attempted but did not show any obvious problems.) However, we are making some progress. The random segmentation faults are observed with intelmpi 2012.13 and openmpi 4.1.5. However, with openmpi 5.0.6, the randomness is gone and we get systematic hanging of the code during PetscLogView. In fact, we can now make the code hang by adding a single collective PetscLogEvent. Next step is to replicate this behaviour in a standalone example for you to investigate. Chris ________________________________________ [cid:ii_197d818ee46d5b17c041] dr. ir. 
Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!amZVGXlyUOzDuaLA3yogXw9MYfgtdztuEIkZPxTsxJtBDaa7NZmRB8mV7o5NQjz6HCSHRON-BuOUWOYPxMGZxNg$ [Facebook] [LinkedIn] [YouTube] From: Barry Smith > Sent: Tuesday, July 1, 2025 4:54 PM To: Klaij, Christiaan Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging It's probably already been done, but if not, you should run under Valgrind. "Random" crashes are usually an indication of memory corruption. > On Jul 1, 2025, at 4:16?AM, Klaij, Christiaan via petsc-users > wrote: > > It's been a while, in the meantime we did upgrade the OS and the > compilers but the problem still persists. > > Can it be that one must call the PetscLogEventRegister and the > PetscLogEventBegin/End on all procs? Currently we don't do so, > think of some small system being build and solved. This event is > registered on all procs, but may take place on a single proc or > subset with the begin/end only on that proc or subset. > > Chris > > ________________________________________ > From: Klaij, Christiaan > > Sent: Thursday, May 1, 2025 3:06 PM > To: Stefano Zampini > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > If I deactivate all the calls to PetscLogEventBegin/End in the > simulation code, the error does not show-up. > > But since there are more than 2500 events, it's impossible to do > them one-by-one, especially since the error shows-up at random > and requires a number of cases and repetitions. > > Unfortunately, I'm running out of time and budget. I will retry > once we get Rocky Linux 9 and the latest Intel compilers. 
> > Chris > > ________________________________________ > From: Stefano Zampini > > Sent: Thursday, May 1, 2025 10:57 AM > To: Klaij, Christiaan > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > > > Il giorno gio 1 mag 2025 alle ore 11:38 Klaij, Christiaan >> ha scritto: > The checks seem to be in place: I do get a PETSC ERROR when I add a log event on rank 0 as you suggested. > > > Ok, the broken logic may be in LogView then. You can try deactivating some logging by classes and see how the error evolves, maybe using PetscLogClassSetActiveAll. Or, if feasible, commenting out some part of the simulation code > > Another thought: in between the log events pairs, I also have calls to PetscLogFlops, perhaps that plays a role somehow. > > It shouldn't > > Chris > > ________________________________________ > From: Klaij, Christiaan >> > Sent: Thursday, May 1, 2025 10:23 AM > To: Stefano Zampini > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > Was the rewritting by Toby done somewhere between petsc 3.19.4 (no problem) and 3.23.4 (problem)? > > Chris > > ________________________________________ > From: Stefano Zampini >> > Sent: Thursday, May 1, 2025 9:12 AM > To: Klaij, Christiaan > Cc: Randall Mackie; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > If I look at the code for PetscLogHandlerEventBegin_Default, there are checks in place to see if the event is collectively called (see below) > Can you make sure the Nested logging system has the same checks? 
> It should, but double check since the code has been largely rewritten by Toby some time ago; to check it should be as easy as writing a code that calls a collective event on a single process and a debug version of petsc should complain > > if (rank ==0) > LogEventBegin() <-this should call MPIU_Allreduce, but other processes will not, thus hang > > > If it does not complain, then the error must come from how the logic in LogView works, and from how it traverses the various events (my guess: processed in a different order from different processes). Without a reproducer, it is hard to understand what's going on > > static PetscErrorCode PetscLogHandlerEventBegin_Default(PetscLogHandler h, PetscLogEvent event, PetscObject o1, PetscObject o2, PetscObject o3, PetscObject o4) > { > PetscLogHandler_Default def = (PetscLogHandler_Default)h->data; > PetscEventPerfInfo *event_perf_info = NULL; > PetscLogEventInfo event_info; > PetscLogDouble time; > PetscLogState state; > PetscLogStage stage; > > PetscFunctionBegin; > PetscCall(PetscLogHandlerGetState(h, &state)); > if (PetscDefined(USE_DEBUG)) { > PetscCall(PetscLogStateEventGetInfo(state, event, &event_info)); > if (PetscUnlikely(o1)) PetscValidHeader(o1, 3); > if (PetscUnlikely(o2)) PetscValidHeader(o2, 4); > if (PetscUnlikely(o3)) PetscValidHeader(o3, 5); > if (PetscUnlikely(o4)) PetscValidHeader(o4, 6); > if (event_info.collective && o1) { > PetscInt64 b1[2], b2[2]; > > b1[0] = -o1->cidx; > b1[1] = o1->cidx; > PetscCallMPI(MPIU_Allreduce(b1, b2, 2, MPIU_INT64, MPI_MAX, PetscObjectComm(o1))); > PetscCheck(-b2[0] == b2[1], PETSC_COMM_SELF, PETSC_ERR_PLIB, "Collective event %s not called collectively %" PetscInt64_FMT " != %" PetscInt64_FMT, event_info.name, -b2[0], b2[1]); > } > } > /* Synchronization */ > PetscCall(PetscLogHandlerEventSync_Default(h, event, PetscObjectComm(o1))); > > > > > Il giorno gio 1 mag 2025 alle ore 09:56 Klaij, Christiaan >>>> ha scritto: > I've tried with -log_sync, no complaints whatsoever, 
but the error is still there... > > Chris > > ________________________________________ > From: Stefano Zampini >>>> > Sent: Tuesday, April 29, 2025 6:12 PM > To: Randall Mackie > Cc: Klaij, Christiaan; PETSc users list; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > Can you try using -log_sync ? This should check every entry/exit points of logged Events and complain if something is not collectively called > > Stefano > > On Tue, Apr 29, 2025, 18:21 Randall Mackie >>>>>>>> wrote: > ah okay, I missed that this was found using openmpi. > > then it's probably not the same issue we had. > > I can't remember in which version it was fixed (I'm away from my work computer)... I do know in our case openmpi and the latest Intel One API work fine. > > Randy > > On Apr 29, 2025, at 8:58 AM, Klaij, Christiaan >>>>>>>> wrote: > > Well, the error below only shows-up thanks to openmpi and gnu compilers. > With the intel mpi and compilers it just hangs (tried oneapi 2023.1.0). In which version was that bug fixed? > > Chris > > ________________________________________ > > dr. ir. Christiaan Klaij > | senior researcher | Research & Development | CFD Development > T +31 317 49 33 44 | C.Klaij at marin.nl>>>>>>> | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!faEur1uGDdy2EYiXEYLCVq_UbMO1KiXKQa0vvY2nxKctWQDpgzsX7k9qTLBuMuQ6VNzwRHRHYc8jUopRLSBIKfE$ > > > > > From: Randall Mackie >>>>>>>> > Sent: Tuesday, April 29, 2025 3:33 PM > To: Klaij, Christiaan > Cc: Matthew Knepley; petsc-users at mcs.anl.gov>>>>>>>; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > You don't often get email from rlmackie862 at gmail.com>>>>>>>. Learn why this is important> > We had a similar issue last year that we eventually tracked down to a bug in Intel MPI AllReduce, which was around the same version you are using. > > Can you try a different MPI or the latest Intel One API and see if your error clears?
> > Randy > > On Tue, Apr 29, 2025 at 8:17?AM Klaij, Christiaan via petsc-users >>>>>>>>>>>>>>>> wrote: > I don't think so, we have tracing in place to detect mismatches. But as soon as I switch the tracing on, the error disappears... Same if I add a counter or print statements before and after EventBegin/End. Looks like a memory corruption problem, maybe nothing to do with petsc despite the error message. > > Chris > > ________________________________________ > From: Matthew Knepley >>>>>>>>>>>>>>>> > Sent: Tuesday, April 29, 2025 1:50 PM > To: Klaij, Christiaan > Cc: Junchao Zhang; petsc-users at mcs.anl.gov>>>>>>>>>>>>>>>; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > On Tue, Apr 29, 2025 at 6:50?AM Klaij, Christiaan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > Here's a slightly better error message, obtained --with-debugging=1 > > Is it possible that you have a mismatched EventBegin()/EventEnd() in your code? That could be why we cannot reproduce it here. > > Thanks, > > Matt > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: MPIU_Allreduce() called in different locations (code lines) on different processors > [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjggvQAzPU$ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [0]PETSC ERROR: ./refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by cklaij Tue Apr 29 12:43:54 2025 > [0]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=1 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgVgVAJPM$ --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgo2JWTO4$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgX9ZMYJA$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-libs/superbuild --with-ssl=0 --with-shared-libraries=1 > [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:379 > [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [0]PETSC ERROR: #5 PetscLogNestedTreePrintTop() at 
/home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [0]PETSC ERROR: #6 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [0]PETSC ERROR: #7 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [0]PETSC ERROR: #8 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [0]PETSC ERROR: #9 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-libs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [0]PETSC ERROR: #10 /home/cklaij/ReFRESCO/trunk/Code/src/petsc_include_impl.F90:130 > > ________________________________________ > [cid:ii_19681617e7812ff9cfc1] > dr. ir. Christiaan Klaij > | senior researcher | Research & Development | CFD Development > T +31 317 49 33 44 | C.Klaij at marin.nl>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjgJooxJhg$ > [Facebook] > [LinkedIn] > [YouTube] > > From: Klaij, Christiaan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Sent: Monday, April 28, 2025 3:53 PM > To: Matthew Knepley > Cc: Junchao Zhang; petsc-users at mcs.anl.gov>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > Bisecting would be quite hard, it's not just the petsc version that changed, also other libs, compilers, even os components. 
> > Chris > > ________________________________________ > From: Matthew Knepley >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Sent: Monday, April 28, 2025 3:06 PM > To: Klaij, Christiaan > Cc: Junchao Zhang; petsc-users at mcs.anl.gov>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > You don't often get email from knepley at gmail.com>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>. Learn why this is important > On Mon, Apr 28, 2025 at 8:45?AM Klaij, Christiaan via petsc-users >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > I've tried adding a nested log viewer to src/snes/tutorials/ex70.c, > but it does not replicate the problem and works fine. > > Perhaps it is related to fortran, since the manualpage of > PetscLogNestedBegin says "No fortran support" (why? we've been > using it in fortran ever since). > > Therefore I've tried adding it to src/snes/ex5f90.F90 and that > also works fine. It seems I cannot replicate the problem in a > small example, unfortunately. > > We cannot replicate it here. Is there a chance you could bisect to see what change is responsible? > > Thanks, > > Matt > > Chris > > ________________________________________ > From: Junchao Zhang >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Sent: Saturday, April 26, 2025 3:51 PM > To: Klaij, Christiaan > Cc: petsc-users at mcs.anl.gov>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>; Isaac, Toby > Subject: Re: [petsc-users] problem with nested logging > > You don't often get email from junchao.zhang at gmail.com>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>. Learn why this is important > Toby (Cc'ed) might know it. Or could you provide an example? 
> > --Junchao Zhang > > > On Fri, Apr 25, 2025 at 3:31?AM Klaij, Christiaan via petsc-users >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: > We recently upgraded from 3.19.4 to 3.22.4 but face the problem below with the nested logging. Any ideas? > > Chris > > > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gIT68pbk$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: refresco with 2 MPI process(es) and PETSC_ARCH on marclus3login2 by jwindt Fri Apr 25 08:52:30 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/jwindt/cmake_builds/refresco/install-libs-gnu --with-mpi-dir=/cm/shared/apps/openmpi/gcc/4.0.2 --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6grH5BbeU$ --with-blaslapack-dir=/cm/shared/apps/intel/oneapi/mkl/2021.4.0 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gw4-tEtY$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6gHq4uYiY$ --with-packages-build-dir=/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild --with-ssl=0 --with-shared-libraries=1 
CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:330 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/jwindt/cmake_builds/refresco/build-libs-gnu/superbuild/petsc/src/src/sys/logging/plog.c:2040 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD > with errorcode 98. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > [cid:ii_196725d1e2a809852191] > dr. ir. Christiaan Klaij > | senior researcher | Research & Development | CFD Development > T +31 317 49 33 44 | C.Klaij at marin.nl>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!cmLENZvO_Uydoa8ciUmsyX-F-QiJt9a2ZfQRUvQnRibGm7VE6PED7S_BDsUgjOzvPZIJyiIoJ8bLJk6g8TwMMcw$ > [Facebook] > [LinkedIn] > [YouTube] > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjg539kFLg$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fFq1daAFFpKhBA_NWU3sd2QJe_S44rklqeRi0TB57XI0nQsh9jgy8iw3JNGpBbd21zqvO3QlGTLa7kjg539kFLg$ > > > > -- > Stefano > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image159969.png Type: image/png Size: 5004 bytes Desc: image159969.png URL: From C.Klaij at marin.nl Sun Jul 6 12:56:46 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Sun, 6 Jul 2025 17:56:46 +0000 Subject: [petsc-users] problem with nested logging, standalone example Message-ID: Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4. The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero. When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue. 
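The suspect pattern in this thread (a collective event begun on only some ranks while nested logging is active) can be sketched in petsc4py. This is an illustrative reconstruction, not the attached ex2f-based reproducer, and the event name is made up:

```python
# Illustrative sketch of the suspect pattern (not the attached reproducer):
# an event registered on all ranks but begun/ended on rank 0 only, while
# logging is active. The thread suggests this participation mismatch is
# where PetscLogView hangs with the nested XML handler.
from petsc4py import PETSc

PETSc.Log.begin()                    # enable logging (or use -log_view)

rank = PETSc.COMM_WORLD.getRank()
ev = PETSc.Log.Event('small_solve')  # registration happens on all ranks

if rank == 0:
    ev.begin()
    # ... build and solve a small sequential system on rank 0 only ...
    ev.end()

# The conservative fix discussed in the thread: call begin()/end() on all
# ranks (non-participating ranks simply do no work in between), so every
# rank records the same event tree before PetscLogView runs.
```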
Chris

#0  0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900, src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
#1  0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247
#2  0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
#3  0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216
#4  0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217
#5  0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30) at allreduce.c:123
#6  0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#7  0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#8  0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#9  0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#18 0x0000000000402c8b in MAIN__ ()
#19 0x00000000004023df in main ()

dr. ir. Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dAFNrWR8FzE9RrQXQAlok1iR_fA-rZdm9JAi-dlnKTnbdNTOTCViw0Nc-jjU4g72I-mhE1x1MZaf8imksXwTrKU$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex2f-cklaij-dbg.F90 Type: text/x-fortran Size: 15291 bytes Desc: ex2f-cklaij-dbg.F90 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 2095623 bytes Desc: configure.log URL: From bubu.cattaneo at gmail.com Mon Jul 7 08:40:07 2025 From: bubu.cattaneo at gmail.com (Alberto Cattaneo) Date: Mon, 7 Jul 2025 09:40:07 -0400 Subject: [petsc-users] Petsc/Jax no copy interfacing issues Message-ID: Greetings. I hope this email reaches you well. I'm trying to get JAX and PETSc to work together in a no-copy system using the DLPack tools in both. Unfortunately I can't seem to get it to work right. Ideally, I'd like to create a PETSc vec object using petsc4py, pass it to a JAX object without copying, make a change to it in a JAX jitted function and have that change reflected in the PETSc object. All of this without copying. Of note: when I try to do this I get an error that the alignment is wrong and a copy must be made when I call the from-dlpack function, but changing the alignment in the PETSc ./config stage to 32 causes the error message to disappear; even so, it still doesn't function correctly. I've tried looking through the documentation, but I'm getting a little turned around. 
I've included a code snippet below:

from petsc4py import PETSc as PETSc
import jax
from functools import partial
import jax.numpy as jnp

@partial(jax.jit, donate_argnums=(0,))
def set_in_place(x):
    return x.at[:].set(3.0)

print('\nTesting jax from_dlpack given a PETSc vector that was allocated by PETSc')
x = jnp.ones((1000,1))
y_petsc = PETSc.Vec().createSeq(x.shape[0])
y_petsc.set(0.0)
print(hex(y_petsc.handle))
y2_petsc = PETSc.Vec().createWithDLPack(y_petsc.toDLPack('rw'))
y2_petsc.set(-1.0)
assert y_petsc.getValue(0) == y2_petsc.getValue(0)
print('After creating a second PETSc vector via a DLPack of the first, modifying the memory of one affects the other.')
#y = jnp.from_dlpack(y_petsc.toDLPack('rw'), copy=False)
y = jnp.from_dlpack(y_petsc, copy=False)
orig_ptr = y.unsafe_buffer_pointer()
print(f'before: ptr at {hex(orig_ptr)}')
y = set_in_place(y)
print(f'after: ptr at {hex(y.unsafe_buffer_pointer())}')
assert orig_ptr == y.unsafe_buffer_pointer()
#assert y_petsc.getValue(0) == y[0], f'The PETSc value {y_petsc.getValue(0)} did not match the JAX value {y[0]}, so modifying the JAX memory did not affect the PETSc memory.'

I'd like the bottom two asserts to pass, but I can only get one of them. If somebody is familiar with this issue I'd greatly appreciate the assistance. Respectfully: Alberto -------------- next part -------------- An HTML attachment was scrubbed... URL: From mac3bar at gmail.com Mon Jul 7 09:32:41 2025 From: mac3bar at gmail.com (Art) Date: Mon, 7 Jul 2025 10:32:41 -0400 Subject: [petsc-users] Matrix-Free J*v in PETSc Message-ID: Hi all, I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically with BDF or other implicit time-stepping schemes), and I would like to exploit the fact that I can efficiently compute the action of the Jacobian on a vector (Jv) without assembling the full Jacobian matrix. 
For a large system it becomes expensive to assemble the Jacobian at each iteration. In scikits.odes (SUNDIALS/CVODE), there is a native API for passing only a J*v routine to the time integrator. In my experience, when I use only a Jacobian-vector product routine (without assembling the full matrix), the performance improves significantly for large systems. However, in PETSc TS the workflow seems more matrix-centric, and I have only found the possibility to use MatShell for the Jacobian. Is there a way to do something similar in PETSc TS (for BDF or other implicit schemes)? Currently, I use the matrix-free Newton-Krylov method to approximate the Jacobian and have adjusted the tolerances to achieve convergence, as recommended by Barry. In that case, I obtain integration times similar to those of scikits.odes CVODE without using the Jacobian-times-vector routine. Best regards, Art -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jul 8 00:54:08 2025 From: jed at jedbrown.org (Jed Brown) Date: Mon, 07 Jul 2025 23:54:08 -0600 Subject: [petsc-users] Matrix-Free J*v in PETSc In-Reply-To: References: Message-ID: <87a55fut1b.fsf@jedbrown.org> Using MatShell is the standard method. Note that MatShell allows exposing other "matrix operations", such as producing a diagonal or other preconditioning ingredients. Art writes: > Hi all, > > I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically with > BDF or other implicit time-stepping schemes), and I would like to exploit > the fact that I can efficiently compute the action of the Jacobian on a > vector (Jv) without assembling the full Jacobian matrix. Since for a large > system it becomes expensive to assemble the Jacobian in each iteration. In > scikits.odes (SUNDIALS/CVODE), there is a native API for passing only a J*v > routine to the time integrator.
In my experience, when I use only a > Jacobian-vector product routine (without assembling the full matrix), the > performance improves significantly for large systems. However, in PETSc TS, > the workflow seems more matrix-centric, and I have only found the > possibility to use MatShell for the Jacobian > > Is there a way to do something similar in PETSc TS (for BDF or other > implicit schemes)? > > Currently, I use the matrix-free Newton-Krylov method to approximate the > Jacobian and have adjusted the tolerances to achieve convergence, as > recommended by Barry. In that case, I obtain similar integration times > with scikits.odes CVODE without using the Jacobian times vector. > > Best regards, > > Art From knepley at gmail.com Tue Jul 8 05:33:23 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 8 Jul 2025 06:33:23 -0400 Subject: [petsc-users] Matrix-Free J*v in PETSc In-Reply-To: <87a55fut1b.fsf@jedbrown.org> References: <87a55fut1b.fsf@jedbrown.org> Message-ID: Also note that MatShell is _exactly_ the same as the CVODE interface. It is just a wrapper for that function pointer so that we do not need to change the top-level interface. Thanks, Matt On Tue, Jul 8, 2025 at 2:10?AM Jed Brown wrote: > Using MatShell is the standard method. Note that MatShell allows exposing > other "matrix operations", such as producing a diagonal or other > preconditioning ingredients. > > Art writes: > > > Hi all, > > > > I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically > with > > BDF or other implicit time-stepping schemes), and I would like to exploit > > the fact that I can efficiently compute the action of the Jacobian on a > > vector (Jv) without assembling the full Jacobian matrix. Since for a > large > > system it becomes expensive to assemble the Jacobian in each iteration. > In > > scikits.odes (SUNDIALS/CVODE), there is a native API for passing only a > J*v > > routine to the time integrator. 
In my experience, when I use only a > > Jacobian-vector product routine (without assembling the full matrix), the > > performance improves significantly for large systems. However, in PETSc > TS, > > the workflow seems more matrix-centric, and I have only found the > > possibility to use MatShell for the Jacobian > > > > Is there a way to do something similar in PETSc TS (for BDF or other > > implicit schemes)? > > > > Currently, I use the matrix-free Newton-Krylov method to approximate the > > Jacobian and have adjusted the tolerances to achieve convergence, as > > recommended by Barry. In that case, I obtain similar integration times > > with scikits.odes CVODE without using the Jacobian times vector. > > > > Best regards, > > > > Art > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ee4C66EMPaHUhn55k5ZDEY4tEPmxzmhPifOJHGQtd6NBISPZb8UlupQ3u-o3Vt7n7NjYikEhGokM92hphqtq$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Jul 8 12:21:05 2025 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 8 Jul 2025 17:21:05 +0000 Subject: [petsc-users] Matrix-Free J*v in PETSc In-Reply-To: References: Message-ID: <3D2D0C40-E99A-4B05-8CFF-318C0CB5161B@anl.gov> Hi Art, Here is a TS example that uses MatShell for implicit time integration and adjoint sensitivity calculation: src/ts/tutorials/advection-diffusion-reaction/ex5adj_mf.c You will need to provide a (jvp) routine like MyIMatMult() in this example. Adjoints require vjp (vector-Jacobian product) routines that are also included in this example. 
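When an analytic product is unavailable, a jvp routine can fall back on a one-sided finite-difference directional derivative, which is the same approximation behind the matrix-free Newton-Krylov approach Art mentions. A minimal NumPy sketch with a toy residual (F, u and v below are illustrative placeholders, not code from any PETSc example):

```python
import numpy as np

# Approximate the action J(u)*v via a forward difference of the residual F,
# without ever forming J: J(u)*v ~= (F(u + eps*v) - F(u)) / eps.
def fd_jv(F, u, v, eps=1e-7):
    return (F(u + eps * v) - F(u)) / eps

F = lambda u: u**2 - 1.0           # toy residual with exact J(u) = diag(2u)
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])
jv = fd_jv(F, u, v)                # close to the exact product 2*u*v
```

The truncation error of this approximation is O(eps) times the curvature of F, which is why the differencing parameter eps has to balance truncation against floating-point cancellation.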
Hong From: petsc-users on behalf of Art Date: Monday, July 7, 2025 at 9:33?AM To: "petsc-users at mcs.anl.gov" Subject: [petsc-users] Matrix-Free J*v in PETSc Hi all, I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically with BDF or other implicit time-stepping schemes), and I would like to exploit the fact that I can efficiently compute the action of the Jacobian on a vector (Jv) without assembling the full Jacobian matrix. Since for a large system it becomes expensive to assemble the Jacobian in each iteration. In scikits.odes (SUNDIALS/CVODE), there is a native API for passing only a J*v routine to the time integrator. In my experience, when I use only a Jacobian-vector product routine (without assembling the full matrix), the performance improves significantly for large systems. However, in PETSc TS, the workflow seems more matrix-centric, and I have only found the possibility to use MatShell for the Jacobian Is there a way to do something similar in PETSc TS (for BDF or other implicit schemes)? Currently, I use the matrix-free Newton-Krylov method to approximate the Jacobian and have adjusted the tolerances to achieve convergence, as recommended by Barry. In that case, I obtain similar integration times with scikits.odes CVODE without using the Jacobian times vector. Best regards, Art -------------- next part -------------- An HTML attachment was scrubbed... URL: From 13390589636 at 163.com Tue Jul 8 12:29:25 2025 From: 13390589636 at 163.com (mengjie liang) Date: Wed, 9 Jul 2025 01:29:25 +0800 (CST) Subject: [petsc-users] About PetscFD Message-ID: <1c21b404.a880.197eb15d3da.Coremail.13390589636@163.com> Dear Petsc, Big fan here, you are amazing. I want to test a finite difference method with parallel multigrid. I have done it using DMDA and now I need to do it with local mesh (non-conforming) refinement. I notice that the unreleased PetscFD is all I need. I did find Dr. Abhishek's thesis, which provides the unreleased PetscFD code. 
I also went through Professor Dave Salac's talk at the annual meeting. I know that there is no PetscFD code implementing local mesh refinement in the fd/tests directory. I tried to adapt the mesh in ex2 and view it in a VTK file, but it doesn't work. However, in the talk and the thesis there is a Laplace example implementing local mesh refinement. I plan to go through the fd source code so that I can find out what is going on. Before that, may I ask whether you could show me the Laplace code mentioned there? That would be great. Best regards, --Mengjie -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Jul 8 12:36:04 2025 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 8 Jul 2025 17:36:04 +0000 Subject: [petsc-users] Matrix-Free J*v in PETSc In-Reply-To: <3D2D0C40-E99A-4B05-8CFF-318C0CB5161B@anl.gov> References: <3D2D0C40-E99A-4B05-8CFF-318C0CB5161B@anl.gov> Message-ID: <09C40D76-43A2-4C17-95F1-3AEBDAA365E3@anl.gov> For a python example, please take a look at src/binding/petsc4py/demo/legacy/ode/vanderpol.py and you will see how jvp is done in the class IJacShell.
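The shell pattern in that demo boils down to a context object whose mult callback writes J*x into y without ever forming J. The sketch below mimics that interface with plain NumPy; JacShellContext and the toy residual are illustrative stand-ins, not the petsc4py API itself:

```python
import numpy as np

class JacShellContext:
    """Schematic of a shell-Jacobian context: stores the state the Jacobian
    is linearized around and applies J*x matrix-free via mult()."""

    def __init__(self, u):
        self.u = u  # current state

    def mult(self, mat, x, y):
        # analytic J*v for the toy residual F(u) = u**2 - 1, so J = diag(2u);
        # 'mat' mirrors the shell-Mat argument petsc4py passes to mult()
        y[:] = 2.0 * self.u * x

u = np.array([1.0, 2.0, 3.0])
ctx = JacShellContext(u)
x = np.array([0.5, -1.0, 2.0])
y = np.empty_like(x)
ctx.mult(None, x, y)  # fills y with J*x = 2*u*x
```

With petsc4py, such a context is attached to a Python-type Mat (PETSc.Mat().createPython(..., context=ctx)), which is how the vanderpol.py demo wires up IJacShell as the TS Jacobian.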
Hong From: petsc-users on behalf of Art Date: Monday, July 7, 2025 at 9:33?AM To: "petsc-users at mcs.anl.gov" Subject: [petsc-users] Matrix-Free J*v in PETSc Hi all, I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically with BDF or other implicit time-stepping schemes), and I would like to exploit the fact that I can efficiently compute the action of the Jacobian on a vector (Jv) without assembling the full Jacobian matrix. Since for a large system it becomes expensive to assemble the Jacobian in each iteration. In scikits.odes (SUNDIALS/CVODE), there is a native API for passing only a J*v routine to the time integrator. In my experience, when I use only a Jacobian-vector product routine (without assembling the full matrix), the performance improves significantly for large systems. However, in PETSc TS, the workflow seems more matrix-centric, and I have only found the possibility to use MatShell for the Jacobian Is there a way to do something similar in PETSc TS (for BDF or other implicit schemes)? Currently, I use the matrix-free Newton-Krylov method to approximate the Jacobian and have adjusted the tolerances to achieve convergence, as recommended by Barry. In that case, I obtain similar integration times with scikits.odes CVODE without using the Jacobian times vector. Best regards, Art -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Jul 8 14:56:08 2025 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 8 Jul 2025 19:56:08 +0000 Subject: [petsc-users] Petsc/Jax no copy interfacing issues In-Reply-To: References: Message-ID: <7BA68283-588B-4CB3-A86C-C14E05A30D3F@anl.gov> Hi Alberto, 1. To check the array pointer on the PETSc side, you can do print(hex(y_petsc.array.ctypes.data)). Then you will see a pointer mismatch caused by the line y = jnp.from_dlpack(y_petsc, copy=False). This is because you configured PETSc in double precision, but JAX uses single precision by default. 
You can either add jax.config.update("jax_enable_x64", True) to make JAX use double-precision numbers or configure PETSc to support single precision. 2. Once you fix this precision mismatch, the in-place conversion between PETSc and JAX should work. However, .at[].set() in JAX is not guaranteed to operate in place. Array updates in JAX are generally performed out-of-place by design. You may do the updates in PETSc so that it won't break the zero-copy system. Hong From: petsc-users on behalf of Alberto Cattaneo Date: Monday, July 7, 2025 at 8:40 AM To: "petsc-users at mcs.anl.gov" Subject: [petsc-users] Petsc/Jax no copy interfacing issues Greetings. I hope this email reaches you well. I'm trying to get JAX and PETSc to work together in a no-copy system using the DLPack tools in both. Unfortunately I can't seem to get it to work right. Ideally, I'd like to create a PETSc vec object using petsc4py, pass it to a JAX object without copying, make a change to it in a JAX jitted function and have that change reflected in the PETSc object. All of this without copying. Of note: when I try this, I get an error that the alignment is wrong and a copy must be made when I call the from-dlpack function. Changing the alignment to 32 in the PETSc ./configure stage makes the error message disappear, but even so it still doesn't function correctly. I've tried looking through the documentation, but I'm getting a little turned around.
I've included a code snippet below:

from petsc4py import PETSc as PETSc
import jax
from functools import partial
import jax.numpy as jnp

@partial(jax.jit, donate_argnums=(0,))
def set_in_place(x):
    return x.at[:].set(3.0)

print('\nTesting jax from_dlpack given a PETSc vector that was allocated by PETSc')
x = jnp.ones((1000,1))
y_petsc = PETSc.Vec().createSeq(x.shape[0])
y_petsc.set(0.0)
print(hex(y_petsc.handle))
y2_petsc = PETSc.Vec().createWithDLPack(y_petsc.toDLPack('rw'))
y2_petsc.set(-1.0)
assert y_petsc.getValue(0) == y2_petsc.getValue(0)
print('After creating a second PETSc vector via a DLPack of the first, modifying the memory of one affects the other.')
#y = jnp.from_dlpack(y_petsc.toDLPack('rw'), copy=False)
y = jnp.from_dlpack(y_petsc, copy=False)
orig_ptr = y.unsafe_buffer_pointer()
print(f'before: ptr at {hex(orig_ptr)}')
y = set_in_place(y)
print(f'after: ptr at {hex(y.unsafe_buffer_pointer())}')
assert orig_ptr == y.unsafe_buffer_pointer()
#assert y_petsc.getValue(0) == y[0], f'The PETSc value {y_petsc.getValue(0)} did not match the JAX value {y[0]}, so modifying the JAX memory did not affect the PETSc memory.'

I'd like the bottom two asserts to pass, but I can only get one of them. If somebody is familiar with this issue I'd greatly appreciate the assistance. Respectfully: Alberto -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Jul 8 15:58:49 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 8 Jul 2025 15:58:49 -0500 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: Hi, Chris, First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Mat object's type is not set: Argument # 1 ...
[0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 [0]PETSC ERROR: #2 ex2f.F90:258 Then I could run the test without problems: mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 1.11803 1 KSP Residual norm 0.591608 2 KSP Residual norm 0.316228 3 KSP Residual norm < 1.e-11 0 KSP Residual norm 0.707107 1 KSP Residual norm 0.408248 2 KSP Residual norm < 1.e-11 Norm of error < 1.e-12 iterations 3 I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" Could you fix the error and retry? --Junchao Zhang On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users < petsc-users at mcs.anl.gov> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. > > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below.
The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 > , op=0x15554ca28980 , comm=0x7f1e30) > at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () 
from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 <+31%20317%2049%2033%2044> | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!ZHsZOjVtZW563bLFZCYkP0mARgO90Fn8zT_BWJ4h6gc-X9izzpyCqZ13ci6S6dNQXmutKi_6BIx2wmwGEBYl5mDNm02m$ > > [image: Facebook] > > [image: LinkedIn] > > [image: YouTube] > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image043083.png Type: image/png Size: 5004 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image954569.png Type: image/png Size: 487 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From mac3bar at gmail.com Wed Jul 9 09:10:29 2025 From: mac3bar at gmail.com (Art) Date: Wed, 9 Jul 2025 10:10:29 -0400 Subject: [petsc-users] Matrix-Free J*v in PETSc In-Reply-To: <09C40D76-43A2-4C17-95F1-3AEBDAA365E3@anl.gov> References: <3D2D0C40-E99A-4B05-8CFF-318C0CB5161B@anl.gov> <09C40D76-43A2-4C17-95F1-3AEBDAA365E3@anl.gov> Message-ID: Thank you very much for your help and the examples provided. I was able to implement the Jacobian-times-vector (Jv) approach as recommended, and it works great! Art On Tue, Jul 8, 2025 at 13:36, Zhang, Hong () wrote: > For a python example, please take a look at > src/binding/petsc4py/demo/legacy/ode/vanderpol.py and you will see how jvp > is done in the class IJacShell. > > > > Hong > > > > *From: *petsc-users on behalf of > "Zhang, Hong via petsc-users" > *Reply-To: *"Zhang, Hong" > *Date: *Tuesday, July 8, 2025 at 12:21 PM > *To: *Art , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] Matrix-Free J*v in PETSc > > > > Hi Art, > > Here is a TS example that uses MatShell for implicit time integration and > adjoint sensitivity calculation: > src/ts/tutorials/advection-diffusion-reaction/ex5adj_mf.c > > > You will need to provide a (jvp) routine like MyIMatMult() in this > example. Adjoints require vjp (vector-Jacobian product) routines that are > also included in this example.
> > Hong > > *From: *petsc-users on behalf of Art < > mac3bar at gmail.com> > *Date: *Monday, July 7, 2025 at 9:33?AM > *To: *"petsc-users at mcs.anl.gov" > *Subject: *[petsc-users] Matrix-Free J*v in PETSc > > > > Hi all, > > I am integrating a stiff system of ODEs/PDEs using PETSc TS (typically > with BDF or other implicit time-stepping schemes), and I would like to > exploit the fact that I can efficiently compute the action of the Jacobian > on a vector (Jv) without assembling the full Jacobian matrix. Since for a > large system it becomes expensive to assemble the Jacobian in each > iteration. In scikits.odes (SUNDIALS/CVODE), there is a native API for > passing only a J*v routine to the time integrator. In my experience, when I > use only a Jacobian-vector product routine (without assembling the full > matrix), the performance improves significantly for large systems. However, > in PETSc TS, the workflow seems more matrix-centric, and I have only found > the possibility to use MatShell for the Jacobian > > Is there a way to do something similar in PETSc TS (for BDF or other > implicit schemes)? > > Currently, I use the matrix-free Newton-Krylov method to approximate the > Jacobian and have adjusted the tolerances to achieve convergence, as > recommended by Barry. In that case, I obtain similar integration times > with scikits.odes CVODE without using the Jacobian times vector. > > Best regards, > > Art > -------------- next part -------------- An HTML attachment was scrubbed... URL: From C.Klaij at marin.nl Thu Jul 10 03:15:05 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 10 Jul 2025 08:15:05 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: Hi Junchao, Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... Chris _____ dr. ir. 
Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!bV-wSzFdop4YFdAYdaN5SqxF1Ea1onC_aTyxTwYg3TAYhYJpnIuO7MDbEv5VmkfbhJpB3EHweorcvDLsr_38Wj4$ ___________________________________ From: Junchao Zhang Sent: Tuesday, July 8, 2025 10:58 PM To: Klaij, Christiaan Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Hi, Chris, First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Mat object's type is not set: Argument # 1 ... [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 [0]PETSC ERROR: #2 ex2f.F90:258 Then I could ran the test without problems mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 1.11803 1 KSP Residual norm 0.591608 2 KSP Residual norm 0.316228 3 KSP Residual norm < 1.e-11 0 KSP Residual norm 0.707107 1 KSP Residual norm 0.408248 2 KSP Residual norm < 1.e-11 Norm of error < 1.e-12 iterations 3 I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 
-DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" Could you fix the error and retry? --Junchao Zhang On Sun, Jul 6, 2025 at 12:57?PM Klaij, Christiaan via petsc-users > wrote: Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4. The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero. When running on mulitple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue. Chris #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247 #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216 #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217 #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, 
datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #18 0x0000000000402c8b in MAIN__ () #19 0x00000000004023df in main () [cid:ii_197ebccaa1d27ee6ef21] dr. ir. 
Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!bV-wSzFdop4YFdAYdaN5SqxF1Ea1onC_aTyxTwYg3TAYhYJpnIuO7MDbEv5VmkfbhJpB3EHweorcvDLsr_38Wj4$ [Facebook] [LinkedIn] [YouTube] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image433266.png Type: image/png Size: 5004 bytes Desc: image433266.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image106439.png Type: image/png Size: 487 bytes Desc: image106439.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image416923.png Type: image/png Size: 504 bytes Desc: image416923.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image856574.png Type: image/png Size: 482 bytes Desc: image856574.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image043083.png Type: image/png Size: 5004 bytes Desc: image043083.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image954569.png Type: image/png Size: 487 bytes Desc: image954569.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image488066.png Type: image/png Size: 504 bytes Desc: image488066.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From C.Klaij at marin.nl Thu Jul 10 03:38:55 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 10 Jul 2025 08:38:55 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. Chris $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 1.11803 1 KSP Residual norm 0.591608 2 KSP Residual norm 0.316228 3 KSP Residual norm < 1.e-11 0 KSP Residual norm 0.707107 1 KSP Residual norm 0.408248 2 KSP Residual norm < 1.e-11 Norm of error < 1.e-12 iterations 3 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: General MPI error [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcs91fyp4$ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcEB0dwdE$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcW9tvX1c$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcI1wRWu4$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF Proc: [[55228,1],1] Errorcode: 98 NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- -------------------------------------------------------------------------- prterun has exited due to process rank 1 with PID 0 on node login1 calling "abort". This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here). 
-------------------------------------------------------------------------- ________________________________________ From: Klaij, Christiaan Sent: Thursday, July 10, 2025 10:15 AM To: Junchao Zhang Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Hi Junchao, Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... Chris ________________________________________ From: Junchao Zhang Sent: Tuesday, July 8, 2025 10:58 PM To: Klaij, Christiaan Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Hi, Chris, First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Mat object's type is not set: Argument # 1 ...
[0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 [0]PETSC ERROR: #2 ex2f.F90:258 Then I could run the test without problems mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 1.11803 1 KSP Residual norm 0.591608 2 KSP Residual norm 0.316228 3 KSP Residual norm < 1.e-11 0 KSP Residual norm 0.707107 1 KSP Residual norm 0.408248 2 KSP Residual norm < 1.e-11 Norm of error < 1.e-12 iterations 3 I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" Could you fix the error and retry? --Junchao Zhang On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users wrote: Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4. The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero. When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.
Chris #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247 #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216 #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217 #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #18 0x0000000000402c8b in MAIN__ () #19 0x00000000004023df in main ()
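The backtrace ends in MPIU_Allreduce: one rank is blocked in a collective that the other rank never enters. That deadlock pattern (a blocking collective guarded by a rank test) can be mimicked with a stdlib-only Python sketch that models ranks as threads and the collective as a barrier; the timeout stands in for the real hang. This is an illustration of the failure mode only, not PETSc or MPI code.

```python
import threading

NRANKS = 2
# A Barrier models a blocking collective (e.g. MPI_Allreduce):
# it only completes once every rank has entered it.
collective = threading.Barrier(NRANKS)

def worker(rank, results):
    try:
        if rank == 0:                     # BUG: collective behind a rank test
            collective.wait(timeout=0.5)  # rank 1 never enters -> "hang"
        results[rank] = "ok"
    except threading.BrokenBarrierError:
        results[rank] = "hung"            # timeout stands in for the deadlock

results = {}
threads = [threading.Thread(target=worker, args=(r, results)) for r in range(NRANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # rank 0 "hangs" in the collective, rank 1 carries on
```

In the real run there is no timeout, so rank 0 simply blocks forever, which matches the observed hang in PetscLogView.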
From knepley at gmail.com Thu Jul 10 06:37:13 2025 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Jul 2025 07:37:13 -0400 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users < petsc-users at mcs.anl.gov> wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, > the code does not hang but gives the error below. > The error on its face should be impossible. On line 289, we pass pointers to two variables on the stack. This would seem to indicate more general memory corruption. I know we asked before, but have you run under Address Sanitizer or Valgrind? Thanks, Matt > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi > -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ehgYKc3ubRMXAbJr5a8kI3c3JFMYZe8L9fASpf0LYNC0oKs7PdCn2Tm5bh0sZtxA2uAu6W2Z0nEXCm0ya88i$ > > for trouble shooting.
-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ehgYKc3ubRMXAbJr5a8kI3c3JFMYZe8L9fASpf0LYNC0oKs7PdCn2Tm5bh0sZtxA2uAu6W2Z0nEXCkcm7Yoj$
From C.Klaij at marin.nl Thu Jul 10 07:46:47 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 10 Jul 2025 12:46:47 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: Hi Matt, Attached is the output of valgrind: $ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > out 2>&1 Chris ________________________________________ From: Matthew Knepley Sent: Thursday, July 10, 2025 1:37 PM To: Klaij, Christiaan Cc: Junchao Zhang; PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users wrote: An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. The error on its face should be impossible. On line 289, we pass pointers to two variables on the stack. This would seem to indicate more general memory corruption. I know we asked before, but have you run under Address Sanitizer or Valgrind?
Thanks, Matt Chris $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always 0 KSP Residual norm 1.11803 1 KSP Residual norm 0.591608 2 KSP Residual norm 0.316228 3 KSP Residual norm < 1.e-11 0 KSP Residual norm 0.707107 1 KSP Residual norm 0.408248 2 KSP Residual norm < 1.e-11 Norm of error < 1.e-12 iterations 3 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: General MPI error [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGSUP46MA$ for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGUnQa2TU$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGHhVsNGA$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGgrD72NI$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 
--with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 [1]PETSC ERROR: #9 
ex2f-cklaij-dbg.F90:301 -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF Proc: [[55228,1],1] Errorcode: 98 NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- -------------------------------------------------------------------------- prterun has exited due to process rank 1 with PID 0 on node login1 calling "abort". This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here). -------------------------------------------------------------------------- ________________________________________ From: Klaij, Christiaan Sent: Thursday, July 10, 2025 10:15 AM To: Junchao Zhang Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Hi Junchao, Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... Chris ________________________________________ From: Junchao Zhang Sent: Tuesday, July 8, 2025 10:58 PM To: Klaij, Christiaan Cc: PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Hi, Chris, First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Mat object's type is not set: Argument # 1
...
[0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
[0]PETSC ERROR: #2 ex2f.F90:258

Then I could run the test without problems:

mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
  0 KSP Residual norm 1.11803
  1 KSP Residual norm 0.591608
  2 KSP Residual norm 0.316228
  3 KSP Residual norm < 1.e-11
  0 KSP Residual norm 0.707107
  1 KSP Residual norm 0.408248
  2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3

I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with

./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"

Could you fix the error and retry?

--Junchao Zhang

On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:

Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4.
The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event, as well as the solution of a small system on rank zero. When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.

Chris

#0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
#1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247
#2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
#3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216
#4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217
#5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123
#6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#18 0x0000000000402c8b in MAIN__ ()
#19 0x00000000004023df in main ()

dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGPWK5ac8$

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGDDw7FtA$

-------------- next part --------------
A non-text attachment was scrubbed...
Name: out Type: application/octet-stream Size: 8300 bytes Desc: out URL:

From knepley at gmail.com Thu Jul 10 07:59:54 2025
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 10 Jul 2025 08:59:54 -0400
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To: References: Message-ID:

On Thu, Jul 10, 2025 at 8:46 AM Klaij, Christiaan wrote:
> Hi Matt,
>
> Attached is the output of valgrind:
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes
> ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short
> -ksp_gmres_cgs_refinement_type refine_always > out 2>&1

Hmm, so no MPI error when running with valgrind? It looks like Junchao and I cannot reproduce here. It is puzzling. Would you be able to try MPICH instead?

Thanks,

Matt

> Chris
>
> ________________________________________
> From: Matthew Knepley
> Sent: Thursday, July 10, 2025 1:37 PM
> To: Klaij, Christiaan
> Cc: Junchao Zhang; PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:
> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0,
> the code does not hang but gives the error below.
>
> The error on its face should be impossible.
On line 289, we pass pointers > to two variables on the stack. This would seem to indicate more general > memory corruption. > > I know we asked before, but have you run under Address Sanitizer or > Valgrind? > > Thanks, > > Matt > > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi > -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ekmHYyjQCQOSoZsuiOVaLC1ES7g6V5BR3QAM29XZk9xse4rZ3cQBIP5mk--PyGf6YJ9GhRyxHroRTn_izD5g$ < > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcs91fyp4$> > for trouble shooting. 
> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on > login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: > --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs > --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 > --with-mpe=0 --with-debugging=0 --download-superlu_dist= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!ekmHYyjQCQOSoZsuiOVaLC1ES7g6V5BR3QAM29XZk9xse4rZ3cQBIP5mk--PyGf6YJ9GhRyxHroRThmsqdKy$ < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcEB0dwdE$> > --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 > --download-parmetis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!ekmHYyjQCQOSoZsuiOVaLC1ES7g6V5BR3QAM29XZk9xse4rZ3cQBIP5mk--PyGf6YJ9GhRyxHroRTo_cr9O7$ < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcW9tvX1c$> > --download-metis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!ekmHYyjQCQOSoZsuiOVaLC1ES7g6V5BR3QAM29XZk9xse4rZ3cQBIP5mk--PyGf6YJ9GhRyxHroRTs3JgUlY$ < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!bhLWmMB1f8WaSDbp9K4m6tdMiaSZUO0fz4wfjGqnmEpFXM6dyY0NHVQFP9Rbvo2D9gl117ZjcVyTiAmcI1wRWu4$> > --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild > --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall > -funroll-all-loops 
-O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator 
MPI_COMM_SELF
> Proc: [[55228,1],1]
> Errorcode: 98
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 1 with PID 0 on node login1 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ________________________________________
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44
>
> From: Klaij, Christiaan
> Sent: Thursday, July 10, 2025 10:15 AM
> To: Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi Junchao,
>
> Thanks for testing.
I've fixed the error but unfortunately that doesn't > change the behavior, the code still hangs as before, with the same stack > trace... > > Chris > > ________________________________________ > From: Junchao Zhang junchao.zhang at gmail.com>> > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " > PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at > /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could ran the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short > -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran > --download-openmpi --with-ssl=0 --with-shared-libraries=1 > CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime 
-Wno-unused-function -O3 > -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" > > Could you fix the error and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57?PM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov>> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. > > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on mulitple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > 
op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 > , op=0x15554ca28980 , comm=0x7f1e30) > at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > [cid:ii_197ebccaa1d27ee6ef21] > dr. ir. 
Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ekmHYyjQCQOSoZsuiOVaLC1ES7g6V5BR3QAM29XZk9xse4rZ3cQBIP5mk--PyGf6YJ9GhRyxHroRTiXJnOgy$

-------------- next part --------------
An HTML attachment was scrubbed...
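As an aside on the diagnosis above: the backtrace shows the hang inside an MPIU_Allreduce called from PetscPrintXMLNestedLinePerfResults, which is consistent with Stefano's guess that the ranks traverse different nested-event trees (e.g. when an event is begun/ended on only a subset of ranks, as in the small-system-on-rank-zero part of the example). The sketch below is a toy model, not PETSc's actual data structures: it compares per-rank depth-first traversals of event names and reports the first position at which the ranks would enter a collective reduction with mismatched or missing events. The function name `first_divergence` and the event names are illustrative only.

```python
def first_divergence(traversals):
    """Given one depth-first event-name list per rank, return the index of
    the first position where the ranks disagree (where a collective call
    would be mismatched or unmatched), or None if they all agree."""
    longest = max(len(t) for t in traversals)
    for i in range(longest):
        # One collective reduction per tree node: every rank must be at
        # the same event, or one rank reaches a reduction the others skip.
        seen = {t[i] if i < len(t) else None for t in traversals}
        if len(seen) > 1:
            return i
    return None
```

In this model, an event logged only on rank 0 shows up as a divergence: rank 0 enters one more Allreduce than rank 1, which is exactly a hang (or, with a different traversal order, mismatched buffers).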
URL:

From Arun.Soman-Pillai at stfc.ac.uk Thu Jul 10 03:35:29 2025
From: Arun.Soman-Pillai at stfc.ac.uk (Arun Soman Pillai - STFC UKRI)
Date: Thu, 10 Jul 2025 08:35:29 +0000
Subject: [petsc-users] MatView for parallel jobs
Message-ID:

Hi,

I am trying to view a matrix object when running a job in parallel. The application is BOUT++ (https://urldefense.us/v3/__https://boutproject.github.io/__;!!G_uCfscf7eWS!Z-f_HrmUaFAwAK4e2pBspL2oRFtfq9O3GtRCoS_Nb1e9y5j2Vjq6THJTnoot7XiMeRAxlNhskW0QrMV7x_v_VsBXVAsGHRrIifxc$ ) which uses petsc as solver. While the option -mat_view gives a jumbled output on STDOUT from across the processors, redirecting output to a specific file results in a race condition. The only option that works for me is -mat_view binary with -viewer_binary_filename. I am processing this binary file with petsc4py as follows:

from petsc4py import PETSc
import numpy as np
import matplotlib.pyplot as plt
import glob

def read_binary(filename):
    viewer = PETSc.Viewer().createMPIIO(filename, "r", comm=PETSc.COMM_WORLD)
    A = PETSc.Mat().load(viewer)
    A.assemble()
    print("Matrix type:", A.getType())
    print("Matrix size:", A.getSize())
    info = A.getInfo()
    print("Number of nonzeros:", info['nz_used'])
    print("Nonzeros allocated:", info['nz_allocated'])
    m, n = A.getSize()
    dense_mat = np.zeros((m, n), dtype=np.float64)
    for i in range(m):
        cols, vals = A.getRow(i)
        dense_mat[i, cols] = vals

    rstart, rend = A.getOwnershipRange()
    rows, cols = [], []
    for i in range(rstart, rend):
        cols_i, _ = A.getRow(i)
        rows.extend([i] * len(cols_i))
        cols.extend(cols_i)

    rows = np.array(rows)
    cols = np.array(cols)

    return dense_mat, rows, cols

A, rows, cols = read_binary("matrix")

This yields an accurate number of non-zeros and matrix sparsity. However, the dense matrix turns out to be a null matrix. I wonder what is going wrong here.

Thank you, best regards
Arun Pillai

-------------- next part --------------
An HTML attachment was scrubbed...
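For post-processing without petsc4py at all, PETSc's default binary matrix format can also be read directly with NumPy: a big-endian header of 32-bit integers (the matrix classid 1211216, the row count, the column count, and the total number of nonzeros), followed by the per-row nonzero counts, the column indices, and the double-precision values. The sketch below is a minimal reader under those default assumptions (real scalars, 32-bit indices); it does not handle complex scalars, 64-bit indices, or any trailing `.info` metadata, and the function name is illustrative.

```python
import numpy as np

def read_petsc_aij(path):
    """Read a PETSc binary sparse (AIJ) matrix into a dense NumPy array.

    Assumes PETSc's default binary settings: big-endian byte order,
    32-bit indices, double-precision real scalars.
    """
    with open(path, "rb") as f:
        classid, m, n, nz = np.fromfile(f, dtype=">i4", count=4)
        assert classid == 1211216, "not a PETSc binary matrix file"
        row_nnz = np.fromfile(f, dtype=">i4", count=m)  # nonzeros per row
        cols = np.fromfile(f, dtype=">i4", count=nz)    # column indices
        vals = np.fromfile(f, dtype=">f8", count=nz)    # values
    dense = np.zeros((m, n))
    k = 0
    for i, cnt in enumerate(row_nnz):
        dense[i, cols[k:k + cnt]] = vals[k:k + cnt]
        k += cnt
    return dense
```

Reading the file in serial like this sidesteps any question of parallel ownership in the post-processing script.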
URL:

From knepley at gmail.com Thu Jul 10 10:34:30 2025
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 10 Jul 2025 11:34:30 -0400
Subject: [petsc-users] MatView for parallel jobs
In-Reply-To: References: Message-ID:

On Thu, Jul 10, 2025 at 11:29 AM Arun Soman Pillai - STFC UKRI via petsc-users wrote:
> Hi,
> I am trying to view a matrix object when running a job in parallel. The
> application is BOUT++ (https://urldefense.us/v3/__https://boutproject.github.io/__;!!G_uCfscf7eWS!ZLeEFXZF4ASJ2hjoTAuK-DQjJkNz_wPtN8mH3P557R1p2cltUNodOb0wrXsWAcQlamre0C1SJy4cn2cXZDeX$ )
> which uses petsc as solver.
> While the option -mat_view gives a jumbled output on STDOUT from across
> the processors,

This means that something is terribly wrong. I suspect that you are not actually running in parallel, but rather launching independent jobs. That can happen when mpiexec does not match the MPI library that you linked with.

Thanks,

Matt

> redirecting output to a specific file results in a race condition. The only
> option that works for me is -mat_view binary with -viewer_binary_filename.
> I am processing this binary file with petsc4py as follows:
>
> from petsc4py import PETSc
> import numpy as np
> import matplotlib.pyplot as plt
> import glob
>
> def read_binary(filename):
>     viewer = PETSc.Viewer().createMPIIO(filename, "r", comm=PETSc.COMM_WORLD)
>     A = PETSc.Mat().load(viewer)
>     A.assemble()
>     print("Matrix type:", A.getType())
>     print("Matrix size:", A.getSize())
>     info = A.getInfo()
>     print("Number of nonzeros:", info['nz_used'])
>     print("Nonzeros allocated:", info['nz_allocated'])
>     m, n = A.getSize()
>     dense_mat = np.zeros((m, n), dtype=np.float64)
>     for i in range(m):
>         cols, vals = A.getRow(i)
>         dense_mat[i, cols] = vals
>
>     rstart, rend = A.getOwnershipRange()
>     rows, cols = [], []
>     for i in range(rstart, rend):
>         cols_i, _ = A.getRow(i)
>         rows.extend([i] * len(cols_i))
>         cols.extend(cols_i)
>
>     rows = np.array(rows)
>     cols = np.array(cols)
>
>     return dense_mat, rows, cols
>
> A, rows, cols = read_binary("matrix")
>
> This yields an accurate number of non-zeros and matrix sparsity. However, the
> dense matrix turns out to be a null matrix. I wonder what is going wrong
> here.
>
> Thank you, best regards
>
> Arun Pillai

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZLeEFXZF4ASJ2hjoTAuK-DQjJkNz_wPtN8mH3P557R1p2cltUNodOb0wrXsWAcQlamre0C1SJy4cn2k7zbgZ$

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jroman at dsic.upv.es Thu Jul 10 10:55:34 2025
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Thu, 10 Jul 2025 15:55:34 +0000
Subject: [petsc-users] MatView for parallel jobs
In-Reply-To: References: Message-ID:

You can try this:

B = A.convert('dense')
dense_mat = B.getDenseArray(readonly=True)
print(dense_mat)

Jose

> El 10 jul 2025, a las 10:35, Arun Soman Pillai - STFC UKRI via petsc-users escribió:
>
> Hi,
> I am trying to view a matrix object when running a job in parallel. The application is BOUT++ (https://urldefense.us/v3/__https://boutproject.github.io/__;!!G_uCfscf7eWS!bhfmQqtaPjwyTZrLkeA6UsVNt6voZUv4N62wF-OTafCdKcrUyeW0ezFOYpJJ9nZ7jhDW2dzrLpCAA9zF8LLqhGvU$ ) which uses petsc as solver.
> While the option -mat_view gives a jumbled output on STDOUT from across the processors, redirecting output to a specific file results in a race condition. The only option that works for me is -mat_view binary with -viewer_binary_filename. I am processing this binary file with petsc4py as follows:
> from petsc4py import PETSc
> import numpy as np
> import matplotlib.pyplot as plt
> import glob
> def read_binary(filename):
>     viewer = PETSc.Viewer().createMPIIO(filename, "r", comm=PETSc.COMM_WORLD)
>     A = PETSc.Mat().load(viewer)
>     A.assemble()
>     print("Matrix type:", A.getType())
>     print("Matrix size:", A.getSize())
>     info = A.getInfo()
>     print("Number of nonzeros:", info['nz_used'])
>     print("Nonzeros allocated:", info['nz_allocated'])
>     m, n = A.getSize()
>     dense_mat = np.zeros((m, n), dtype=np.float64)
>     for i in range(m):
>         cols, vals = A.getRow(i)
>         dense_mat[i, cols] = vals
>     rstart, rend = A.getOwnershipRange()
>     rows, cols = [], []
>     for i in range(rstart, rend):
>         cols_i, _ = A.getRow(i)
>         rows.extend([i] * len(cols_i))
>         cols.extend(cols_i)
>     rows = np.array(rows)
>     cols = np.array(cols)
>     return dense_mat, rows, cols
> A, rows, cols = read_binary("matrix")
> This yields an accurate number of non-zeros and matrix sparsity. However, the dense matrix turns out to be a null matrix. I wonder what is going wrong here.
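One plausible explanation for the null dense matrix, hedged since it depends on how the script is launched: in parallel, each rank owns only the rows in [rstart, rend), so a single rank's loop fills at most its owned slice and the rest stays zero. The (rows, cols, vals) triplets the script already collects can rebuild the dense array once they have been gathered onto one process. A NumPy-only sketch (the function name is illustrative; it assumes no duplicate entries and that the value for each triplet is also gathered):

```python
import numpy as np

def triplets_to_dense(rows, cols, vals, shape):
    """Scatter COO-style (row, col, value) triplets into a dense array.

    rows/cols/vals would come from Mat.getRow over each rank's owned
    row range, gathered onto one process before calling this.
    """
    dense = np.zeros(shape, dtype=np.float64)
    dense[np.asarray(rows), np.asarray(cols)] = vals  # assumes no duplicates
    return dense
```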
> Thank you, best regards
> Arun Pillai

From bsmith at petsc.dev Thu Jul 10 14:24:51 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 10 Jul 2025 15:24:51 -0400
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To: References: Message-ID: <7CA4F141-CA33-4A2E-9035-65285D78B7BF@petsc.dev>

"When running on multiple procs the example hangs during PetscLogView with the backtrace below."

Is it the same traceback on both MPI processes?

Barry

> On Jul 10, 2025, at 8:46 AM, Klaij, Christiaan via petsc-users wrote:
>
> Hi Matt,
>
> Attached is the output of valgrind:
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 valgrind --track-origins=yes ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > out 2>&1
>
> Chris
>
> ________________________________________
> From: Matthew Knepley
> Sent: Thursday, July 10, 2025 1:37 PM
> To: Klaij, Christiaan
> Cc: Junchao Zhang; PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> On Thu, Jul 10, 2025 at 4:39 AM Klaij, Christiaan via petsc-users wrote:
> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
>
> The error on its face should be impossible. On line 289, we pass pointers to two variables on the stack. This would seem to indicate more general memory corruption.
>
> I know we asked before, but have you run under Address Sanitizer or Valgrind?
> > Thanks, > > Matt > > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGSUP46MA$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGUnQa2TU$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGHhVsNGA$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGgrD72NI$ 
--with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGPWK5ac8$ > > > From: Klaij, Christiaan > > Sent: Thursday, July 10, 2025 10:15 AM > To: Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi Junchao, > > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace...
> > Chris > > ________________________________________ > From: Junchao Zhang > > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could run the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > > Could you fix the error
and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. > > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 > #6
0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > dr. ir.
Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGPWK5ac8$ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!e7vkmZAHAZIpI56iMhswN0ZKXp037eAMTO2HabEi8HbqA5lbgqcPqy_2Uq7z8w0NJj5-PZWTzOCSYRvGDDw7FtA$ > From junchao.zhang at gmail.com Thu Jul 10 14:46:00 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 10 Jul 2025 14:46:00 -0500 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. --Junchao Zhang On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, > the code does not hang but gives the error below.
> > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi > -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJjkYxsN9$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on > login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: > --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs > --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 > --with-mpe=0 --with-debugging=0 --download-superlu_dist= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJkouVHb2$ > --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 > --download-parmetis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJrjo6-SP$ > --download-metis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJhCc9MRE$ > 
--with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild > --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC 
ERROR: #8 PetscLogView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJhphmV4x$ > > From: Klaij, Christiaan > Sent: Thursday, July 10, 2025 10:15 AM > To: Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi Junchao, > > Thanks for testing. I've fixed the error but unfortunately that doesn't > change the behavior, the code still hangs as before, with the same stack > trace...
> > Chris > > ________________________________________ > From: Junchao Zhang > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " > PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at > /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could run the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short > -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran > --download-openmpi --with-ssl=0 --with-shared-libraries=1 > CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > 
-DNDEBUG" > > Could you fix the error and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. > > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 > 
, op=0x15554ca28980 , comm=0x7f1e30) > at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > dr. ir.
Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJhphmV4x$
From bsmith at petsc.dev Thu Jul 10 16:10:18 2025 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 10 Jul 2025 17:10:18 -0400 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: Message-ID: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> I cannot reproduce > On Jul 10, 2025, at 3:46 PM, Junchao Zhang wrote: > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. > > --Junchao Zhang > > > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan > wrote: >> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. >> >> Chris >> >> >> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 1.11803 >> 1 KSP Residual norm 0.591608 >> 2 KSP Residual norm 0.316228 >> 3 KSP Residual norm < 1.e-11 >> 0 KSP Residual norm 0.707107 >> 1 KSP Residual norm 0.408248 >> 2 KSP Residual norm < 1.e-11 >> Norm of error < 1.e-12 iterations 3 >> [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [1]PETSC ERROR: General MPI error >> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >> [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dzLyG0osjlx5EZwDofUQJACo9XbvN3TivivATn9dcksFDBoKYE4O0I12C64_AU4mdrWDRh7iTamrKAvqbE9wCAE$ for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 >> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 >> [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dzLyG0osjlx5EZwDofUQJACo9XbvN3TivivATn9dcksFDBoKYE4O0I12C64_AU4mdrWDRh7iTamrKAvqUpCZ_t4$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dzLyG0osjlx5EZwDofUQJACo9XbvN3TivivATn9dcksFDBoKYE4O0I12C64_AU4mdrWDRh7iTamrKAvqz-w4w_E$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dzLyG0osjlx5EZwDofUQJACo9XbvN3TivivATn9dcksFDBoKYE4O0I12C64_AU4mdrWDRh7iTamrKAvqcGo9gWg$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 >> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 >> [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 >> [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 >> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF >> Proc: [[55228,1],1] >> Errorcode: 98 >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> -------------------------------------------------------------------------- >> prterun has exited due to process rank 1 with PID 0 on node login1 calling >> "abort". 
This may have caused other processes in the application to be >> terminated by signals sent by prterun (as reported here). >> -------------------------------------------------------------------------- >> >> ________________________________________ >> >> dr. ir. Christiaan Klaij | senior researcher >> Research & Development | CFD Development >> T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dzLyG0osjlx5EZwDofUQJACo9XbvN3TivivATn9dcksFDBoKYE4O0I12C64_AU4mdrWDRh7iTamrKAvqq6EDt2Q$ >> >> >> >> >> From: Klaij, Christiaan > >> Sent: Thursday, July 10, 2025 10:15 AM >> To: Junchao Zhang >> Cc: PETSc users list >> Subject: Re: [petsc-users] problem with nested logging, standalone example >> >> Hi Junchao, >> >> Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... >> >> Chris >> >> ________________________________________ >> From: Junchao Zhang > >> Sent: Tuesday, July 8, 2025 10:58 PM >> To: Klaij, Christiaan >> Cc: PETSc users list >> Subject: Re: [petsc-users] problem with nested logging, standalone example >> >> Hi, Chris, >> First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Object is in wrong state >> [0]PETSC ERROR: Mat object's type is not set: Argument # 1 >> ... 
>> [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 >> [0]PETSC ERROR: #2 ex2f.F90:258 >> >> Then I could run the test without problems >> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >> 0 KSP Residual norm 1.11803 >> 1 KSP Residual norm 0.591608 >> 2 KSP Residual norm 0.316228 >> 3 KSP Residual norm < 1.e-11 >> 0 KSP Residual norm 0.707107 >> 1 KSP Residual norm 0.408248 >> 2 KSP Residual norm < 1.e-11 >> Norm of error < 1.e-12 iterations 3 >> >> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with >> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >> >> Could you fix the error and retry? >> >> --Junchao Zhang >> >> >> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >> wrote: >> Attached is a standalone example of the issue described in the >> earlier thread "problem with nested logging". The issue appeared >> somewhere between petsc 3.19.4 and 3.23.4. >> >> The example is a variation of ../ksp/tutorials/ex2f.F90, where >> I've added the nested log viewer with one event as well as the >> solution of a small system on rank zero. 
>>
>> When running on multiple procs the example hangs during
>> PetscLogView with the backtrace below. The configure.log is also
>> attached in the hope that you can replicate the issue.
>>
>> Chris
>>
>> #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
>> datatype=0x15554c9ef900 , src=1, tag=-12,
>> comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
>> #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 ,
>> op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630)
>> at base/coll_base_allreduce.c:247
>> #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 ,
>> op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630,
>> algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
>> #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
>> sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> dtype=0x15554c9ef900 ,
>> op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630)
>> at coll_tuned_decision_fixed.c:216
>> #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
>> rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 ,
>> op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80)
>> at coll_hcoll_ops.c:217
>> #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
>> recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123
>> #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #9 0x0000155553e51f3a in PetscLogNestedTreePrint ()
from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> #18 0x0000000000402c8b in MAIN__ ()
>> #19 0x00000000004023df in main ()
>>
>> dr. ir. Christiaan Klaij | senior researcher
>> Research & Development | CFD Development
>> T +31 317 49 33 44 | http://www.marin.nl

From C.Klaij at marin.nl Fri Jul 11 03:10:09 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 11 Jul 2025 08:10:09 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID:

@Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
@Barry: both stack traces aren't exactly the same, see a sample with MPICH below. If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help. Chris (gdb) bt #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #2 0x000015555033e70f in MPIR_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #3 0x000015555033f22e in PMPI_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, inbuf=0x7fffffffac60) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', value=, minthreshold=minthreshold at entry=0, maxthreshold=maxthreshold at entry=0.01, minmaxtreshold=minmaxtreshold at entry=1.05) at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 (gdb) bt #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 #1 0x0000155550b0de71 in ofi_gettime_ns () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #2 0x0000155550b0dec9 in ofi_gettime_ms () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #3 0x0000155550b2fab5 in sock_cq_sreadfrom () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 #4 0x00001555505ca6f7 in MPIDI_OFI_progress () from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#5 0x0000155550591fe9 in progress_test () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#6 0x00001555505924a3 in MPID_Progress_wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#7 0x000015555043463e in MPIR_Wait_state () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#8 0x000015555052ec49 in MPIC_Wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#9 0x000015555053093e in MPIC_Sendrecv () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
#11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12

________________________________________
From: Barry Smith
Sent: Thursday, July 10, 2025 11:10 PM
To: Junchao Zhang
Cc: Klaij, Christiaan; PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

I cannot reproduce

On Jul 10, 2025, at 3:46 PM, Junchao Zhang wrote:

Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange.

--Junchao Zhang

On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan wrote:

An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
Chris

$ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3
[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
[1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
[1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1
CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
Proc: [[55228,1],1]
Errorcode: 98

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 1 with PID 0 on node login1 calling "abort". This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

________________________________________
dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl

From: Klaij, Christiaan
Sent: Thursday, July 10, 2025 10:15 AM
To: Junchao Zhang
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi Junchao,

Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace...

Chris

________________________________________
From: Junchao Zhang
Sent: Tuesday, July 8, 2025 10:58 PM
To: Klaij, Christiaan
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

Hi, Chris,
First, I had to fix an error in your test by adding "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Mat object's type is not set: Argument # 1
...
[0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
[0]PETSC ERROR: #2 ex2f.F90:258

Then I could run the test without problems
mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
0 KSP Residual norm 1.11803
1 KSP Residual norm 0.591608
2 KSP Residual norm 0.316228
3 KSP Residual norm < 1.e-11
0 KSP Residual norm 0.707107
1 KSP Residual norm 0.408248
2 KSP Residual norm < 1.e-11
Norm of error < 1.e-12 iterations 3

I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"

Could you fix the error and retry?

--Junchao Zhang

On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users wrote:

Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4.

The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero.

When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.
Chris #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247 #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216 #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217 #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
#18 0x0000000000402c8b in MAIN__ ()
#19 0x00000000004023df in main ()

dr. ir. Christiaan Klaij | senior researcher
Research & Development | CFD Development
T +31 317 49 33 44 | http://www.marin.nl

From C.Klaij at marin.nl Fri Jul 11 06:58:15 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Fri, 11 Jul 2025 11:58:15 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID:

In summary for future reference:
- tested 3 different machines, two at Marin and one at the national HPC
- tested 3 different MPI implementations (intelmpi, openmpi and mpich)
- tested openmpi in both release and debug
- tested 2 different compilers (intel and gnu), both older and very recent versions
- tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)

All of these tests either segfault, hang, or error out at the call to PetscLogView.
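Since the hang reproduces across machines, MPI implementations and compilers, it points at the rank-dependent event pattern rather than the toolchain. One workaround consistent with the discussion — sketched here as a plain-Python toy model, not PETSc code, with illustrative event names — is to make the logging calls collective: every rank calls the event begin/end, even when the guarded work itself runs on rank 0 only, so all ranks record identical nested-event trees.

```python
def recorded_events(n_ranks, begin_end_on_all):
    """Per-rank list of logged events for a run where only rank 0
    actually performs the small solve (toy model, no MPI)."""
    per_rank = []
    for rank in range(n_ranks):
        events = ["KSPSolve"]
        # The solve happens on rank 0 only; the question is whether
        # the *logging* begin/end calls are made on every rank.
        if begin_end_on_all or rank == 0:
            events.append("SmallSolve")
        per_rank.append(events)
    return per_rank

# Begin/End only on rank 0: event trees differ across ranks, so
# per-event reductions during log viewing cannot line up.
subset = recorded_events(2, begin_end_on_all=False)
print(subset)  # [['KSPSolve', 'SmallSolve'], ['KSPSolve']]

# Begin/End on all ranks (rank 0 still does the work): trees agree.
all_ranks = recorded_events(2, begin_end_on_all=True)
print(all_ranks[0] == all_ranks[1])  # True
```

Whether PETSc's nested log handler requires this collectivity is exactly the open question in the thread; the model only illustrates why making begin/end collective would remove the mismatch.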
Chris

________________________________________
From: Klaij, Christiaan
Sent: Friday, July 11, 2025 10:10 AM
To: Barry Smith; Junchao Zhang
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

@Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
@Barry: both stack traces aren't exactly the same, see a sample with MPICH below.

If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.

From junchao.zhang at gmail.com Fri Jul 11 10:39:43 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Jul 2025 10:39:43 -0500 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: Were they all tested with your ex2f.F90 example?
--Junchao Zhang

On Fri, Jul 11, 2025 at 6:58 AM Klaij, Christiaan wrote:
> In summary for future reference:
> - tested 3 different machines, two at Marin, one at the national HPC
> - tested 3 different mpi implementations (intelmpi, openmpi and mpich)
> - tested openmpi in both release and debug
> - tested 2 different compilers (intel and gnu), both older and very recent versions
> - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>
> All of these tests either segfault, hang or error out at the call to PetscLogView.
>
> Chris
>
> ________________________________________
> From: Klaij, Christiaan
> Sent: Friday, July 11, 2025 10:10 AM
> To: Barry Smith; Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
> @Barry: both stack traces aren't exactly the same, see a sample with MPICH below.
>
> If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
> Chris
>
> (gdb) bt
> #0  0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #1  0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #2  0x000015555033e70f in MPIR_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #3  0x000015555033f22e in PMPI_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #4  0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, inbuf=0x7fffffffac60) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839
> #5  MPIU_Allreduce_Private (inbuf=inbuf@entry=0x7fffffffac60, outbuf=outbuf@entry=0x7fffffffac70, count=count@entry=1, dtype=dtype@entry=1275072547, op=op@entry=1476395020, comm=-2080374782) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869
> #6  0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults (viewer=viewer@entry=0x458890, name=name@entry=0x155554ef6a0d 'mbps\000', value=, minthreshold=minthreshold@entry=0, maxthreshold=maxthreshold@entry=0.01, minmaxtreshold=minmaxtreshold@entry=1.05) at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255
>
> (gdb) bt
> #0  0x000015554fed3b17 in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
> #1  0x0000155550b0de71 in ofi_gettime_ns () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #2  0x0000155550b0dec9 in ofi_gettime_ms () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #3  0x0000155550b2fab5 in sock_cq_sreadfrom () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #4  0x00001555505ca6f7 in MPIDI_OFI_progress () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #5  0x0000155550591fe9 in progress_test () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #6  0x00001555505924a3 in MPID_Progress_wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #7  0x000015555043463e in MPIR_Wait_state () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #8  0x000015555052ec49 in MPIC_Wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #9  0x000015555053093e in MPIC_Sendrecv () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>
> ________________________________________
> From: Barry Smith
> Sent: Thursday, July 10, 2025 11:10 PM
> To: Junchao Zhang
> Cc: Klaij, Christiaan; PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> I cannot reproduce
>
> On Jul 10, 2025, at 3:46 PM, Junchao Zhang wrote:
>
> Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange.
>
> --Junchao Zhang
>
> On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan wrote:
> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
> Chris
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: General MPI error
> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
> [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
> [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
> [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> Proc: [[55228,1],1]
> Errorcode: 98
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 1 with PID 0 on node login1 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ________________________________________
>
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | http://www.marin.nl
>
> From: Klaij, Christiaan
> Sent: Thursday, July 10, 2025 10:15 AM
> To: Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi Junchao,
>
> Thanks for
> testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace...
>
> Chris
>
> ________________________________________
> From: Junchao Zhang
> Sent: Tuesday, July 8, 2025 10:58 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi, Chris,
> First, I had to fix an error in your test by adding "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Mat object's type is not set: Argument # 1
> ...
> [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
> [0]PETSC ERROR: #2 ex2f.F90:258
>
> Then I could run the test without problems:
> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
>
> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG" FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>
> Could you fix the error and retry?
>
> --Junchao Zhang
>
> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users wrote:
> Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4.
>
> The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero.
>
> When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.
>
> Chris
>
> #0  0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900, src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
> #1  0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247
> #2  0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
> #3  0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216
> #4  0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900,
op=0x15554ca28980, comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217
> #5  0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900, op=0x15554ca28980, comm=0x7f1e30) at allreduce.c:123
> #6  0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #7  0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #8  0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #9  0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #18 0x0000000000402c8b in MAIN__ ()
> #19 0x00000000004023df in main ()
>
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | http://www.marin.nl

From bsmith at petsc.dev Fri Jul 11 16:22:36 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 11 Jul 2025 17:22:36 -0400
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev>
Message-ID:

And yet we cannot reproduce. Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it. Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop?

Barry

If I could reproduce the problem, here is how I would debug: I would use -start_in_debugger and then put break points in places which seem problematic.
Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened.

> On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan wrote:
>
> In summary for future reference:
> - tested 3 different machines, two at Marin, one at the national HPC
> - tested 3 different mpi implementations (intelmpi, openmpi and mpich)
> - tested openmpi in both release and debug
> - tested 2 different compilers (intel and gnu), both older and very recent versions
> - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>
> All of these tests either segfault, hang or error out at the call to PetscLogView.
>
> Chris
>
> ________________________________________
> From: Klaij, Christiaan
> Sent: Friday, July 11, 2025 10:10 AM
> To: Barry Smith; Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
> @Barry: both stack traces aren't exactly the same, see a sample with MPICH below.
>
> If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
> > Chris > > (gdb) bt > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x000015555033e70f in MPIR_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x000015555033f22e in PMPI_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, > inbuf=0x7fffffffac60) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', > value=, minthreshold=minthreshold at entry=0, > maxthreshold=maxthreshold at entry=0.01, > minmaxtreshold=minmaxtreshold at entry=1.05) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 > > > (gdb) bt > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 > #1 0x0000155550b0de71 in ofi_gettime_ns () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x0000155550b0dec9 in ofi_gettime_ms () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #5 0x0000155550591fe9 in progress_test () > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #6 0x00001555505924a3 in MPID_Progress_wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #7 0x000015555043463e in MPIR_Wait_state () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #8 0x000015555052ec49 in MPIC_Wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #9 0x000015555053093e in MPIC_Sendrecv () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > ________________________________________ > From: Barry Smith > Sent: Thursday, July 10, 2025 11:10 PM > To: Junchao Zhang > Cc: Klaij, Christiaan; PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > > I cannot reproduce > > On Jul 10, 2025, at 3:46?PM, Junchao Zhang wrote: > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. > > --Junchao Zhang > > > On Thu, Jul 10, 2025 at 3:39?AM Klaij, Christiaan > wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
> > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcHohe__EA$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcHKz6JNHs$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcHlxt0gc0$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcH8j4pep0$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcHmL7cVmc$ > > > > > > From: Klaij, Christiaan > > Sent: Thursday, July 10, 2025 10:15 AM > To: Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi Junchao, > > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... > > Chris > > ________________________________________ > From: Junchao Zhang > > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. 
> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could ran the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > > Could you fix the error and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57?PM Klaij, Christiaan via petsc-users >> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. 
> > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in 
PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > dr. ir. 
Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!cXtuTTf7kilpktOAvL4X2xIl72hhvfnklH8dS4VBuh1CJ5LTt8I5kFBOJO8taiyBqpO3BZ0A0AINRRcHmL7cVmc$ > > From C.Klaij at marin.nl Mon Jul 14 02:58:15 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Mon, 14 Jul 2025 07:58:15 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: @Junchao: yes, all with my ex2f.F90 variation on two or three cores @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event: VecAXPY 0.5 0. 1. 1 0 self This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster: 1) download petsc-3.23.4.tar.gz from the petsc website 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 3) adjust my example to this version of petsc (file is attached) 4) make ex2f-cklaij-dbg-v2 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0 ________________________________________ From: Barry Smith Sent: Friday, July 11, 2025 11:22 PM To: Klaij, Christiaan Cc: Junchao Zhang; PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example And yet we cannot reproduce. Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it. Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop. Barry If I could reproduce the problem here is how I would debug. 
I would use -start_in_debugger and then put break points in places that seem problematic. Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened. > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan wrote: > > In summary for future reference: > - tested 3 different machines, two at Marin, one at the national HPC > - tested 3 different mpi implementations (intelmpi, openmpi and mpich) > - tested openmpi in both release and debug > - tested 2 different compilers (intel and gnu), both older and very recent versions > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich) > > All of these tests either segfault, hang, or error out at the call to PetscLogView. > > Chris > > ________________________________________ > From: Klaij, Christiaan > Sent: Friday, July 11, 2025 10:10 AM > To: Barry Smith; Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging. > @Barry: both stack traces aren't exactly the same, see a sample with MPICH below. > > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help. 
> > Chris > > (gdb) bt > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x000015555033e70f in MPIR_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x000015555033f22e in PMPI_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, > inbuf=0x7fffffffac60) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', > value=, minthreshold=minthreshold at entry=0, > maxthreshold=maxthreshold at entry=0.01, > minmaxtreshold=minmaxtreshold at entry=1.05) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 > > > (gdb) bt > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 > #1 0x0000155550b0de71 in ofi_gettime_ns () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x0000155550b0dec9 in ofi_gettime_ms () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #5 0x0000155550591fe9 in progress_test () > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #6 0x00001555505924a3 in MPID_Progress_wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #7 0x000015555043463e in MPIR_Wait_state () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #8 0x000015555052ec49 in MPIC_Wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #9 0x000015555053093e in MPIC_Sendrecv () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > ________________________________________ > From: Barry Smith > Sent: Thursday, July 10, 2025 11:10 PM > To: Junchao Zhang > Cc: Klaij, Christiaan; PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > > I cannot reproduce > > On Jul 10, 2025, at 3:46?PM, Junchao Zhang wrote: > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. > > --Junchao Zhang > > > On Thu, Jul 10, 2025 at 3:39?AM Klaij, Christiaan > wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
> > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex2f-cklaij-dbg-v2.F90 Type: text/x-fortran Size: 15500 bytes Desc: ex2f-cklaij-dbg-v2.F90 URL: From alexandre.scotto at irt-saintexupery.com Tue Jul 15 05:02:33 2025 From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre) Date: Tue, 15 Jul 2025 10:02:33 +0000 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py Message-ID: Dear PETSc community, As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. Is there a simple way to know when a vector needs to be assembled and when it does not? Thanks in advance. Regards, Alexandre. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Jul 15 05:20:32 2025 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 15 Jul 2025 10:20:32 +0000 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: Message-ID: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Assembly is needed after a call to x.setValues() or any of its variants. https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!cvSh6J_ggyBtDLAEXjAIxkYQUbKkTTiA-QPyYNKZh3E_iJjftgP4afSVeUnPwdIE84eDB6to38b3rFjJg9yRB1QL$ Take into account that in python the notation x[i] = ... will call x.setValues() under the hood. Jose > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users escribió: > > Dear PETSc community, > As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. > I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. > Is there a simple way to know when a vector needs to be assembled and when it does not? > Thanks in advance. > Regards, > Alexandre. 
From alexandre.scotto at irt-saintexupery.com Tue Jul 15 07:13:30 2025 From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre) Date: Tue, 15 Jul 2025 12:13:30 +0000 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: Hello Jose, Thanks for your answer. Then it seems that I have under-the-hood usages of setValues() in my code, since I do not explicitly make use of it but still have problems when not assembling my vector. Do I need to assemble vectors after Mat.mult or Scatter.scatter? By the way, I did not know that PETSc.Vec objects supported direct assignments like x[i] = ..., so I rather use assignments of the form x.array = ... From what I understand from the GitLab page (https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ ) it seems that, in doing so, we only access the local portion of the array, which should not require any assemble(). Am I right? Best regards, Alexandre. -----Message d'origine----- De : Jose E. Roman Envoyé : mardi 15 juillet 2025 12:21 À : SCOTTO Alexandre Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py Assembly is needed after a call to x.setValues() or any of its variants. https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ Take into account that in python the notation x[i] = ... will call x.setValues() under the hood. 
Jose > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users escribi?: > > Dear PETSc community, > As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. > I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. > Is there a simple way to know when a vector need be assembled and when it is not? > Thanks in advance. > Regards, > Alexandre. From bsmith at petsc.dev Tue Jul 15 07:16:32 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 15 Jul 2025 08:16:32 -0400 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: > On Jul 15, 2025, at 8:13?AM, SCOTTO Alexandre via petsc-users wrote: > > Hello Jose, > > Thanks for your answer. Then it seems that I have under the hood usages of setValues() in my code since I do not explicitly make use of it but still has problems when not assembling my vector. Do I need to assemble vectors after Mat.mult or Scatter.scatter? Absolutely not. In such situations all the communication and putting the correct vector values in the correct location is done automatically by the routines you are calling. > > By the way, I did not know that PETSc.Vec objects supported direct assignments like x[i] = ..., so I rather use assignments of the form x.array = ... > > From what I understand from the GitHub page (https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ ) it seems, that doing so, we only access the local portion of the array which should not require any assemble() am I right? > > Best regards, > Alexandre. > > > -----Message d'origine----- > De : Jose E. Roman > Envoy? : mardi 15 juillet 2025 12:21 > ? 
: SCOTTO Alexandre > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > > Assembly is needed after a call to x.setValues() or any of its variants. > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > Take into account that in python the notation x[i] = ... will call x.setValues() under the hood. > > Jose > > >> El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users escribió: >> >> Dear PETSc community, >> As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. >> I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. >> Is there a simple way to know when a vector needs to be assembled and when it does not? >> Thanks in advance. >> Regards, >> Alexandre. > From knepley at gmail.com Tue Jul 15 07:18:04 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Jul 2025 08:18:04 -0400 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: On Tue, Jul 15, 2025 at 8:13 AM SCOTTO Alexandre via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello Jose, > > Thanks for your answer. Then it seems that I have under-the-hood usages of > setValues() in my code, since I do not explicitly make use of it but still > have problems when not assembling my vector. Do I need to assemble vectors > after Mat.mult or Scatter.scatter? > No. The purpose of VecAssemblyBegin/End() is to move values from processes that do not own them to the processes that do. PETSc does this automatically for MatMult() and VecScatter routines because we know exactly where values are headed. 
However, when users call VecSetValues(), they may set locations that are owned by other processes. We could communicate these immediately, but that might be expensive for a series of VecSetValues() calls, so we wait until you call VecAssembly(). Note that direct assignment to the array can only set local values. This is equivalent to VecGetArray(). Thanks, Matt > By the way, I did not know that PETSc.Vec objects supported direct > assignments like x[i] = ..., so I rather use assignments of the form > x.array = ... > > From what I understand from the GitHub page ( > https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ > ) it seems, that doing so, we only access the local portion of the array > which should not require any assemble() am I right? > > Best regards, > Alexandre. > > > -----Message d'origine----- > De : Jose E. Roman > Envoy? : mardi 15 juillet 2025 12:21 > ? : SCOTTO Alexandre > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > > Assembly is needed after a call to x.setValues() or any of its variants. > > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > Take into account that in python the notation x[i] = ... with call > x.setValues() under the hood. > > Jose > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > Dear PETSc community, > > As a beginner in the MPI world and with the PETSc library, I come with > a possibly very naive question. 
> > I know from the documentation that assembling vectors must be done, but > it is not clear to me when to perform this operation. > > Is there a simple way to know when a vector needs to be assembled and when > it does not? > > Thanks in advance. > > Regards, > > Alexandre. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fx-I-mCGkHBLHWAaiP_nkFN7915JhuHJsjJm8Qpi3_otAh9GDsilnbnEQsr0-PfcqLdM5RdSw58rYsASqEIL$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.scotto at irt-saintexupery.com Tue Jul 15 07:49:21 2025 From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre) Date: Tue, 15 Jul 2025 12:49:21 +0000 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: Ok, I get it: Vec.assemble() is mandatory whenever MPI communication is required to get the values to the appropriate processes. To provide more information, I am in a situation where I perform A.multTranspose(x, y), where y is a vector that has been filled with values earlier in the process. At this stage, I no longer care about these values and I expect multTranspose() to overwrite them. But what I get is: - if I do y.assemble() before the transpose multiplication, then y is filled in with the correct result; - if I do not perform y.assemble() before the transpose multiplication, I actually get y = y + A^T(x), i.e. a result rather of the form multTransposeAdd(); - if I do A.multTranspose(x, y) twice, then I get the correct result y = A^T(x). This makes me think that I am misusing something at some point, but it is not clear what. 
If someone has a hint to explain this behaviour that would help me better understand how to properly use PETSc! Regards, Alexandre. De : Matthew Knepley Envoy? : mardi 15 juillet 2025 14:18 ? : SCOTTO Alexandre Cc : Jose E. Roman ; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users > wrote: Hello Jose, Thanks for your answer. Then it seems that I have under the hood usages of setValues() in my code since I do not explicitly make use of it but still has problems when not assembling my vector. Do I need to assemble vectors after Mat.mult or Scatter.scatter? No. The purpose of VecAsseblyBegin/End() is to move values from processes that do not own them to the processes that do. PETSc does this automatically for MatMult() and VecScatter routines because we know exactly where values are headed. However, when users call VecSetValues(), they may set locations that are owned by other processes. We could communicate these immediately, but that might be expensive for a series of VecSetValues() calls, so we wait until you call VecAssembly(). Note that direct assignment to the array can only set local values. This is equivalent to VecGetArray(). Thanks, Matt By the way, I did not know that PETSc.Vec objects supported direct assignments like x[i] = ..., so I rather use assignments of the form x.array = ... >From what I understand from the GitHub page (https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ ) it seems, that doing so, we only access the local portion of the array which should not require any assemble() am I right? Best regards, Alexandre. -----Message d'origine----- De : Jose E. Roman > Envoy? : mardi 15 juillet 2025 12:21 ? 
: SCOTTO Alexandre > Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py Assembly is needed after a call to x.setValues() or any of its variants. https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ Take into account that in python the notation x[i] = ... with call x.setValues() under the hood. Jose > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users > escribi?: > > Dear PETSc community, > As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. > I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. > Is there a simple way to know when a vector need be assembled and when it is not? > Thanks in advance. > Regards, > Alexandre. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cspsHbAhId4QoslCKt3i_holgL7kvzmVSbBmROUEJ9HuA2YEjyfDurQ0XiBEpAaqohjglNK31xfzm4zAiM3wDxpywkRKnJnXtieNhKyZvw$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Jul 15 08:17:07 2025 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 15 Jul 2025 15:17:07 +0200 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: A.multTranspose(x, y) produces y = A^t * x If it is not the case, you are doing something wrong I minimal working example will help us debug your approach. 
You can take a look at the test folder https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/tree/main/src/binding/petsc4py/test__;!!G_uCfscf7eWS!dhEH-iztM8EuysrQ-SA3_P74GVXmb3CXNESdTF-hHBHpAa-fQbeM0Lf27pHtwK5mW_VFNYIFPHdQbhUpPxZdxitcNaZSyTY$ , the demo folder https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/tree/main/src/binding/petsc4py/demo__;!!G_uCfscf7eWS!dhEH-iztM8EuysrQ-SA3_P74GVXmb3CXNESdTF-hHBHpAa-fQbeM0Lf27pHtwK5mW_VFNYIFPHdQbhUpPxZdxitcupJqo4Q$ There are also Python examples here https://urldefense.us/v3/__https://gitlab.com/stefanozampini/petscexamples__;!!G_uCfscf7eWS!dhEH-iztM8EuysrQ-SA3_P74GVXmb3CXNESdTF-hHBHpAa-fQbeM0Lf27pHtwK5mW_VFNYIFPHdQbhUpPxZdxitcntk_xfM$ Il giorno mar 15 lug 2025 alle ore 14:49 SCOTTO Alexandre via petsc-users < petsc-users at mcs.anl.gov> ha scritto: > Ok I get it, Vec.assemble() is mandatory whenever MPI communications are > required to get the values to appropriate processes. > > > > To provide more information, I am in a situation where I perform a > A.multTranspose(x, y), where y is a vector that has been filled in with > values earlier in the process. At this stage, I no longer care of these > values and I expect the multTranspose() to override the values. But what I > get is: > > ? if I do y.assemble() before the transpose multiplication, then > y is filled-in with the correct result. > > ? if I do not perform y.assemble() before the transpose > multiplication, I actually get y = y + A^T(x), i.e. a result rather of the > form multTransposeAdd(). > > ? If I do A.multTranspose(x, y) *twice*, then I get the correct > result y = A^T(x). > > > > This makes me think that I am misusing something at some points, but it is > not clear what. If someone has a hint to explain this behaviour that would > help me better understand how to properly use PETSc! > > > > Regards, > Alexandre. > > > > *De :* Matthew Knepley > *Envoy? :* mardi 15 juillet 2025 14:18 > *? :* SCOTTO Alexandre > *Cc :* Jose E. 
Roman ; petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > > > On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello Jose, > > Thanks for your answer. Then it seems that I have under the hood usages of > setValues() in my code since I do not explicitly make use of it but still > has problems when not assembling my vector. Do I need to assemble vectors > after Mat.mult or Scatter.scatter? > > > > No. The purpose of VecAsseblyBegin/End() is to move values from processes > that do not own them to the processes that do. PETSc does this > automatically for MatMult() and VecScatter routines because we know exactly > where values are headed. However, when users call VecSetValues(), they may > set locations that are owned by other processes. We could communicate these > immediately, but that might be expensive for a series of VecSetValues() > calls, so we wait until you call VecAssembly(). > > > > Note that direct assignment to the array can only set local values. This > is equivalent to VecGetArray(). > > > > Thanks, > > > > Matt > > > > By the way, I did not know that PETSc.Vec objects supported direct > assignments like x[i] = ..., so I rather use assignments of the form > x.array = ... > > From what I understand from the GitHub page ( > https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ > > ) it seems, that doing so, we only access the local portion of the array > which should not require any assemble() am I right? > > Best regards, > Alexandre. > > > -----Message d'origine----- > De : Jose E. Roman > Envoy? : mardi 15 juillet 2025 12:21 > ? 
: SCOTTO Alexandre > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > > Assembly is needed after a call to x.setValues() or any of its variants. > > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > > Take into account that in python the notation x[i] = ... with call > x.setValues() under the hood. > > Jose > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > Dear PETSc community, > > As a beginner in the MPI world and with the PETSc library, I come with > a possibly very naive question. > > I know from the documentation that assembling vectors must be done, but > it is not clear to me when to perform this operation. > > Is there a simple way to know when a vector need be assembled and when > it is not? > > Thanks in advance. > > Regards, > > Alexandre. > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dhEH-iztM8EuysrQ-SA3_P74GVXmb3CXNESdTF-hHBHpAa-fQbeM0Lf27pHtwK5mW_VFNYIFPHdQbhUpPxZdxitcsLpPCdI$ > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Jul 15 08:36:30 2025 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Tue, 15 Jul 2025 13:36:30 +0000 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> Message-ID: <771ABA95-CC4C-4D61-B935-7EA59C1AAFC3@dsic.upv.es> It is better if you can provide a full python script that we can run to reproduce the problem. Jose > El 15 jul 2025, a las 14:49, SCOTTO Alexandre escribi?: > > Ok I get it, Vec.assemble() is mandatory whenever MPI communications are required to get the values to appropriate processes. > To provide more information, I am in a situation where I perform a A.multTranspose(x, y), where y is a vector that has been filled in with values earlier in the process. At this stage, I no longer care of these values and I expect the multTranspose() to override the values. But what I get is: > ? if I do y.assemble() before the transpose multiplication, then y is filled-in with the correct result. > ? if I do not perform y.assemble() before the transpose multiplication, I actually get y = y + A^T(x), i.e. a result rather of the form multTransposeAdd(). > ? If I do A.multTranspose(x, y) twice, then I get the correct result y = A^T(x). > This makes me think that I am misusing something at some points, but it is not clear what. If someone has a hint to explain this behaviour that would help me better understand how to properly use PETSc! > Regards, > Alexandre. > De : Matthew Knepley > Envoy? : mardi 15 juillet 2025 14:18 > ? : SCOTTO Alexandre > Cc : Jose E. Roman ; petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users wrote: > Hello Jose, > > Thanks for your answer. Then it seems that I have under the hood usages of setValues() in my code since I do not explicitly make use of it but still has problems when not assembling my vector. Do I need to assemble vectors after Mat.mult or Scatter.scatter? > No. 
The purpose of VecAsseblyBegin/End() is to move values from processes that do not own them to the processes that do. PETSc does this automatically for MatMult() and VecScatter routines because we know exactly where values are headed. However, when users call VecSetValues(), they may set locations that are owned by other processes. We could communicate these immediately, but that might be expensive for a series of VecSetValues() calls, so we wait until you call VecAssembly(). > Note that direct assignment to the array can only set local values. This is equivalent to VecGetArray(). > Thanks, > Matt > By the way, I did not know that PETSc.Vec objects supported direct assignments like x[i] = ..., so I rather use assignments of the form x.array = ... > > From what I understand from the GitHub page (https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ ) it seems, that doing so, we only access the local portion of the array which should not require any assemble() am I right? > > Best regards, > Alexandre. > > > -----Message d'origine----- > De : Jose E. Roman > Envoy? : mardi 15 juillet 2025 12:21 > ? : SCOTTO Alexandre > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > > Assembly is needed after a call to x.setValues() or any of its variants. > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > Take into account that in python the notation x[i] = ... with call x.setValues() under the hood. 
> > Jose > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users escribi?: > > > > Dear PETSc community, > > As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. > > I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. > > Is there a simple way to know when a vector need be assembled and when it is not? > > Thanks in advance. > > Regards, > > Alexandre. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YFFHiG3QxKfiKPBItZkVA5OnOBphUS5CGiNZWrgNYm4FLOjEzBMqLKlVQRdN3EZC9ueeIg6pWPdVOwt58c2GO5z3$ From knepley at gmail.com Tue Jul 15 09:05:44 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Jul 2025 10:05:44 -0400 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: <771ABA95-CC4C-4D61-B935-7EA59C1AAFC3@dsic.upv.es> References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> <771ABA95-CC4C-4D61-B935-7EA59C1AAFC3@dsic.upv.es> Message-ID: We check that the x argument is assembled in MatMultTranspose(), but not y. It seems possible that y could have stashed values that get communicated when the operation is performed. Why do we not check that y is assembled? Thanks, Matt On Tue, Jul 15, 2025 at 9:36?AM Jose E. Roman wrote: > It is better if you can provide a full python script that we can run to > reproduce the problem. > > Jose > > > > El 15 jul 2025, a las 14:49, SCOTTO Alexandre < > alexandre.scotto at irt-saintexupery.com> escribi?: > > > > Ok I get it, Vec.assemble() is mandatory whenever MPI communications are > required to get the values to appropriate processes. 
> > To provide more information, I am in a situation where I perform a > A.multTranspose(x, y), where y is a vector that has been filled in with > values earlier in the process. At this stage, I no longer care of these > values and I expect the multTranspose() to override the values. But what I > get is: > > ? if I do y.assemble() before the transpose multiplication, then > y is filled-in with the correct result. > > ? if I do not perform y.assemble() before the transpose > multiplication, I actually get y = y + A^T(x), i.e. a result rather of the > form multTransposeAdd(). > > ? If I do A.multTranspose(x, y) twice, then I get the correct > result y = A^T(x). > > This makes me think that I am misusing something at some points, but it > is not clear what. If someone has a hint to explain this behaviour that > would help me better understand how to properly use PETSc! > > Regards, > > Alexandre. > > De : Matthew Knepley > > Envoy? : mardi 15 juillet 2025 14:18 > > ? : SCOTTO Alexandre > > Cc : Jose E. Roman ; petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello Jose, > > > > Thanks for your answer. Then it seems that I have under the hood usages > of setValues() in my code since I do not explicitly make use of it but > still has problems when not assembling my vector. Do I need to assemble > vectors after Mat.mult or Scatter.scatter? > > No. The purpose of VecAsseblyBegin/End() is to move values from > processes that do not own them to the processes that do. PETSc does this > automatically for MatMult() and VecScatter routines because we know exactly > where values are headed. However, when users call VecSetValues(), they may > set locations that are owned by other processes. 
We could communicate these > immediately, but that might be expensive for a series of VecSetValues() > calls, so we wait until you call VecAssembly(). > > Note that direct assignment to the array can only set local values. > This is equivalent to VecGetArray(). > > Thanks, > > Matt > > By the way, I did not know that PETSc.Vec objects supported direct > assignments like x[i] = ..., so I rather use assignments of the form > x.array = ... > > > > From what I understand from the GitHub page ( > https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ > ) it seems, that doing so, we only access the local portion of the array > which should not require any assemble() am I right? > > > > Best regards, > > Alexandre. > > > > > > -----Message d'origine----- > > De : Jose E. Roman > > Envoy? : mardi 15 juillet 2025 12:21 > > ? : SCOTTO Alexandre > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > > > Assembly is needed after a call to x.setValues() or any of its variants. > > > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > > Take into account that in python the notation x[i] = ... with call > x.setValues() under the hood. > > > > Jose > > > > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > > > Dear PETSc community, > > > As a beginner in the MPI world and with the PETSc library, I come > with a possibly very naive question. > > > I know from the documentation that assembling vectors must be done, > but it is not clear to me when to perform this operation. 
> > > Is there a simple way to know when a vector needs to be assembled and
> when it does not?
> > > Thanks in advance.
> > > Regards,
> > > Alexandre.
>
> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dSyL6vBiLAFtj65CVYTcdDSdHEss-RYYzDlW5AXqReVAaV4ND9G4hKjVv2091d4dmLUf7tF09u5_6hVbw5fP$

-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dSyL6vBiLAFtj65CVYTcdDSdHEss-RYYzDlW5AXqReVAaV4ND9G4hKjVv2091d4dmLUf7tF09u5_6hVbw5fP$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alexandre.scotto at irt-saintexupery.com Tue Jul 15 09:44:11 2025
From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre)
Date: Tue, 15 Jul 2025 14:44:11 +0000
Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py
In-Reply-To:
References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> <771ABA95-CC4C-4D61-B935-7EA59C1AAFC3@dsic.upv.es>
Message-ID: <10114346b93b4f3a955615f32dbc5b4d@irt-saintexupery.com>

The place where the problem occurs is quite deep in a large code; I will try to isolate the problematic behaviour, but this may take some time.

@Matthew, is there a way in the Python API to check whether a vector is assembled?

Regards,
Alexandre.

De : Matthew Knepley
Envoyé : mardi 15 juillet 2025 16:06
À : Jose E. Roman
Cc : SCOTTO Alexandre ; petsc-users at mcs.anl.gov
Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py

We check that the x argument is assembled in MatMultTranspose(), but not y.
It seems possible that y could have stashed values that get communicated when the operation is performed. Why do we not check that y is assembled? Thanks, Matt On Tue, Jul 15, 2025 at 9:36?AM Jose E. Roman > wrote: It is better if you can provide a full python script that we can run to reproduce the problem. Jose > El 15 jul 2025, a las 14:49, SCOTTO Alexandre > escribi?: > > Ok I get it, Vec.assemble() is mandatory whenever MPI communications are required to get the values to appropriate processes. > To provide more information, I am in a situation where I perform a A.multTranspose(x, y), where y is a vector that has been filled in with values earlier in the process. At this stage, I no longer care of these values and I expect the multTranspose() to override the values. But what I get is: > ? if I do y.assemble() before the transpose multiplication, then y is filled-in with the correct result. > ? if I do not perform y.assemble() before the transpose multiplication, I actually get y = y + A^T(x), i.e. a result rather of the form multTransposeAdd(). > ? If I do A.multTranspose(x, y) twice, then I get the correct result y = A^T(x). > This makes me think that I am misusing something at some points, but it is not clear what. If someone has a hint to explain this behaviour that would help me better understand how to properly use PETSc! > Regards, > Alexandre. > De : Matthew Knepley > > Envoy? : mardi 15 juillet 2025 14:18 > ? : SCOTTO Alexandre > > Cc : Jose E. Roman >; petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users > wrote: > Hello Jose, > > Thanks for your answer. Then it seems that I have under the hood usages of setValues() in my code since I do not explicitly make use of it but still has problems when not assembling my vector. Do I need to assemble vectors after Mat.mult or Scatter.scatter? > No. 
The purpose of VecAsseblyBegin/End() is to move values from processes that do not own them to the processes that do. PETSc does this automatically for MatMult() and VecScatter routines because we know exactly where values are headed. However, when users call VecSetValues(), they may set locations that are owned by other processes. We could communicate these immediately, but that might be expensive for a series of VecSetValues() calls, so we wait until you call VecAssembly(). > Note that direct assignment to the array can only set local values. This is equivalent to VecGetArray(). > Thanks, > Matt > By the way, I did not know that PETSc.Vec objects supported direct assignments like x[i] = ..., so I rather use assignments of the form x.array = ... > > From what I understand from the GitHub page (https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ ) it seems, that doing so, we only access the local portion of the array which should not require any assemble() am I right? > > Best regards, > Alexandre. > > > -----Message d'origine----- > De : Jose E. Roman > > Envoy? : mardi 15 juillet 2025 12:21 > ? : SCOTTO Alexandre > > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with petsc4py > > Assembly is needed after a call to x.setValues() or any of its variants. > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > Take into account that in python the notation x[i] = ... with call x.setValues() under the hood. 
> > Jose > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users > escribi?: > > > > Dear PETSc community, > > As a beginner in the MPI world and with the PETSc library, I come with a possibly very naive question. > > I know from the documentation that assembling vectors must be done, but it is not clear to me when to perform this operation. > > Is there a simple way to know when a vector need be assembled and when it is not? > > Thanks in advance. > > Regards, > > Alexandre. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aZJ9246pF8FAjBBUhSi39ow-huolRMdyysnIQMJe6Lilwvn7xVXxabH5JBy-S32vOZA5Sk4Ma_pBzo5D7Xg5VlPa-UrOinlcL1nselY1pw$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aZJ9246pF8FAjBBUhSi39ow-huolRMdyysnIQMJe6Lilwvn7xVXxabH5JBy-S32vOZA5Sk4Ma_pBzo5D7Xg5VlPa-UrOinlcL1nselY1pw$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Jul 15 10:26:04 2025 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Jul 2025 11:26:04 -0400 Subject: [petsc-users] When to perform PETSc.Vec assembly with petsc4py In-Reply-To: <10114346b93b4f3a955615f32dbc5b4d@irt-saintexupery.com> References: <9C189057-7ADB-4220-B913-B51B7B93C957@dsic.upv.es> <771ABA95-CC4C-4D61-B935-7EA59C1AAFC3@dsic.upv.es> <10114346b93b4f3a955615f32dbc5b4d@irt-saintexupery.com> Message-ID: On Tue, Jul 15, 2025 at 10:44?AM SCOTTO Alexandre < alexandre.scotto at irt-saintexupery.com> wrote: > The place where the problem is quite deep in a large code, I will try to > isolate the problematic behaviour but this may take some time. > > > > @Matthew, is there a way in the Python API to check whether a vector is > assembled? > No. I will put in the wrapper for that. Thanks, Matt > Regards, > > Alexandre. > > > > *De :* Matthew Knepley > *Envoy? :* mardi 15 juillet 2025 16:06 > *? :* Jose E. Roman > *Cc :* SCOTTO Alexandre ; > petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > > > We check that the x argument is assembled in MatMultTranspose(), but not > y. It seems possible that y could have stashed values that get communicated > when the operation is performed. Why do we not check that y is assembled? > > > > Thanks, > > > > Matt > > > > On Tue, Jul 15, 2025 at 9:36?AM Jose E. Roman wrote: > > It is better if you can provide a full python script that we can run to > reproduce the problem. > > Jose > > > > El 15 jul 2025, a las 14:49, SCOTTO Alexandre < > alexandre.scotto at irt-saintexupery.com> escribi?: > > > > Ok I get it, Vec.assemble() is mandatory whenever MPI communications are > required to get the values to appropriate processes. > > To provide more information, I am in a situation where I perform a > A.multTranspose(x, y), where y is a vector that has been filled in with > values earlier in the process. 
At this stage, I no longer care of these > values and I expect the multTranspose() to override the values. But what I > get is: > > ? if I do y.assemble() before the transpose multiplication, then > y is filled-in with the correct result. > > ? if I do not perform y.assemble() before the transpose > multiplication, I actually get y = y + A^T(x), i.e. a result rather of the > form multTransposeAdd(). > > ? If I do A.multTranspose(x, y) twice, then I get the correct > result y = A^T(x). > > This makes me think that I am misusing something at some points, but it > is not clear what. If someone has a hint to explain this behaviour that > would help me better understand how to properly use PETSc! > > Regards, > > Alexandre. > > De : Matthew Knepley > > Envoy? : mardi 15 juillet 2025 14:18 > > ? : SCOTTO Alexandre > > Cc : Jose E. Roman ; petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > On Tue, Jul 15, 2025 at 8:13?AM SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello Jose, > > > > Thanks for your answer. Then it seems that I have under the hood usages > of setValues() in my code since I do not explicitly make use of it but > still has problems when not assembling my vector. Do I need to assemble > vectors after Mat.mult or Scatter.scatter? > > No. The purpose of VecAsseblyBegin/End() is to move values from > processes that do not own them to the processes that do. PETSc does this > automatically for MatMult() and VecScatter routines because we know exactly > where values are headed. However, when users call VecSetValues(), they may > set locations that are owned by other processes. We could communicate these > immediately, but that might be expensive for a series of VecSetValues() > calls, so we wait until you call VecAssembly(). > > Note that direct assignment to the array can only set local values. > This is equivalent to VecGetArray(). 
> > Thanks, > > Matt > > By the way, I did not know that PETSc.Vec objects supported direct > assignments like x[i] = ..., so I rather use assignments of the form > x.array = ... > > > > From what I understand from the GitHub page ( > https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/src/petsc4py/PETSc/Vec.pyx__;!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHkoPIwhdw$ > > ) it seems, that doing so, we only access the local portion of the array > which should not require any assemble() am I right? > > > > Best regards, > > Alexandre. > > > > > > -----Message d'origine----- > > De : Jose E. Roman > > Envoy? : mardi 15 juillet 2025 12:21 > > ? : SCOTTO Alexandre > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] When to perform PETSc.Vec assembly with > petsc4py > > > > Assembly is needed after a call to x.setValues() or any of its variants. > > > https://urldefense.us/v3/__https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Vec.html*petsc4py.PETSc.Vec.setValue__;Iw!!G_uCfscf7eWS!f3M5SmfhSqew4tgfvFDjdLnd3q3kT_KzcCitJTJC0HRl1YvbSNBr5Uuzv6OWP1pVQBgbi_WJlekerylpGcBHQ9vsbzL1suFBQHlKEPOyQg$ > > > Take into account that in python the notation x[i] = ... with call > x.setValues() under the hood. > > > > Jose > > > > > > > El 15 jul 2025, a las 12:02, SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> escribi?: > > > > > > Dear PETSc community, > > > As a beginner in the MPI world and with the PETSc library, I come > with a possibly very naive question. > > > I know from the documentation that assembling vectors must be done, > but it is not clear to me when to perform this operation. > > > Is there a simple way to know when a vector need be assembled and when > it is not? > > > Thanks in advance. > > > Regards, > > > Alexandre. 
> > -- What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fWSWIHY4a4L2adD4fYqOth8uBeVB9CUKljMnzGrrzwXCW7Th4Iv_EdU7624bjBlHVSoaV6Ybdss-lakusLGJ$ > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fWSWIHY4a4L2adD4fYqOth8uBeVB9CUKljMnzGrrzwXCW7Th4Iv_EdU7624bjBlHVSoaV6Ybdss-lakusLGJ$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fWSWIHY4a4L2adD4fYqOth8uBeVB9CUKljMnzGrrzwXCW7Th4Iv_EdU7624bjBlHVSoaV6Ybdss-lakusLGJ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Wed Jul 16 17:15:39 2025 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 16 Jul 2025 22:15:39 +0000 Subject: [petsc-users] Petsc/Jax no copy interfacing issues In-Reply-To: References: <7BA68283-588B-4CB3-A86C-C14E05A30D3F@anl.gov> Message-ID: It is expected that changes made in Jax are not reflected in the PETSc object. The issue has been explained in my previous message (point 2). Hong ________________________________ From: Alberto Cattaneo Sent: Tuesday, July 15, 2025 1:13 PM To: Zhang, Hong Subject: Re: [petsc-users] Petsc/Jax no copy interfacing issues Odd, I was using double precision (forgot to include that in the example, sorry) but on my machineI?m still not seeing the changes made reflected in the PETSc object. Are the changes reflected on your end? 
Is it possibly an ownership issue?

On Tue, Jul 8, 2025 at 3:56 PM Zhang, Hong wrote:

Hi Alberto,

1. To check the array pointer on the PETSc side, you can do print(hex(y_petsc.array.ctypes.data)). Then you will see a pointer mismatch caused by the line y = jnp.from_dlpack(y_petsc, copy=False). This is because you configured PETSc in double precision, but JAX uses single precision by default. You can either add jax.config.update("jax_enable_x64", True) to make JAX use double-precision numbers, or configure PETSc to support single precision.

2. Once you fix this precision mismatch, the in-place conversion between PETSc and JAX should work. However, .at[].set() in JAX does not guarantee to operate in place. Array updates in JAX are generally performed out of place by design. You may do the updates in PETSc so that it won't break the zero-copy system.

Hong

From: petsc-users on behalf of Alberto Cattaneo
Date: Monday, July 7, 2025 at 8:40 AM
To: "petsc-users at mcs.anl.gov"
Subject: [petsc-users] Petsc/Jax no copy interfacing issues

Greetings. I hope this email reaches you well. I'm trying to get JAX and PETSc to work together in a no-copy system using the DLPack tools in both. Unfortunately, I can't seem to get it to work right. Ideally, I'd like to create a PETSc Vec object using petsc4py, pass it to a JAX object without copying, make a change to it in a JAX jitted function, and have that change reflected in the PETSc object. All of this without copying.
Of note: when I try to do this, I get an error that the alignment is wrong and a copy must be made when I call the from_dlpack function. Changing the alignment to 32 in the PETSc ./configure stage makes the error message disappear, but even so it still doesn't function correctly. I've tried looking through the documentation, but I'm getting a little turned around. I've included a code snippet below:

from petsc4py import PETSc
import jax
from functools import partial
import jax.numpy as jnp

@partial(jax.jit, donate_argnums=(0,))
def set_in_place(x):
    return x.at[:].set(3.0)

print('\nTesting jax from_dlpack given a PETSc vector that was allocated by PETSc')
x = jnp.ones((1000, 1))
y_petsc = PETSc.Vec().createSeq(x.shape[0])
y_petsc.set(0.0)
print(hex(y_petsc.handle))
y2_petsc = PETSc.Vec().createWithDLPack(y_petsc.toDLPack('rw'))
y2_petsc.set(-1.0)
assert y_petsc.getValue(0) == y2_petsc.getValue(0)
print('After creating a second PETSc vector via a DLPack of the first, modifying the memory of one affects the other.')
#y = jnp.from_dlpack(y_petsc.toDLPack('rw'), copy=False)
y = jnp.from_dlpack(y_petsc, copy=False)
orig_ptr = y.unsafe_buffer_pointer()
print(f'before: ptr at {hex(orig_ptr)}')
y = set_in_place(y)
print(f'after: ptr at {hex(y.unsafe_buffer_pointer())}')
assert orig_ptr == y.unsafe_buffer_pointer()
#assert y_petsc.getValue(0) == y[0], f'The PETSc value {y_petsc.getValue(0)} did not match the JAX value {y[0]}, so modifying the JAX memory did not affect the PETSc memory.'

I'd like the bottom two asserts to pass, but I can only get one of them. If somebody is familiar with this issue, I'd greatly appreciate the assistance.

Respectfully:
Alberto
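Both of Hong's points in the reply above are general Python buffer behavior rather than anything PETSc-specific, and can be sketched without petsc4py or JAX: a zero-copy view must keep the producer's dtype (a float64-to-float32 conversion has to allocate and copy), and a JAX-style .at[].set() functional update builds a new buffer instead of writing through the shared one. A pure-stdlib sketch, with array.array('d') standing in for a double-precision Vec:

```python
import array

buf = array.array('d', [0.0] * 4)   # float64 storage, standing in for a double-precision Vec
view = memoryview(buf)              # same dtype: zero-copy, writes land in the same memory
view[0] = 3.0
print(buf[0])                       # the shared buffer sees the in-place write

copy32 = array.array('f', buf)      # float64 -> float32 conversion must allocate and copy
copy32[0] = 7.0
print(buf[0])                       # still 3.0: the original buffer never sees that change

# Functional update in the style of y = y.at[0].set(9.0): a brand-new object
updated = array.array('d', buf)
updated[0] = 9.0
print(buf[0])                       # still 3.0: invisible through the original buffer
```

The same logic explains the failing assert in the snippet above: once y names a freshly allocated buffer, nothing written to it can reach the Vec's memory.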
From bsmith at petsc.dev  Thu Jul 17 22:58:46 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 17 Jul 2025 23:58:46 -0400
Subject: [petsc-users] Fortran, KSPMonitorSet, and KSPMonitorTrueResidual
In-Reply-To: <5BFFB970-9483-4AF5-83D3-906EF3BC94C8@gmail.com>
References: <5BFFB970-9483-4AF5-83D3-906EF3BC94C8@gmail.com>
Message-ID:

   Randy,

   Thanks for pointing out this problem. I was generating the wrong C stubs for arguments such as PetscViewerAndFormat, that is, arguments that are C structs but are treated as a FortranAddr in Fortran because the C struct is too complex to be represented directly in Fortran.

   This is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/8563 and should allow you to do what you previously had done. In the MR you will see I have modified ex2f.F to do essentially what your test code is doing.

   Barry

> On Jun 23, 2025, at 2:46 PM, Randall Mackie wrote:
>
> In previous versions of PETSc we used to be able to call KSPMonitorTrueResidual from within our custom KSPMonitor, using an approach that is now commented out in the example found at https://petsc.org/release/src/ksp/ksp/tutorials/ex2f.F90.html :
>
> 214: ! Cannot also use the default KSP monitor routine showing how it may be used from Fortran
> 215: ! since the Fortran compiler thinks the calling arguments are different in the two cases
> 216: !
> 217: ! PetscCallA(PetscViewerAndFormatCreate(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,vf,ierr))
> 218: ! PetscCallA(KSPMonitorSet(ksp,KSPMonitorResidual,vf,PetscViewerAndFormatDestroy,ierr))
>
> Instead, that example uses:
>
> 210:   if (flg) then
> 211:     vzero = 0
> 212:     PetscCallA(KSPMonitorSet(ksp,MyKSPMonitor,vzero,PETSC_NULL_FUNCTION,ierr))
> 213: !
>
> Regardless of which of these approaches I try, I cannot use KSPMonitorTrueResidual in the MyKSPMonitor routine. I get the following error:
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Null argument, when expecting valid pointer
> [0]PETSC ERROR: Null Pointer: Parameter # 4
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: PETSc Release Version 3.23.3, May 30, 2025
> [0]PETSC ERROR: ./test with 2 MPI process(es) and PETSC_ARCH linux-gfortran-complex-debug on rmackie-VirtualBox-2024 by rmackie Mon Jun 23 11:34:04 2025
> [0]PETSC ERROR: Configure options: --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --download-mpich=1
> [0]PETSC ERROR: #1 KSPMonitorTrueResidual() at /home/rmackie/PETSc/petsc-3.23.3/src/ksp/ksp/interface/iterativ.c:400
> [0]PETSC ERROR: #2 test.F90:303
>
> I attach a slightly modified version of the example that demonstrates this behavior.
>
> Thanks for the help,
>
> Randy

From rlmackie862 at gmail.com  Fri Jul 18 12:08:16 2025
From: rlmackie862 at gmail.com (Randall Mackie)
Date: Fri, 18 Jul 2025 10:08:16 -0700
Subject: [petsc-users] Fortran, KSPMonitorSet, and KSPMonitorTrueResidual
In-Reply-To:
References: <5BFFB970-9483-4AF5-83D3-906EF3BC94C8@gmail.com>
Message-ID:

Thanks Barry!

> On Jul 17, 2025, at 8:58 PM, Barry Smith wrote:
>
> Randy,
>
> Thanks for pointing out this problem.
> I was generating the wrong C stubs for arguments such as PetscViewerAndFormat, that is, arguments that are C structs but are treated as a FortranAddr in Fortran because the C struct is too complex to be represented directly in Fortran.
>
> This is fixed in https://gitlab.com/petsc/petsc/-/merge_requests/8563 and should allow you to do what you previously had done. In the MR you will see I have modified ex2f.F to do essentially what your test code is doing.
>
> Barry
>
> [...]

From edoardo.alinovi at gmail.com  Fri Jul 18 12:20:24 2025
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Fri, 18 Jul 2025 19:20:24 +0200
Subject: [petsc-users] MatMPIAIJSetPreallocation deadlocks
Message-ID:

Hello Petsc friends,

Hope you are all doing well.

Today I was doing a simulation (27Mln cells on 64 cores) and I came across an issue. Indeed, I am deadlocking somewhere in MatMPIAIJSetPreallocation. Do you have any clue about the reason for this? Any suggestions to track this down?

Many thanks,

Edo
From stefano.zampini at gmail.com  Fri Jul 18 12:36:40 2025
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Fri, 18 Jul 2025 19:36:40 +0200
Subject: [petsc-users] MatMPIAIJSetPreallocation deadlocks
In-Reply-To:
References:
Message-ID:

Causes for a deadlock of this type are usually memory corruption or calling collectives in two different places. You should try with a debug version of the library first, then try valgrind or the address-sanitizer compilation flags.

Stefano

On Fri, Jul 18, 2025, 19:20 Edoardo alinovi wrote:

> Hello Petsc friends,
>
> Hope you are all doing well.
>
> Today I was doing a simulation (27Mln cells on 64 cores) and I came across an issue. Indeed, I am deadlocking somewhere in MatMPIAIJSetPreallocation. Do you have any clue about the reason for this? Any suggestions to track this down?
>
> Many thanks,
>
> Edo

From junchao.zhang at gmail.com  Fri Jul 18 15:08:44 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Fri, 18 Jul 2025 15:08:44 -0500
Subject: [petsc-users] MatMPIAIJSetPreallocation deadlocks
In-Reply-To:
References:
Message-ID:

Do you have any chance to collect stack traces of all the MPI processes?

--Junchao Zhang

On Fri, Jul 18, 2025 at 12:20 PM Edoardo alinovi wrote:

> [...]
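One portable way to collect the stack traces Junchao asks for is plain gdb. A minimal sketch, under stated assumptions: gdb and pgrep are installed, you are allowed to ptrace the job's processes, "myprogram" is a placeholder executable name, and collect_traces is an invented helper, not a PETSc tool.

```shell
# Sketch only: briefly attach gdb to every process whose command line matches
# a pattern and dump all thread backtraces, one file per PID.
collect_traces() {
  for pid in $(pgrep -f "$1"); do
    gdb -batch -p "$pid" -ex "thread apply all bt" > "trace.$pid.txt" 2>&1
  done
}

# Usage on a hung run (hypothetical program name):
# collect_traces myprogram
```

Comparing the resulting files side by side usually shows which ranks are stuck in a collective (e.g. inside MatMPIAIJSetPreallocation) and which ranks never reached it.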
From mfadams at lbl.gov  Sat Jul 19 17:40:41 2025
From: mfadams at lbl.gov (Mark Adams)
Date: Sat, 19 Jul 2025 18:40:41 -0400
Subject: [petsc-users] MatMPIAIJSetPreallocation deadlocks
In-Reply-To:
References:
Message-ID:

Valgrind is a good place to start, but it can be hard to use ... so if you are clean, or don't want to bother, DDT is useful.

If you have DDT you can simply run interactively, "pause all", poke around, and collect some stack traces (at least one), which is super useful.

You can run non-interactively with something like:

srun -n 4 ddt --offline --output=ddt-output.html --snapshot-interval= ./myprogram

This should dump a stack trace, periodically, into ddt-output.html that is readable and has stack variables, for all processors.

Mark

On Fri, Jul 18, 2025 at 4:09 PM Junchao Zhang wrote:

> Do you have any chance to collect stack traces of all the MPI processes?
>
> --Junchao Zhang
>
> On Fri, Jul 18, 2025 at 12:20 PM Edoardo alinovi wrote:
>
>> [...]

From mfadams at lbl.gov  Mon Jul 21 06:48:11 2025
From: mfadams at lbl.gov (Mark Adams)
Date: Mon, 21 Jul 2025 07:48:11 -0400
Subject: [petsc-users] Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
In-Reply-To:
References:
Message-ID:

Hi Robert, [I don't see this message to PETSc users in my email, so cc'ing manually]

This is odd; something is messed up. These NULL things are certainly 0, and the initial value of these objects is not defined by the compiler, and they could be 0 in some cases.
Others are more up on the new Fortran stuff, but I would do a clean build of PETSc (rm -fr arch...) and turn debugging on. You will get a stack trace, and it might find something and give a useful error message, or "fix" the problem mysteriously.

You could put a print statement in before these are initialized to see what *was* in it that you are clobbering.

Mark

On Sun, Jul 20, 2025 at 10:33 AM Robert Hager wrote:

> FYI
>
> ---------- Forwarded message ---------
> From: Robert Hager
> Date: Sun, Jul 20, 2025 at 10:32 AM
> Subject: Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
> To:
>
> Hello,
>
> I am in the process of updating my code to use PETSc versions >=3.22. (I am using v3.22.3 compiled with GCC on Perlmutter-CPU at NERSC for debugging at this time.) After updating all the Fortran function calls that have changed from v3.21 to v3.22, I am now getting segfaults in some PETSc routines. In the two instances I was able to identify, the problem seems to be calling the PETSc function with a PETSc variable that has been initialized to PETSC_NULL_XX:
>
> solver%rhs2_mat = PETSC_NULL_MAT
> [...]
> call MatDuplicate(solver%Amat, MAT_DO_NOT_COPY_VALUES, solver%rhs2_mat, ierr)
> --> Segfault
>
> solver%ksp = PETSC_NULL_KSP
> [...]
> call KSPCreate(solver%comm, solver%ksp, ierr)
> --> Segfault
>
> When I comment out the assignments to PETSC_NULL_XX in the above examples, the code works just fine.
>
> Is this the intended behavior or a bug that you might have fixed by now?
>
> Best,
>
> Robert
From knepley at gmail.com  Mon Jul 21 07:49:57 2025
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 21 Jul 2025 08:49:57 -0400
Subject: [petsc-users] Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
In-Reply-To:
References:
Message-ID:

On Mon, Jul 21, 2025 at 7:49 AM Mark Adams wrote:

> Hi Robert, [I don't see this message to PETSc users in my email, so cc'ing manually]
>
> This is odd; something is messed up. These NULL things are certainly 0, and the initial value of these objects is not defined by the compiler, and they could be 0 in some cases.

Hi Robert,

I am not the Fortran expert, but I believe that passing PETSC_NULL_XXX to a creation routine is not correct. Here is my logic.

In the wrapper, PETSC_NULL_XXX is converted to a NULL pointer (_not_ a pointer whose value is NULL). Thus KSPCreate would try to set a NULL pointer and you get the SEGV. I think it may have worked before because this was not automated, so these routines did not check for PETSC_NULL_XXX, since it is not a case handled by the C function.

Could you try without initializing? Barry would have to tell you what they should be initialized to.

  Thanks,

     Matt

> [...]

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
   -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From bsmith at petsc.dev  Mon Jul 21 08:10:36 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 21 Jul 2025 09:10:36 -0400
Subject: [petsc-users] Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
In-Reply-To:
References:
Message-ID: <37471BDB-E576-4364-8A37-69BA4D6D2F86@petsc.dev>

   One passes PETSC_NULL_XXX as an argument that one does not want to get filled up, an optional argument. Similarly, in PETSc C one passes NULL as an optional argument.

   You should not be setting variables to PETSC_NULL_XXX.
When they are declared in your code they are automatically correctly initialized to an appropriate default value (which is not PETSC_NULL_XXX).

   You can use code such as

      if (PetscObjectIsNull(vec)) then

   to check if a PetscObject is currently validly set. Do not use

      if (vec == PETSC_NULL_VEC) then

   So, for example,

      VecCreate(..., vec, ierr)
      ....
      VecDestroy(vec, ierr)
      if (PetscObjectIsNull(vec)) then
         print *, 'vector was appropriately freed'
      endif
      VecCreate(..., vec, ierr)
      ....

   Any further questions, please ask on petsc-maint or petsc-users.

   Barry

> On Jul 21, 2025, at 8:49 AM, Matthew Knepley wrote:
>
> [...]
From rhager at pppl.gov  Sun Jul 20 09:32:25 2025
From: rhager at pppl.gov (Robert Hager)
Date: Sun, 20 Jul 2025 10:32:25 -0400
Subject: [petsc-users] Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
Message-ID:

Hello,

I am in the process of updating my code to use PETSc versions >=3.22. (I am using v3.22.3 compiled with GCC on Perlmutter-CPU at NERSC for debugging at this time.) After updating all the Fortran function calls that have changed from v3.21 to v3.22, I am now getting segfaults in some PETSc routines. In the two instances I was able to identify, the problem seems to be calling the PETSc function with a PETSc variable that has been initialized to PETSC_NULL_XX:

solver%rhs2_mat = PETSC_NULL_MAT
[...]
call MatDuplicate(solver%Amat, MAT_DO_NOT_COPY_VALUES, solver%rhs2_mat, ierr)
--> Segfault

solver%ksp = PETSC_NULL_KSP
[...]
call KSPCreate(solver%comm, solver%ksp, ierr)
--> Segfault

When I comment out the assignments to PETSC_NULL_XX in the above examples, the code works just fine.

Is this the intended behavior or a bug that you might have fixed by now?

Best,

Robert

From rhager at pppl.gov  Mon Jul 21 08:38:23 2025
From: rhager at pppl.gov (Robert Hager)
Date: Mon, 21 Jul 2025 09:38:23 -0400
Subject: [petsc-users] [External] Re: Segfaults when calling some PETSc functions in Fortran with variables initialized to PETSC_NULL_XX
In-Reply-To: <37471BDB-E576-4364-8A37-69BA4D6D2F86@petsc.dev>
References: <37471BDB-E576-4364-8A37-69BA4D6D2F86@petsc.dev>
Message-ID:

Thanks, all! That makes sense (and the code worked without the initializations in question).

BTW, my original email to the PETSc mailing list was held up for review by a moderator yesterday, so that's probably why you haven't seen it there yet.
Best,

Robert

On Mon, Jul 21, 2025 at 9:10 AM Barry Smith wrote:

> One passes PETSC_NULL_XXX as an argument that one does not want to get filled up, an optional argument. Similarly, in PETSc C one passes NULL as an optional argument.
>
> You should not be setting variables to PETSC_NULL_XXX. When they are declared in your code they are automatically correctly initialized to an appropriate default value (which is not PETSC_NULL_XXX).
>
> [...]

From bourdin at mcmaster.ca  Tue Jul 22 09:23:12 2025
From: bourdin at mcmaster.ca (Blaise Bourdin)
Date: Tue, 22 Jul 2025 14:23:12 +0000
Subject: [petsc-users] Printing _some_ options
Message-ID: <94ABA8B4-8703-4A87-82A4-5F0B632D777E@mcmaster.ca>

Hi,

My (Fortran) code currently makes heavy use of PetscBag to store parameters and options in Fortran derived types. I would like to simplify it by getting rid of these bags and simply querying the PETSc options database as needed.

One nice thing about PetscBag, though, is that I can easily display help messages related to a specific family of options by calling PetscBagView and get nicely formatted output like this:

Registering cell set 1 prefix: cs0001_
PetscBag Object:  Cell set 1 (cs0001_) HeatXferCellSetOptions  MEF90 Heat transfer Cell Set options
  Flux = 0. ; [J.s^(-1).m^(-3) / J.s^(-1).m^(-2)] (f): Internal / applied heat flux
  TemperatureBC = FALSE ; Temperature has Dirichlet boundary Condition (Y/N)
  boundaryTemperature = 0. [K] (); Temperature boundary value
  advectionVector = 0. 0. 0. ; [m.s^(-1)] (V): advection vector

even when -help is not passed on the command line, so that these messages are not drowned in the Vec, SNES, etc. help messages.

Is there an easy way to achieve the same thing when using PetscOptionsXXX or PetscOptionsGetXXX? Ideally, I'd like to be able to pass -mef90_help and display only the help messages associated with the calls to PetscOptionsXXX or PetscOptionsGetXXX that I made and that have been executed.

Does that make sense?

Regards,
Blaise

--
Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243

From yangzongze at gmail.com  Tue Jul 22 09:48:42 2025
From: yangzongze at gmail.com (Zongze Yang)
Date: Tue, 22 Jul 2025 14:48:42 +0000
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To:
References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev>
Message-ID:

Hi,

I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.

```
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: General MPI error
[0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: PETSc Release Version 3.23.4, unknown
[0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025
[0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic
[0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289
[0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383
[0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
[0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420
[0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443
[0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405
[0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342
[0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043
[0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084
[0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552
PetscFinalize() failed [error code: 98]
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 28986 on node
192.168.10.51 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination".

3. this process called "MPI_Abort" or "prte_abort" and the mca parameter prte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one.

This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here). You can avoid this message by specifying -quiet on the prterun command line.
--------------------------------------------------------------------------
```

Best wishes,
Zongze

From: petsc-users on behalf of Klaij, Christiaan via petsc-users
Date: Monday, July 14, 2025 at 15:58
To: Barry Smith
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

@Junchao: yes, all with my ex2f.F90 variation on two or three cores

@Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event (XML fragment; the tags were stripped by the list archiver, only the values survive):

  VecAXPY
  0.5
  0.
  1.
  1
  0
  self

This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster:
1) download petsc-3.23.4.tar.gz from the petsc website
2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2
3) adjust my example to this version of petsc (file is attached)
4) make ex2f-cklaij-dbg-v2
5) mpirun -n 2 ./ex2f-cklaij-dbg-v2

So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0

________________________________________
From: Barry Smith
Sent: Friday, July 11, 2025 11:22 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

And yet we cannot reproduce.

Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it.

Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop?

Barry

If I could reproduce the problem, here is how I would debug: I would use -start_in_debugger and then put break points in the places that seem problematic. Presumably I would end up with a hang with each MPI process in a "different place", and from that I may be able to determine how it happened.

> On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan wrote:
>
> In summary for future reference:
> - tested 3 different machines, two at Marin, one at the national HPC
> - tested 3 different mpi implementations (intelmpi, openmpi and mpich)
> - tested openmpi in both release and debug
> - tested 2 different compilers (intel and gnu), both older and very recent versions
> - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>
> All of these tests either segfault, hang, or error out at the call to PetscLogView.
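[For reference, the shape of the pattern under discussion — a nested-logged event around the solution of a small system on rank zero, as in Chris's standalone example described further down — can be sketched as follows in Fortran. This is a hypothetical minimal sketch, not the attached example: the event and class names are invented, the signatures are the Fortran forms, and the program unit and includes are omitted.]

```fortran
! Register the event on every rank (collective), but begin/end it on
! rank 0 only, e.g. around a small serial solve.  The nested XML log
! viewer later performs MPI_Allreduce over per-event statistics on the
! logging communicator, which is the collective seen hanging or failing
! with MPI_ERR_BUFFER in the stack traces in this thread.
PetscLogEvent  :: ev
PetscClassId   :: cid
PetscErrorCode :: ierr

PetscCallA(PetscClassIdRegister('MyApp', cid, ierr))
PetscCallA(PetscLogEventRegister('SmallSolve', cid, ev, ierr))
if (rank == 0) then
  PetscCallA(PetscLogEventBegin(ev, ierr))
  ! ... build and solve the small serial system ...
  PetscCallA(PetscLogEventEnd(ev, ierr))
end if
```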
>
> Chris
>
> ________________________________________
> From: Klaij, Christiaan
> Sent: Friday, July 11, 2025 10:10 AM
> To: Barry Smith; Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
> @Barry: both stack traces aren't exactly the same, see a sample with MPICH below.
>
> If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
>
> Chris
>
> (gdb) bt
> #0  0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #1  0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #2  0x000015555033e70f in MPIR_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #3  0x000015555033f22e in PMPI_Allreduce () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #4  0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, inbuf=0x7fffffffac60) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839
> #5  MPIU_Allreduce_Private (inbuf=inbuf@entry=0x7fffffffac60, outbuf=outbuf@entry=0x7fffffffac70, count=count@entry=1, dtype=dtype@entry=1275072547, op=op@entry=1476395020, comm=-2080374782) at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869
> #6  0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults (viewer=viewer@entry=0x458890, name=name@entry=0x155554ef6a0d 'mbps\000', value=, minthreshold=minthreshold@entry=0, maxthreshold=maxthreshold@entry=0.01, minmaxtreshold=minmaxtreshold@entry=1.05) at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255
>
> (gdb) bt
> #0  0x000015554fed3b17 in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
> #1  0x0000155550b0de71 in ofi_gettime_ns () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #2  0x0000155550b0dec9 in ofi_gettime_ms () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #3  0x0000155550b2fab5 in sock_cq_sreadfrom () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #4  0x00001555505ca6f7 in MPIDI_OFI_progress () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #5  0x0000155550591fe9 in progress_test () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #6  0x00001555505924a3 in MPID_Progress_wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #7  0x000015555043463e in MPIR_Wait_state () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #8  0x000015555052ec49 in MPIC_Wait () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #9  0x000015555053093e in MPIC_Sendrecv () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
> #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>
> ________________________________________
> From: Barry Smith
> Sent: Thursday, July 10, 2025 11:10 PM
> To: Junchao Zhang
> Cc: Klaij, Christiaan; PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> I cannot reproduce
>
> On Jul 10, 2025, at 3:46 PM, Junchao Zhang wrote:
>
> Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange.
>
> --Junchao Zhang
>
> On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan wrote:
> An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
>
> Chris
>
> $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: General MPI error
> [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
> [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
> [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
> [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
> [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
> [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
> [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
> [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
> [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
> [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
> [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
> [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>   Proc: [[55228,1],1]
>   Errorcode: 98
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> prterun has exited due to process rank 1 with PID 0 on node login1 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ________________________________________
>
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$
>
> ________________________________________
> From: Klaij, Christiaan
> Sent: Thursday, July 10, 2025 10:15 AM
> To: Junchao Zhang
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi Junchao,
>
> Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace...
>
> Chris
>
> ________________________________________
> From: Junchao Zhang
> Sent: Tuesday, July 8, 2025 10:58 PM
> To: Klaij, Christiaan
> Cc: PETSc users list
> Subject: Re: [petsc-users] problem with nested logging, standalone example
>
> Hi, Chris,
> First, I had to fix an error in your test by adding "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Mat object's type is not set: Argument # 1
> ...
> [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
> [0]PETSC ERROR: #2 ex2f.F90:258
>
> Then I could run the test without problems:
> mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>   0 KSP Residual norm 1.11803
>   1 KSP Residual norm 0.591608
>   2 KSP Residual norm 0.316228
>   3 KSP Residual norm < 1.e-11
>   0 KSP Residual norm 0.707107
>   1 KSP Residual norm 0.408248
>   2 KSP Residual norm < 1.e-11
> Norm of error < 1.e-12 iterations 3
>
> I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
> ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>
> Could you fix the error and retry?
>
> --Junchao Zhang
>
> On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users wrote:
> Attached is a standalone example of the issue described in the earlier thread "problem with nested logging". The issue appeared somewhere between petsc 3.19.4 and 3.23.4.
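[Junchao's one-line fix above reflects the required Mat setup order; a minimal Fortran sketch of that sequence follows (the matrix name AA is from the example, the sizes m and n and surrounding declarations are hypothetical):]

```fortran
! A Mat's type must be set (here via MatSetFromOptions) before
! MatSetValues may be called on it; otherwise PETSc raises
! "Object is in wrong state: Mat object's type is not set".
PetscCallA(MatCreate(PETSC_COMM_WORLD, AA, ierr))
PetscCallA(MatSetSizes(AA, PETSC_DECIDE, PETSC_DECIDE, m, n, ierr))
PetscCallA(MatSetFromOptions(AA, ierr))   ! the call that was missing
PetscCallA(MatSetUp(AA, ierr))
! ... MatSetValues(...) and MatAssemblyBegin/End follow ...
```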
>
> The example is a variation of ../ksp/tutorials/ex2f.F90, where I've added the nested log viewer with one event as well as the solution of a small system on rank zero.
>
> When running on multiple procs the example hangs during PetscLogView with the backtrace below. The configure.log is also attached in the hope that you can replicate the issue.
>
> Chris
>
> #0  0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , src=1, tag=-12, comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
> #1  0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at base/coll_base_allreduce.c:247
> #2  0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
> #3  0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) at coll_tuned_decision_fixed.c:216
> #4  0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) at coll_hcoll_ops.c:217
> #5  0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123
> #6  0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #7  0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #8  0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #9  0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
> #18 0x0000000000402c8b in MAIN__ ()
> #19 0x00000000004023df in main ()
>
> dr. ir. Christiaan Klaij | senior researcher
> Research & Development | CFD Development
> T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: code.tar.gz
Type: application/x-gzip
Size: 1297 bytes
Desc: code.tar.gz
URL:

From bsmith at petsc.dev Tue Jul 22 13:14:31 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 22 Jul 2025 14:14:31 -0400
Subject: [petsc-users] Printing _some_ options
In-Reply-To: <94ABA8B4-8703-4A87-82A4-5F0B632D777E@mcmaster.ca>
References: <94ABA8B4-8703-4A87-82A4-5F0B632D777E@mcmaster.ca>
Message-ID:

Hm, what is the advantage of dropping bags? Maybe we just need to improve bags to make them better for you.

> On Jul 22, 2025, at 10:23 AM, Blaise Bourdin wrote:
>
> Hi,
>
> My (Fortran) code currently makes heavy use of PetscBag to store parameters and options in Fortran derived types. I would like to simplify it by getting rid of these bags and simply querying the PETSc options database as needed.
>
> One nice thing about PetscBags, though, is that I can easily display help messages related to a specific family of options by calling PetscBagView and get nicely formatted output like this:
>
>   Registering cell set 1 prefix: cs0001_
>   PetscBag Object: Cell set 1 (cs0001_) HeatXferCellSetOptions MEF90 Heat transfer Cell Set options
>     Flux = 0. ; [J.s^(-1).m^(-3) / J.s^(-1).m^(-2)] (f): Internal / applied heat flux
>     TemperatureBC = FALSE; Temperature has Dirichlet boundary Condition (Y/N)
>     boundaryTemperature = 0. [K] (); Temperature boundary value
>     advectionVector = 0. 0. 0. ; [m.s^(-1)] (V): advection vector
>
> even when -help is not passed on the command line, so that these messages are not drowned in the Vec, SNES, etc. help messages.
>
> Is there an easy way to achieve the same thing when using PetscOptionsXXX or PetscOptionsGetXXX?
>
> Ideally, I'd like to be able to pass -mef90_help and display only the help messages associated with the calls to PetscOptionsXXX or PetscOptionsGetXXX that I made and that have been executed. Does that make sense?
>
> Regards,
> Blaise
>
> --
> Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
> https://urldefense.us/v3/__https://www.math.mcmaster.ca/bourdin__;!!G_uCfscf7eWS!fgK0I4XQdmA4YFzlulAe0gnPsrtmmt_KjqXJy9r3CefbaPNCHyAQIZFpe-XT4H2K0zjYzsxw7ZuSQtzS4WfS1ZUg$ | +1 (905) 525 9140 ext. 27243

From junchao.zhang at gmail.com Tue Jul 22 15:18:22 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Tue, 22 Jul 2025 15:18:22 -0500
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To:
References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev>
Message-ID:

With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it. Thanks.

--Junchao Zhang

On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang wrote:

> Hi,
> I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.
>
> ```
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: General MPI error
> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
> [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!bitkMAVSBHfkO71IthJuocmtSkJAoWdXju0W8ra3pkNhAQ0ULGK2V2SDIVEXJTW6DekHmZoJorx9h8YpGr1EJj_T7kfT$ for trouble shooting.
> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown
> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025
> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic
> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289
> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383
> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420
> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443
> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405
> [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342
> [0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043
> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084
> [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552
> PetscFinalize() failed [error code: 98]
> --------------------------------------------------------------------------
> prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting improperly. There are three reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination"
>
> 3. this process called "MPI_Abort" or "prte_abort" and the mca parameter prte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one.
>
> This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here).
>
> You can avoid this message by specifying -quiet on the prterun command line.
> --------------------------------------------------------------------------
> ```

Best wishes,
Zongze

From: petsc-users on behalf of Klaij, Christiaan via petsc-users
Date: Monday, July 14, 2025 at 15:58
To: Barry Smith
Cc: PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

@Junchao: yes, all with my ex2f.F90 variation on two or three cores

@Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file?
When it hangs I see some NaNs, for instance here in the VecAXPY > event (the XML markup was scrubbed by the mail archive; only the values > remain): > > VecAXPY 0.5 0. 1. 1 0 self > > This is what I did in my latest attempt on the login node of our Rocky > Linux 9 cluster: > 1) download petsc-3.23.4.tar.gz from the petsc website > 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 > --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 > 3) adjust my example to this version of petsc (file is attached) > 4) make ex2f-cklaij-dbg-v2 > 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 > > So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc > 11.5.0 > > ________________________________________ > From: Barry Smith > Sent: Friday, July 11, 2025 11:22 PM > To: Klaij, Christiaan > Cc: Junchao Zhang; PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > > And yet we cannot reproduce. > > Please tell us the exact PETSc version and MPI implementation versions. > And reattach your reproducing example. And exactly how you run it. > > > Can you reproduce it on an "ordinary" machine, say a Mac or Linux > laptop. > > Barry > > If I could reproduce the problem here is how I would debug. I would use > -start_in_debugger and then put break points in places which seem > problematic. Presumably I would end up with a hang with each MPI process in > a "different place" and from that I may be able to determine how that > happened.
> > > > > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan wrote: > > > > In summary for future reference: > > - tested 3 different machines, two at Marin, one at the national HPC > > - tested 3 different MPI implementations (intelmpi, openmpi and mpich) > > - tested openmpi in both release and debug > > - tested 2 different compilers (intel and gnu), both older and very > recent versions > > - tested with the most basic config (./configure --with-cxx=0 > --with-debugging=0 --download-mpich) > > > > All of these tests either segfault, hang, or error out at the call to > PetscLogView. > > > > Chris > > > > ________________________________________ > > From: Klaij, Christiaan > > Sent: Friday, July 11, 2025 10:10 AM > > To: Barry Smith; Junchao Zhang > > Cc: PETSc users list > > Subject: Re: [petsc-users] problem with nested logging, standalone > example > > > > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same > hanging. > > @Barry: both stack traces aren't exactly the same, see a sample with > MPICH below. > > > > If it cannot be reproduced at your side, I'm afraid this is another dead > end. Thanks anyway, I really appreciate all your help.
> > > > Chris > > > > (gdb) bt > > #0 0x000015555033bc2e in > MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #2 0x000015555033e70f in MPIR_Allreduce () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #3 0x000015555033f22e in PMPI_Allreduce () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, > > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, > > inbuf=0x7fffffffac60) > > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 > > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, > > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, > > dtype=dtype at entry=1275072547, op=op at entry=1476395020, > comm=-2080374782) > > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 > > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( > > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d > 'mbps\000', > > value=, minthreshold=minthreshold at entry=0, > > maxthreshold=maxthreshold at entry=0.01, > > minmaxtreshold=minmaxtreshold at entry=1.05) > > at > /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 > > > > > > (gdb) bt > > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from > /lib64/libc.so.6 > > #1 0x0000155550b0de71 in ofi_gettime_ns () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #2 0x0000155550b0dec9 in ofi_gettime_ms () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () > > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #5 0x0000155550591fe9 in progress_test () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #6 0x00001555505924a3 in MPID_Progress_wait () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #7 0x000015555043463e in MPIR_Wait_state () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #8 0x000015555052ec49 in MPIC_Wait () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #9 0x000015555053093e in MPIC_Sendrecv () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () > > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > > > ________________________________________ > > From: Barry Smith > > Sent: Thursday, July 10, 2025 11:10 PM > > To: Junchao Zhang > > Cc: Klaij, Christiaan; PETSc users list > > Subject: Re: [petsc-users] problem with nested logging, standalone > example > > > > > > I cannot reproduce > > > > On Jul 10, 2025, at 3:46 PM, Junchao Zhang > wrote: > > > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. > Strange. > > > > --Junchao Zhang > > > > > > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan > wrote: > > An additional clue perhaps: with the option > OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error > below.
> > > > Chris > > > > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi > -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > > 0 KSP Residual norm 1.11803 > > 1 KSP Residual norm 0.591608 > > 2 KSP Residual norm 0.316228 > > 3 KSP Residual norm < 1.e-11 > > 0 KSP Residual norm 0.707107 > > 1 KSP Residual norm 0.408248 > > 2 KSP Residual norm < 1.e-11 > > Norm of error < 1.e-12 iterations 3 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: General MPI error > > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > > [1]PETSC ERROR: See > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ > < > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJjkYxsN9$> > for trouble shooting. 
> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH > on login1 by cklaij Thu Jul 10 10:33:33 2025 > > [1]PETSC ERROR: Configure options: > --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs > --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 > --with-mpe=0 --with-debugging=0 --download-superlu_dist= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJkouVHb2$> > --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 > --download-parmetis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJrjo6-SP$> > --download-metis= > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ > < > https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJhCc9MRE$> > --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild > --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " 
COPTFLAGS="-std=gnu11 -Wall > -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall > -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops > -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime > -Wno-unused-function -O3 -DNDEBUG" > > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > > [1]PETSC ERROR: #7 PetscLogHandlerView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > > [1]PETSC ERROR: #8 PetscLogView() at > /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > > > 
-------------------------------------------------------------------------- > > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > > Proc: [[55228,1],1] > > Errorcode: 98 > > > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > > You may or may not see output from other processes, depending on > > exactly when Open MPI kills them. > > > -------------------------------------------------------------------------- > > > -------------------------------------------------------------------------- > > prterun has exited due to process rank 1 with PID 0 on node login1 > calling > > "abort". This may have caused other processes in the application to be > > terminated by signals sent by prterun (as reported here). > > > -------------------------------------------------------------------------- > > > > ________________________________________ > > > > dr. ir. Christiaan Klaij | senior researcher > > Research & Development | CFD Development > > T +31 317 49 33 44 | > https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > < > https://urldefense.us/v3/__https://www.marin.nl/__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJrOqapgp$ > > > > < > https://urldefense.us/v3/__https://www.facebook.com/marin.wageningen__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJoD4fuV7$ > > > > < > https://urldefense.us/v3/__https://www.linkedin.com/company/marin__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJospHf95$ > > > > < > https://urldefense.us/v3/__https://www.youtube.com/marinmultimedia__;!!G_uCfscf7eWS!cbfMf1uAUCQ_T756UiU6Vd_NZkAvFLYRqJzL47P2JiAVi_2KCG5Q1u2oHseUcGLNAIW5qWtWbWHMIk_YNR8bJrpsjB_W$ > > > > > > > > From: Klaij, Christiaan > > > Sent: Thursday, July 10, 2025 10:15 AM > > To: Junchao Zhang > 
> Cc: PETSc users list > > Subject: Re: [petsc-users] problem with nested logging, standalone > example > > > > Hi Junchao, > > > > Thanks for testing. I've fixed the error but unfortunately that doesn't > change the behavior, the code still hangs as before, with the same stack > trace... > > > > Chris > > > > ________________________________________ > > From: Junchao Zhang junchao.zhang at gmail.com>> > > Sent: Tuesday, July 8, 2025 10:58 PM > > To: Klaij, Christiaan > > Cc: PETSc users list > > Subject: Re: [petsc-users] problem with nested logging, standalone > example > > > > Hi, Chris, > > First, I had to fix an error in your test by adding " > PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > > ... > > [0]PETSC ERROR: #1 MatSetValues() at > /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > > [0]PETSC ERROR: #2 ex2f.F90:258 > > > > Then I could run the test without problems > > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short > -ksp_gmres_cgs_refinement_type refine_always > > 0 KSP Residual norm 1.11803 > > 1 KSP Residual norm 0.591608 > > 2 KSP Residual norm 0.316228 > > 3 KSP Residual norm < 1.e-11 > > 0 KSP Residual norm 0.707107 > > 1 KSP Residual norm 0.408248 > > 2 KSP Residual norm < 1.e-11 > > Norm of error < 1.e-12 iterations 3 > > > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran > --download-openmpi --with-ssl=0 --with-shared-libraries=1 > CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" > CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " > FCFLAGS="-Wall -funroll-all-loops
-ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 > -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 > -DNDEBUG" > > > > Could you fix the error and retry? > > > > --Junchao Zhang > > > > > > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users < > petsc-users at mcs.anl.gov petsc-users at mcs.anl.gov>> wrote: > > Attached is a standalone example of the issue described in the > > earlier thread "problem with nested logging". The issue appeared > > somewhere between petsc 3.19.4 and 3.23.4. > > > > The example is a variation of ../ksp/tutorials/ex2f.F90, where > > I've added the nested log viewer with one event as well as the > > solution of a small system on rank zero. > > > > When running on multiple procs the example hangs during > > PetscLogView with the backtrace below. The configure.log is also > > attached in the hope that you can replicate the issue.
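[Editorial sketch] The suspected trigger described above — an event registered on all ranks but begun/ended on a subset, while the nested XML viewer later performs collective reductions per event — can be sketched as follows. This is a hypothetical, minimal C analogue of the ex2f.F90 variation (not the attached file itself; the event name "SmallSolve" and binary name "repro" are made up), and it needs a PETSc installation to build:

```c
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogEvent ev;
  PetscMPIInt   rank;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  /* The event is registered collectively, on every rank... */
  PetscCall(PetscLogEventRegister("SmallSolve", PETSC_OBJECT_CLASSID, &ev));
  if (rank == 0) {
    /* ...but begun/ended on rank 0 only, e.g. around a small serial solve. */
    PetscCall(PetscLogEventBegin(ev, 0, 0, 0, 0));
    /* build and solve the small system here */
    PetscCall(PetscLogEventEnd(ev, 0, 0, 0, 0));
  }
  /* The nested log handler then reduces per-event statistics across
     ranks when viewed, e.g.:
       mpirun -n 2 ./repro -log_view :report.xml:ascii_xml
     If the per-rank event trees differ, those reductions can mismatch. */
  PetscCall(PetscFinalize());
  return 0;
}
```

Whether such one-sided events must be guarded by the caller, or tolerated by the nested handler, is the open question in this thread.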
> > > > Chris > > > > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > > datatype=0x15554c9ef900 , src=1, tag=-12, > > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > > #1 0x000015554c65baff in > ompi_coll_base_allreduce_intra_recursivedoubling ( > > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > > dtype=0x15554c9ef900 , > > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > > at base/coll_base_allreduce.c:247 > > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > > dtype=0x15554c9ef900 , > > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > > algorithm=3, faninout=0, segsize=0) at > coll_tuned_allreduce_decision.c:142 > > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > > dtype=0x15554c9ef900 , > > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > > at coll_tuned_decision_fixed.c:216 > > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > > at coll_hcoll_ops.c:217 > > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 > , op=0x15554ca28980 , comm=0x7f1e30) > at allreduce.c:123 > > #6 0x0000155553eabede in MPIU_Allreduce_Private () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from > 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #15 0x0000155553e56232 in PetscLogHandlerView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #16 0x0000155553e588c3 in PetscLogView () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #17 0x0000155553e40eb5 in petsclogview_ () from > /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > > #18 0x0000000000402c8b in MAIN__ () > > #19 0x00000000004023df in main () > >
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Jul 22 16:16:20 2025 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Jul 2025 17:16:20 -0400 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: Yippee! (maybe) > On Jul 22, 2025, at 4:18 PM, Junchao Zhang wrote: > > With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it. > > Thanks. > --Junchao Zhang > > > On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang > wrote: >> Hi, >> I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.
>> >> ``` >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: General MPI error >> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >> [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!e6os5i6Vr0sPXPU9Ut4fBH23Zm5rhuv1OQzrN-_auwqE3MHOqaBjE9-qILFZzFX8uJoxgIXWCDijPrpqKAzeMGg$ for trouble shooting. >> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown >> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025 >> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic >> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383 >> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405 >> [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342 >> [0]PETSC ERROR: #17 PetscLogView() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043 >> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084 >> [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552 >> PetscFinalize() failed [error code: 98] >> -------------------------------------------------------------------------- >> prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting >> improperly. There are three reasons this could occur: >> >> 1. this process did not call "init" before exiting, but others in the >> job did. This can cause a job to hang indefinitely while it waits for >> all processes to call "init". By rule, if one process calls "init", >> then ALL processes must call "init" prior to termination. >> >> 2. this process called "init", but exited without calling "finalize". >> By rule, all processes that call "init" MUST call "finalize" prior to >> exiting or it will be considered an "abnormal termination" >> >> 3. this process called "MPI_Abort" or "prte_abort" and the mca >> parameter prte_create_session_dirs is set to false. In this case, the >> run-time cannot detect that the abort call was an abnormal >> termination. Hence, the only error message you will receive is this >> one. >> >> This may have caused other processes in the application to be >> terminated by signals sent by prterun (as reported here). >> >> You can avoid this message by specifying -quiet on the prterun command >> line. 
>> -------------------------------------------------------------------------- >> ``` >> >> Best wishes, >> Zongze >> >> From: petsc-users > on behalf of Klaij, Christiaan via petsc-users > >> Date: Monday, July 14, 2025 at 15:58 >> To: Barry Smith > >> Cc: PETSc users list > >> Subject: Re: [petsc-users] problem with nested logging, standalone example >> >> @Junchao: yes, all with my ex2f.F90 variation on two or three cores >> >> @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event: >> >> >> >> VecAXPY >> >> >> 0.5 >> 0. >> 1. >> 1 >> 0 >> >> >> >> self >> >> >> This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster: >> 1) download petsc-3.23.4.tar.gz from the petsc website >> 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 >> 3) adjust my example to this version of petsc (file is attached) >> 4) make ex2f-cklaij-dbg-v2 >> 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 >> >> So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0 >> >> ________________________________________ >> From: Barry Smith > >> Sent: Friday, July 11, 2025 11:22 PM >> To: Klaij, Christiaan >> Cc: Junchao Zhang; PETSc users list >> Subject: Re: [petsc-users] problem with nested logging, standalone example >> >> >> And yet we cannot reproduce. >> >> Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it. >> >> >> Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop. >> >> Barry >> >> If I could reproduce the problem here is how I would debug. I put use -start_in_debugger and then put break points in places which it seem problematic. 
>> Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened. >> >> >> >> > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan > wrote: >> > >> > In summary for future reference: >> > - tested 3 different machines, two at Marin, one at the national HPC >> > - tested 3 different MPI implementations (intelmpi, openmpi and mpich) >> > - tested openmpi in both release and debug >> > - tested 2 different compilers (intel and gnu), both older and very recent versions >> > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich) >> > >> > All of these tests either segfault, hang, or error out at the call to PetscLogView. >> > >> > Chris >> > >> > ________________________________________ >> > From: Klaij, Christiaan > >> > Sent: Friday, July 11, 2025 10:10 AM >> > To: Barry Smith; Junchao Zhang >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone example >> > >> > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging. >> > @Barry: both stack traces aren't exactly the same, see a sample with MPICH below. >> > >> > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
>> > >> > Chris >> > >> > (gdb) bt >> > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #2 0x000015555033e70f in MPIR_Allreduce () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #3 0x000015555033f22e in PMPI_Allreduce () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, >> > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, >> > inbuf=0x7fffffffac60) >> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 >> > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, >> > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, >> > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) >> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 >> > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( >> > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', >> > value=, minthreshold=minthreshold at entry=0, >> > maxthreshold=maxthreshold at entry=0.01, >> > minmaxtreshold=minmaxtreshold at entry=1.05) >> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 >> > >> > >> > (gdb) bt >> > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 >> > #1 0x0000155550b0de71 in ofi_gettime_ns () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #2 0x0000155550b0dec9 in ofi_gettime_ms () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () >> > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #5 0x0000155550591fe9 in progress_test () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #6 0x00001555505924a3 in MPID_Progress_wait () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #7 0x000015555043463e in MPIR_Wait_state () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #8 0x000015555052ec49 in MPIC_Wait () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #9 0x000015555053093e in MPIC_Sendrecv () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > >> > ________________________________________ >> > From: Barry Smith > >> > Sent: Thursday, July 10, 2025 11:10 PM >> > To: Junchao Zhang >> > Cc: Klaij, Christiaan; PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone example >> > >> > >> > I cannot reproduce >> > >> > On Jul 10, 2025, at 3:46 PM, Junchao Zhang > wrote: >> > >> > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. >> > >> > --Junchao Zhang >> > >> > >> > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan >> wrote: >> > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
>> > >> > Chris >> > >> > >> > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >> > 0 KSP Residual norm 1.11803 >> > 1 KSP Residual norm 0.591608 >> > 2 KSP Residual norm 0.316228 >> > 3 KSP Residual norm < 1.e-11 >> > 0 KSP Residual norm 0.707107 >> > 1 KSP Residual norm 0.408248 >> > 2 KSP Residual norm < 1.e-11 >> > Norm of error < 1.e-12 iterations 3 >> > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> > [1]PETSC ERROR: General MPI error >> > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >> > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ for trouble shooting. >> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 >> > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 >> > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ 
--with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >> > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >> > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 >> > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >> > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >> > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 >> > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 >> > [1]PETSC ERROR: #8 
PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 >> > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 >> > -------------------------------------------------------------------------- >> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF >> > Proc: [[55228,1],1] >> > Errorcode: 98 >> > >> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> > You may or may not see output from other processes, depending on >> > exactly when Open MPI kills them. >> > -------------------------------------------------------------------------- >> > -------------------------------------------------------------------------- >> > prterun has exited due to process rank 1 with PID 0 on node login1 calling >> > "abort". This may have caused other processes in the application to be >> > terminated by signals sent by prterun (as reported here). >> > -------------------------------------------------------------------------- >> > >> > ________________________________________ >> > >> > dr. ir. Christiaan Klaij | senior researcher >> > Research & Development | CFD Development >> > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ >> > >> > >> > >> > >> > >> > From: Klaij, Christiaan >> >> > Sent: Thursday, July 10, 2025 10:15 AM >> > To: Junchao Zhang >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone example >> > >> > Hi Junchao, >> > >> > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... 
>> > >> > Chris >> > >> > ________________________________________ >> > From: Junchao Zhang >> >> > Sent: Tuesday, July 8, 2025 10:58 PM >> > To: Klaij, Christiaan >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone example >> > >> > Hi, Chris, >> > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. >> > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> > [0]PETSC ERROR: Object is in wrong state >> > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 >> > ... >> > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 >> > [0]PETSC ERROR: #2 ex2f.F90:258 >> > >> > Then I could run the test without problems >> > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >> > 0 KSP Residual norm 1.11803 >> > 1 KSP Residual norm 0.591608 >> > 2 KSP Residual norm 0.316228 >> > 3 KSP Residual norm < 1.e-11 >> > 0 KSP Residual norm 0.707107 >> > 1 KSP Residual norm 0.408248 >> > 2 KSP Residual norm < 1.e-11 >> > Norm of error < 1.e-12 iterations 3 >> > >> > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with >> > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 
-Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >> > >> > Could you fix the error and retry? >> > >> > --Junchao Zhang >> > >> > >> > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >>>> wrote: >> > Attached is a standalone example of the issue described in the >> > earlier thread "problem with nested logging". The issue appeared >> > somewhere between petsc 3.19.4 and 3.23.4. >> > >> > The example is a variation of ../ksp/tutorials/ex2f.F90, where >> > I've added the nested log viewer with one event as well as the >> > solution of a small system on rank zero. >> > >> > When running on multiple procs the example hangs during >> > PetscLogView with the backtrace below. The configure.log is also >> > attached in the hope that you can replicate the issue. >> > >> > Chris >> > >> > >> > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, >> > datatype=0x15554c9ef900 , src=1, tag=-12, >> > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 >> > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >> > at base/coll_base_allreduce.c:247 >> > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, >> > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 >> > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >> > at coll_tuned_decision_fixed.c:216 >> > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, >> > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , 
comm=0x7f1e30, module=0xaecb80) >> > at coll_hcoll_ops.c:217 >> > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, >> > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 >> > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #18 0x0000000000402c8b in MAIN__ () >> > #19 0x00000000004023df in main () >> > dr. ir. 
Christiaan Klaij | senior researcher >> > Research & Development | CFD Development >> > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > >> > >> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Wed Jul 23 10:48:26 2025 From: sblondel at utk.edu (Blondel, Sophie) Date: Wed, 23 Jul 2025 15:48:26 +0000 Subject: [petsc-users] PETSc unable to find cuda Message-ID: Hi, I am trying to install PETSc (3.22.2) with Kokkos and cuda support on an Ubuntu laptop with dependencies loaded with Conda. The configure line is: ./configure PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc PETSC_ARCH=rel-cuda --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 --with-shared-libraries --with-64-bit-indices --download-kokkos --download-kokkos-kernels --download-hdf5 --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 And the configure.log is attached. Let me know if I can provide additional information. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 642762 bytes Desc: configure.log URL: From knepley at gmail.com Wed Jul 23 10:54:23 2025 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Jul 2025 11:54:23 -0400 Subject: [petsc-users] PETSc unable to find cuda In-Reply-To: References: Message-ID: On Wed, Jul 23, 2025 at 11:49 AM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am trying to install PETSc (3.22.2) with Kokkos and cuda support on an > Ubuntu laptop with dependencies loaded with Conda. > You are likely to have to turn off Conda before configuring. It messes up paths for Python and other things. Thanks, Matt > The configure line is: ./configure > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > PETSC_ARCH=rel-cuda > --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install > --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 > --with-shared-libraries --with-64-bit-indices --download-kokkos > --download-kokkos-kernels --download-hdf5 > --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 > --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 > > And the configure.log is attached. Let me know if I can provide additional > information. > > Best, > > Sophie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!agMVStAAIH9wbZe0e-cFdZ4WjTbwZrweNZu7-go8ZAWNhd-pdXsjVKaHKw732n2ipdeGu6Ifzps6ZL8zlPZi$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay.anl at fastmail.org Wed Jul 23 11:09:07 2025 From: balay.anl at fastmail.org (Satish Balay) Date: Wed, 23 Jul 2025 11:09:07 -0500 (CDT) Subject: [petsc-users] PETSc unable to find cuda In-Reply-To: References: Message-ID: >>> Executing: mpicc --version stdout: x86_64-conda-linux-gnu-cc (conda-forge gcc 13.3.0-2) 13.3.0 Executing: mpicc -o /tmp/petsc-q3as9pdp/config.libraries/conftest -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 /tmp/petsc-q3as9pdp/config.libraries/conftest.o -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -L/usr/local/cuda-11.8/lib64/stubs -lcuda -lquadmath stdout: /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libstdc++.so.6, needed by /usr/local/cuda-11.8/lib64/libnvToolsExt.so, not found (try using -rpath or -rpath-link) /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::condition_variable::notify_all()@GLIBCXX_3.4.11' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `__cxa_guard_acquire at CXXABI_1.3' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `operator delete(void*)@GLIBCXX_3.4' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: 
/home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::__once_call at GLIBCXX_3.4.11' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `vtable for __cxxabiv1::__si_class_type_info at CXXABI_1.3' <<<< Likely you need gcc/g++-11 for this version of cuda. [or install/use a newer version of cuda]. And best if you can use latest petsc release. Satish On Wed, 23 Jul 2025, Matthew Knepley wrote: > On Wed, Jul 23, 2025 at 11:49?AM Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi, > > > > I am trying to install PETSc (3.22.2) with Kokkos and cuda support on an > > Ubuntu laptop with dependencies loaded with Conda. > > > > You are likely to have to turn off Conda before configuring. It messes up > paths for Python and other things. > > Thanks, > > Matt > > > > The configure line is: ./configure > > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > > PETSC_ARCH=rel-cuda > > --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install > > --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 > > --with-shared-libraries --with-64-bit-indices --download-kokkos > > --download-kokkos-kernels --download-hdf5 > > --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 > > --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 > > > > And the configure.log is attached. Let me know if I can provide additional > > information. 
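[Satish's reading of the log — conda's gcc 13 toolchain linking against system CUDA 11.8 libraries whose C++ runtime cannot be resolved — can be triaged by collecting the versioned ABI symbols the linker complains about. A small sketch that extracts them from an ld log; the sample lines are condensed from the excerpt above, and the regex assumes the usual `symbol@GLIBCXX_x.y.z'` form of GNU ld messages:

```python
import re

# Condensed ld output from the configure.log excerpt quoted above.
ld_log = """
libicuuc.so.73: undefined reference to `std::condition_variable::notify_all()@GLIBCXX_3.4.11'
libicui18n.so.73: undefined reference to `__cxa_guard_acquire@CXXABI_1.3'
libicuuc.so.73: undefined reference to `std::__once_call@GLIBCXX_3.4.11'
"""

def required_abi_versions(log):
    """Collect the versioned libstdc++ ABI tags ld could not resolve."""
    return sorted(set(re.findall(r"@((?:GLIBCXX|CXXABI)_[\d.]+)'", log)))

print(required_abi_versions(ld_log))  # -> ['CXXABI_1.3', 'GLIBCXX_3.4.11']
```

Each tag listed must be exported by whichever libstdc++.so.6 the link actually finds (`strings <libstdc++.so.6> | grep GLIBCXX` shows what a given copy provides); when the toolchain's copy is not found at all, as in the `libstdc++.so.6 ... not found` warning above, matching the compiler to the CUDA release as Satish suggests is the simpler fix.]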
> > > > Best, > > > > Sophie > > > > > From junchao.zhang at gmail.com Wed Jul 23 13:55:12 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 23 Jul 2025 13:55:12 -0500 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: I think I have a fix at https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8583__;!!G_uCfscf7eWS!Ygjk0LqLzq4CEWSwWlfjSXbnpYArxmsXNUsIPbxdrCzLChWKg3wAvRTDx2E_WNi8e-uL0lA5NANTbRg7Yx0Cx_HiuHtS$ Chris and Zongze, could you try it? Thanks! --Junchao Zhang On Tue, Jul 22, 2025 at 4:16 PM Barry Smith wrote: > > Yippee! (maybe) > > On Jul 22, 2025, at 4:18 PM, Junchao Zhang > wrote: > > With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer > pointer" on a machine. I am looking into it. > > Thanks. > --Junchao Zhang > > > On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang wrote: > >> Hi, >> I encountered a similar issue with Firedrake when using the -log_view option >> with XML format on macOS. Below is the error message. The Firedrake code >> and the shell script used to run it are attached. >> >> ``` >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: General MPI error >> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >> [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!Ygjk0LqLzq4CEWSwWlfjSXbnpYArxmsXNUsIPbxdrCzLChWKg3wAvRTDx2E_WNi8e-uL0lA5NANTbRg7Yx0Cxw97wgYm$ >> >> for trouble shooting. 
>> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown >> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH >> arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025 >> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default >> --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 >> -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" >> --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 >> --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake >> --download-bison --download-fftw --download-mumps-avoid-mpi-in-place >> --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew >> --download-metis --download-mumps --download-netcdf --download-pnetcdf >> --download-ptscotch --download-scalapack --download-suitesparse >> --download-superlu_dist --download-slepc --with-zlib --download-hpddm >> --download-libpng --download-ctetgen --download-tetgen --download-triangle >> --download-mmg --download-parmmg --download-p4est --download-eigen >> --download-hypre --download-pragmatic >> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383 >> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at >> 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405 >> [0]PETSC ERROR: #16 PetscLogHandlerView() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342 >> [0]PETSC ERROR: #17 PetscLogView() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043 >> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at >> /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084 >> [0]PETSC ERROR: #19 PetscFinalize() at >> 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552 >> PetscFinalize() failed [error code: 98] >> -------------------------------------------------------------------------- >> prterun has exited due to process rank 0 with PID 28986 on node >> 192.168.10.51 exiting >> improperly. There are three reasons this could occur: >> >> 1. this process did not call "init" before exiting, but others in the >> job did. This can cause a job to hang indefinitely while it waits for >> all processes to call "init". By rule, if one process calls "init", >> then ALL processes must call "init" prior to termination. >> >> 2. this process called "init", but exited without calling "finalize". >> By rule, all processes that call "init" MUST call "finalize" prior to >> exiting or it will be considered an "abnormal termination" >> >> 3. this process called "MPI_Abort" or "prte_abort" and the mca >> parameter prte_create_session_dirs is set to false. In this case, the >> run-time cannot detect that the abort call was an abnormal >> termination. Hence, the only error message you will receive is this >> one. >> >> This may have caused other processes in the application to be >> terminated by signals sent by prterun (as reported here). >> >> You can avoid this message by specifying -quiet on the prterun command >> line. >> -------------------------------------------------------------------------- >> ``` >> >> Best wishes, >> Zongze >> >> *From: *petsc-users on behalf of >> Klaij, Christiaan via petsc-users >> *Date: *Monday, July 14, 2025 at 15:58 >> *To: *Barry Smith >> *Cc: *PETSc users list >> *Subject: *Re: [petsc-users] problem with nested logging, standalone >> example >> >> @Junchao: yes, all with my ex2f.F90 variation on two or three cores >> >> @Barry: it's really puzzling that you cannot reproduce. Can you try >> running it a dozen times in a row? And look at the report_performance.xml >> file? 
When it hangs I see some nan's, for instance here in the VecAXPY >> event: >> >> >> >> VecAXPY >> >> >> 0.5 >> 0. >> 1. >> 1 >> 0 >> >> >> >> self >> >> >> This is what I did in my latest attempt on the login node of our Rocky >> Linux 9 cluster: >> 1) download petsc-3.23.4.tar.gz from the petsc website >> 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 >> --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 >> 3) adjust my example to this version of petsc (file is attached) >> 4) make ex2f-cklaij-dbg-v2 >> 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 >> >> So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc >> 11.5.0 >> >> ________________________________________ >> From: Barry Smith >> Sent: Friday, July 11, 2025 11:22 PM >> To: Klaij, Christiaan >> Cc: Junchao Zhang; PETSc users list >> Subject: Re: [petsc-users] problem with nested logging, standalone example >> >> >> And yet we cannot reproduce. >> >> Please tell us the exact PETSc version and MPI implementation versions. >> And reattach your reproducing example. And exactly how you run it. >> >> >> Can you reproduce it on an "ordinary" machine, say a Mac or Linux >> laptop. >> >> Barry >> >> If I could reproduce the problem here is how I would debug. I put use >> -start_in_debugger and then put break points in places which it seem >> problematic. Presumably I would end up with a hang with each MPI process in >> a "different place" and from that I may be able to determine how that >> happened. 
>> >> >> > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan >> wrote: >> > >> > In summary for future reference: >> > - tested 3 different machines, two at Marin, one at the national HPC >> > - tested 3 different mpi implementations (intelmpi, openmpi and mpich) >> > - tested openmpi in both release and debug >> > - tested 2 different compilers (intel and gnu), both older and very >> recent versions >> > - tested with the most basic config (./configure --with-cxx=0 >> --with-debugging=0 --download-mpich) >> > >> > All of these tests either segfault, hang, or error out at the call to >> PetscLogView. >> > >> > Chris >> > >> > ________________________________________ >> > From: Klaij, Christiaan >> > Sent: Friday, July 11, 2025 10:10 AM >> > To: Barry Smith; Junchao Zhang >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone >> example >> > >> > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same >> hanging. >> > @Barry: the two stack traces aren't exactly the same, see a sample with >> MPICH below. >> > >> > If it cannot be reproduced on your side, I'm afraid this is another >> dead end. Thanks anyway, I really appreciate all your help. 
>> > >> > Chris >> > >> > (gdb) bt >> > #0 0x000015555033bc2e in >> MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #2 0x000015555033e70f in MPIR_Allreduce () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #3 0x000015555033f22e in PMPI_Allreduce () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, >> > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, >> > inbuf=0x7fffffffac60) >> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 >> > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, >> > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, >> > dtype=dtype at entry=1275072547, op=op at entry=1476395020, >> comm=-2080374782) >> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 >> > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( >> > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d >> 'mbps\000', >> > value=, minthreshold=minthreshold at entry=0, >> > maxthreshold=maxthreshold at entry=0.01, >> > minmaxtreshold=minmaxtreshold at entry=1.05) >> > at >> /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 >> > >> > >> > (gdb) bt >> > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from >> /lib64/libc.so.6 >> > #1 0x0000155550b0de71 in ofi_gettime_ns () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #2 0x0000155550b0dec9 in ofi_gettime_ms () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () >> > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #5 0x0000155550591fe9 in progress_test () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #6 0x00001555505924a3 in MPID_Progress_wait () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #7 0x000015555043463e in MPIR_Wait_state () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #8 0x000015555052ec49 in MPIC_Wait () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #9 0x000015555053093e in MPIC_Sendrecv () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () >> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >> > >> > ________________________________________ >> > From: Barry Smith >> > Sent: Thursday, July 10, 2025 11:10 PM >> > To: Junchao Zhang >> > Cc: Klaij, Christiaan; PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone >> example >> > >> > >> > I cannot reproduce >> > >> > On Jul 10, 2025, at 3:46 PM, Junchao Zhang >> wrote: >> > >> > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. >> Strange. >> > >> > --Junchao Zhang >> > >> > >> > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan > > wrote: >> > An additional clue perhaps: with the option >> OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error >> below. 
>> > >> > Chris >> > >> > >> > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type >> jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >> > 0 KSP Residual norm 1.11803 >> > 1 KSP Residual norm 0.591608 >> > 2 KSP Residual norm 0.316228 >> > 3 KSP Residual norm < 1.e-11 >> > 0 KSP Residual norm 0.707107 >> > 1 KSP Residual norm 0.408248 >> > 2 KSP Residual norm < 1.e-11 >> > Norm of error < 1.e-12 iterations 3 >> > [1]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [1]PETSC ERROR: General MPI error >> > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >> > [1]PETSC ERROR: See https://petsc.org/release/faq/ >> for trouble shooting. 
>> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 >> > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH >> on login1 by cklaij Thu Jul 10 10:33:33 2025 >> > [1]PETSC ERROR: Configure options: >> --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs >> --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 >> --with-mpe=0 --with-debugging=0 >> --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz >> --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 >> --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz >> --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz >> --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild >> --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall >> -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall >> -funroll-all-loops -O3 
-DNDEBUG " COPTFLAGS="-std=gnu11 -Wall >> -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall >> -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops >> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime >> -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops >> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime >> -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops >> -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime >> -Wno-unused-function -O3 -DNDEBUG" >> > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >> > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 >> > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >> > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >> > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >> > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 >> > [1]PETSC ERROR: #7 PetscLogHandlerView() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 >> > [1]PETSC ERROR: #8 PetscLogView() at >> /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 >> > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 >> > >> 
-------------------------------------------------------------------------- >> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF >> > Proc: [[55228,1],1] >> > Errorcode: 98 >> > >> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> > You may or may not see output from other processes, depending on >> > exactly when Open MPI kills them. >> > >> -------------------------------------------------------------------------- >> > >> -------------------------------------------------------------------------- >> > prterun has exited due to process rank 1 with PID 0 on node login1 >> calling >> > "abort". This may have caused other processes in the application to be >> > terminated by signals sent by prterun (as reported here). >> > >> -------------------------------------------------------------------------- >> > >> > ________________________________________ >> > >> > dr. ir. Christiaan Klaij | senior researcher >> > Research & Development | CFD Development >> > T +31 317 49 33 44 | http://www.marin.nl >> > >> > >> > From: Klaij, Christiaan > >> > Sent: Thursday, July 
10, 2025 10:15 AM >> > To: Junchao Zhang >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone >> example >> > >> > Hi Junchao, >> > >> > Thanks for testing. I've fixed the error but unfortunately that doesn't >> change the behavior, the code still hangs as before, with the same stack >> trace... >> > >> > Chris >> > >> > ________________________________________ >> > From: Junchao Zhang > junchao.zhang at gmail.com>> >> > Sent: Tuesday, July 8, 2025 10:58 PM >> > To: Klaij, Christiaan >> > Cc: PETSc users list >> > Subject: Re: [petsc-users] problem with nested logging, standalone >> example >> > >> > Hi, Chris, >> > First, I had to fix an error in your test by adding " >> PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: Object is in wrong state >> > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 >> > ... 
>> > [0]PETSC ERROR: #1 MatSetValues() at >> /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 >> > [0]PETSC ERROR: #2 ex2f.F90:258 >> > >> > Then I could run the test without problems >> > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short >> -ksp_gmres_cgs_refinement_type refine_always >> > 0 KSP Residual norm 1.11803 >> > 1 KSP Residual norm 0.591608 >> > 2 KSP Residual norm 0.316228 >> > 3 KSP Residual norm < 1.e-11 >> > 0 KSP Residual norm 0.707107 >> > 1 KSP Residual norm 0.408248 >> > 2 KSP Residual norm < 1.e-11 >> > Norm of error < 1.e-12 iterations 3 >> > >> > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with >> > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran >> --download-openmpi --with-ssl=0 --with-shared-libraries=1 >> CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" >> CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " >> COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" >> CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " >> FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 >> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 >> -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 >> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 >> -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 >> -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 >> -DNDEBUG" >> > >> > Could you fix the error and retry? >> > >> > --Junchao Zhang >> > >> > >> > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users < >> petsc-users at mcs.anl.gov> petsc-users at mcs.anl.gov>> wrote: >> > Attached is a standalone example of the issue described in the >> > earlier thread "problem with nested logging". The issue appeared >> > somewhere between petsc 3.19.4 and 3.23.4. 
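[Editor's sketch] The fix Junchao describes slots MatSetFromOptions into the matrix setup before any MatSetValues call. A minimal Fortran sketch of that order follows; AA is the matrix name from the email, while n, ione, ii, jj and vv are assumed names for illustration, and this is not the actual ex2f.F90 source:

```fortran
! Sketch of the Mat setup order implied by the fix above.
! Without MatSetFromOptions (or MatSetType) the Mat has no type,
! which is exactly the "Mat object's type is not set" error shown.
PetscCallA(MatCreate(PETSC_COMM_SELF, AA, ierr))
PetscCallA(MatSetSizes(AA, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr))
PetscCallA(MatSetFromOptions(AA, ierr))  ! the call that was missing
PetscCallA(MatSetUp(AA, ierr))
! ii, jj, vv are assumed index/value arrays for this sketch
PetscCallA(MatSetValues(AA, ione, ii, ione, jj, vv, INSERT_VALUES, ierr))
PetscCallA(MatAssemblyBegin(AA, MAT_FINAL_ASSEMBLY, ierr))
PetscCallA(MatAssemblyEnd(AA, MAT_FINAL_ASSEMBLY, ierr))
```

PETSC_COMM_SELF matches the small rank-zero system the standalone example solves; the same ordering applies on any communicator.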
>> > >> > The example is a variation of ../ksp/tutorials/ex2f.F90, where >> > I've added the nested log viewer with one event as well as the >> > solution of a small system on rank zero. >> > >> > When running on multiple procs the example hangs during >> > PetscLogView with the backtrace below. The configure.log is also >> > attached in the hope that you can replicate the issue. >> > >> > Chris >> > >> > >> > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, >> > datatype=0x15554c9ef900 , src=1, tag=-12, >> > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 >> > #1 0x000015554c65baff in >> ompi_coll_base_allreduce_intra_recursivedoubling ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >> > at base/coll_base_allreduce.c:247 >> > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, >> > algorithm=3, faninout=0, segsize=0) at >> coll_tuned_allreduce_decision.c:142 >> > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( >> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >> > dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >> > at coll_tuned_decision_fixed.c:216 >> > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, >> > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , >> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) >> > at coll_hcoll_ops.c:217 >> > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, >> > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 >> , op=0x15554ca28980 , comm=0x7f1e30) >> at allreduce.c:123 >> > #6 0x0000155553eabede in MPIU_Allreduce_Private () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #7 0x0000155553e50d08 in 
PetscPrintXMLNestedLinePerfResults () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #15 0x0000155553e56232 in PetscLogHandlerView () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #16 0x0000155553e588c3 in PetscLogView () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #17 0x0000155553e40eb5 in petsclogview_ () from >> /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >> > #18 0x0000000000402c8b in MAIN__ () >> > #19 0x00000000004023df in main () >> > >> > dr. ir. 
Christiaan Klaij | senior researcher >> > Research & Development | CFD Development >> > T +31 317 49 33 44 | http://www.marin.nl >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Jul 23 14:02:28 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 23 Jul 2025 15:02:28 -0400 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: Yippee! > On Jul 23, 2025, at 2:55 PM, Junchao Zhang wrote: > > I think I have a fix at https://gitlab.com/petsc/petsc/-/merge_requests/8583 > > Chris and Zongze, could you try it? > > Thanks! > --Junchao Zhang > > > On Tue, Jul 22, 2025 at 4:16 PM Barry Smith > wrote: >> >> Yippee! 
(maybe) >> >>> On Jul 22, 2025, at 4:18 PM, Junchao Zhang > wrote: >>> >>> With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it. >>> >>> Thanks. >>> --Junchao Zhang >>> >>> >>> On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang > wrote: >>>> Hi, >>>> I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached. >>>> >>>> ``` >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: General MPI error >>>> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >>>> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown >>>> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025 >>>> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre 
--download-pragmatic >>>> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >>>> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383 >>>> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >>>> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >>>> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405 >>>> [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342 >>>> [0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043 >>>> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084 >>>> [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552 >>>> PetscFinalize() failed [error code: 98] >>>> -------------------------------------------------------------------------- >>>> prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting >>>> improperly. There are three reasons this could occur: >>>> >>>> 1. this process did not call "init" before exiting, but others in the >>>> job did. This can cause a job to hang indefinitely while it waits for >>>> all processes to call "init". By rule, if one process calls "init", >>>> then ALL processes must call "init" prior to termination. >>>> >>>> 2. this process called "init", but exited without calling "finalize". >>>> By rule, all processes that call "init" MUST call "finalize" prior to >>>> exiting or it will be considered an "abnormal termination" >>>> >>>> 3. this process called "MPI_Abort" or "prte_abort" and the mca >>>> parameter prte_create_session_dirs is set to false. In this case, the >>>> run-time cannot detect that the abort call was an abnormal >>>> termination. 
Hence, the only error message you will receive is this >>>> one. >>>> >>>> This may have caused other processes in the application to be >>>> terminated by signals sent by prterun (as reported here). >>>> >>>> You can avoid this message by specifying -quiet on the prterun command >>>> line. >>>> -------------------------------------------------------------------------- >>>> ``` >>>> >>>> Best wishes, >>>> Zongze >>>> >>>> From: petsc-users > on behalf of Klaij, Christiaan via petsc-users > >>>> Date: Monday, July 14, 2025 at 15:58 >>>> To: Barry Smith > >>>> Cc: PETSc users list > >>>> Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> >>>> @Junchao: yes, all with my ex2f.F90 variation on two or three cores >>>> >>>> @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event: >>>> >>>> [XML fragment from report_performance.xml; the mail archiver stripped the tags, leaving only the VecAXPY event name, the values 0.5, 0., 1., 1, 0 and the label "self"] >>>> >>>> This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster: >>>> 1) download petsc-3.23.4.tar.gz from the petsc website >>>> 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 >>>> 3) adjust my example to this version of petsc (file is attached) >>>> 4) make ex2f-cklaij-dbg-v2 >>>> 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 >>>> >>>> So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0 >>>> >>>> ________________________________________ >>>> From: Barry Smith > >>>> Sent: Friday, July 11, 2025 11:22 PM >>>> To: Klaij, Christiaan >>>> Cc: Junchao Zhang; PETSc users list >>>> Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> >>>> >>>> And yet we cannot reproduce. >>>> >>>> Please tell us the exact PETSc version and MPI implementation versions. 
And reattach your reproducing example. And exactly how you run it. >>>> >>>> >>>> Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop. >>>> >>>> Barry >>>> >>>> If I could reproduce the problem here is how I would debug: I would use -start_in_debugger and then put break points in places which seem problematic. Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened. >>>> >>>> >>>> >>>> > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan > wrote: >>>> > >>>> > In summary for future reference: >>>> > - tested 3 different machines, two at Marin, one at the national HPC >>>> > - tested 3 different mpi implementations (intelmpi, openmpi and mpich) >>>> > - tested openmpi in both release and debug >>>> > - tested 2 different compilers (intel and gnu), both older and very recent versions >>>> > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich) >>>> > >>>> > All of these tests either segfault, or hang or error-out at the call to PetscLogView. >>>> > >>>> > Chris >>>> > >>>> > ________________________________________ >>>> > From: Klaij, Christiaan > >>>> > Sent: Friday, July 11, 2025 10:10 AM >>>> > To: Barry Smith; Junchao Zhang >>>> > Cc: PETSc users list >>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> > >>>> > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging. >>>> > @Barry: both stack traces aren't exactly the same, see a sample with MPICH below. >>>> > >>>> > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help. 
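[Editor's sketch] The reproduction recipe Chris gives in the quoted message (Rocky Linux 9 login node, system mpich 3.4.2, gcc 11.5.0) condenses to the shell steps below. It assumes petsc-3.23.4.tar.gz has already been downloaded from the PETSc website; the `make all` step is implied between configuring and building the example, and this is a sketch, not a verified build script:

```shell
# condensed from steps 1-5 in the message above
tar xzf petsc-3.23.4.tar.gz && cd petsc-3.23.4
./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 \
  --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2
make all                          # build PETSc (implied between steps 2 and 3)
make ex2f-cklaij-dbg-v2           # the adjusted example attached to the thread
mpirun -n 2 ./ex2f-cklaij-dbg-v2  # hangs intermittently, so repeat several runs
```

Since the hang shows up at random, running the last command a dozen times in a row, as Chris suggests, is part of the reproduction.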
>>>> > >>>> > Chris >>>> > >>>> > (gdb) bt >>>> > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #2 0x000015555033e70f in MPIR_Allreduce () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #3 0x000015555033f22e in PMPI_Allreduce () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, >>>> > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, >>>> > inbuf=0x7fffffffac60) >>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 >>>> > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, >>>> > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, >>>> > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) >>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 >>>> > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( >>>> > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', >>>> > value=, minthreshold=minthreshold at entry=0, >>>> > maxthreshold=maxthreshold at entry=0.01, >>>> > minmaxtreshold=minmaxtreshold at entry=1.05) >>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 >>>> > >>>> > >>>> > (gdb) bt >>>> > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 >>>> > #1 0x0000155550b0de71 in ofi_gettime_ns () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #2 0x0000155550b0dec9 in ofi_gettime_ms () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #4 
0x00001555505ca6f7 in MPIDI_OFI_progress () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #5 0x0000155550591fe9 in progress_test () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #6 0x00001555505924a3 in MPID_Progress_wait () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #7 0x000015555043463e in MPIR_Wait_state () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #8 0x000015555052ec49 in MPIC_Wait () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #9 0x000015555053093e in MPIC_Sendrecv () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () >>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 >>>> > >>>> > ________________________________________ >>>> > From: Barry Smith > >>>> > Sent: Thursday, July 10, 2025 11:10 PM >>>> > To: Junchao Zhang >>>> > Cc: Klaij, Christiaan; PETSc users list >>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> > >>>> > >>>> > I cannot reproduce >>>> > >>>> > On Jul 10, 2025, at 3:46 PM, Junchao Zhang > wrote: >>>> > >>>> > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. >>>> > >>>> > --Junchao Zhang >>>> > >>>> > >>>> > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan >> wrote: >>>> > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
>>>> > >>>> > Chris >>>> > >>>> > >>>> > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >>>> > 0 KSP Residual norm 1.11803 >>>> > 1 KSP Residual norm 0.591608 >>>> > 2 KSP Residual norm 0.316228 >>>> > 3 KSP Residual norm < 1.e-11 >>>> > 0 KSP Residual norm 0.707107 >>>> > 1 KSP Residual norm 0.408248 >>>> > 2 KSP Residual norm < 1.e-11 >>>> > Norm of error < 1.e-12 iterations 3 >>>> > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> > [1]PETSC ERROR: General MPI error >>>> > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer >>>> > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >>>> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 >>>> > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 >>>> > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz 
--download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >>>> > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 >>>> > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 >>>> > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 >>>> > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 >>>> > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 >>>> > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 >>>> > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 >>>> > [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 >>>> > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 >>>> > -------------------------------------------------------------------------- >>>> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF >>>> > Proc: [[55228,1],1] >>>> > Errorcode: 98 >>>> > >>>> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >>>> > You may or may not see output from other processes, depending on >>>> > exactly when Open MPI kills them. >>>> > -------------------------------------------------------------------------- >>>> > -------------------------------------------------------------------------- >>>> > prterun has exited due to process rank 1 with PID 0 on node login1 calling >>>> > "abort". This may have caused other processes in the application to be >>>> > terminated by signals sent by prterun (as reported here). >>>> > -------------------------------------------------------------------------- >>>> > >>>> > ________________________________________ >>>> > >>>> > dr. ir. Christiaan Klaij | senior researcher >>>> > Research & Development | CFD Development >>>> > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > From: Klaij, Christiaan >> >>>> > Sent: Thursday, July 10, 2025 10:15 AM >>>> > To: Junchao Zhang >>>> > Cc: PETSc users list >>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> > >>>> > Hi Junchao, >>>> > >>>> > Thanks for testing. 
I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... >>>> > >>>> > Chris >>>> > >>>> > ________________________________________ >>>> > From: Junchao Zhang >> >>>> > Sent: Tuesday, July 8, 2025 10:58 PM >>>> > To: Klaij, Christiaan >>>> > Cc: PETSc users list >>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example >>>> > >>>> > Hi, Chris, >>>> > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. >>>> > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> > [0]PETSC ERROR: Object is in wrong state >>>> > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 >>>> > ... >>>> > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 >>>> > [0]PETSC ERROR: #2 ex2f.F90:258 >>>> > >>>> > Then I could ran the test without problems >>>> > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always >>>> > 0 KSP Residual norm 1.11803 >>>> > 1 KSP Residual norm 0.591608 >>>> > 2 KSP Residual norm 0.316228 >>>> > 3 KSP Residual norm < 1.e-11 >>>> > 0 KSP Residual norm 0.707107 >>>> > 1 KSP Residual norm 0.408248 >>>> > 2 KSP Residual norm < 1.e-11 >>>> > Norm of error < 1.e-12 iterations 3 >>>> > >>>> > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with >>>> > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" 
F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" >>>> >
>>>> > Could you fix the error and retry?
>>>> >
>>>> > --Junchao Zhang
>>>> >
>>>> >
>>>> > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >>>> wrote:
>>>> > Attached is a standalone example of the issue described in the
>>>> > earlier thread "problem with nested logging". The issue appeared
>>>> > somewhere between petsc 3.19.4 and 3.23.4.
>>>> >
>>>> > The example is a variation of ../ksp/tutorials/ex2f.F90, where
>>>> > I've added the nested log viewer with one event as well as the
>>>> > solution of a small system on rank zero.
>>>> >
>>>> > When running on multiple procs the example hangs during
>>>> > PetscLogView with the backtrace below. The configure.log is also
>>>> > attached in the hope that you can replicate the issue.
>>>> > >>>> > Chris >>>> > >>>> > >>>> > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, >>>> > datatype=0x15554c9ef900 , src=1, tag=-12, >>>> > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 >>>> > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( >>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >>>> > dtype=0x15554c9ef900 , >>>> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >>>> > at base/coll_base_allreduce.c:247 >>>> > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( >>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >>>> > dtype=0x15554c9ef900 , >>>> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, >>>> > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 >>>> > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( >>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, >>>> > dtype=0x15554c9ef900 , >>>> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) >>>> > at coll_tuned_decision_fixed.c:216 >>>> > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, >>>> > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , >>>> > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) >>>> > at coll_hcoll_ops.c:217 >>>> > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, >>>> > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 >>>> > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from 
/home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 >>>> > #18 0x0000000000402c8b in MAIN__ () >>>> > #19 0x00000000004023df in main () >>>> > [cid:ii_197ebccaa1d27ee6ef21] >>>> > dr. ir. Christiaan Klaij | senior researcher >>>> > Research & Development | CFD Development >>>> > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > >>>> > [Facebook] >>>> > [LinkedIn] >>>> > [YouTube] >>>> > >>>> > >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Wed Jul 23 14:36:26 2025 From: sblondel at utk.edu (Blondel, Sophie) Date: Wed, 23 Jul 2025 19:36:26 +0000 Subject: [petsc-users] PETSc unable to find cuda In-Reply-To: References: Message-ID: Thank you both for your reply, Changing the gcc version fixed this issue. I will try with PETSc latest release now. 
Best, Sophie ________________________________ From: Satish Balay Sent: Wednesday, July 23, 2025 12:09 To: Matthew Knepley Cc: Blondel, Sophie ; PETSc users list Subject: Re: [petsc-users] PETSc unable to find cuda [You don't often get email from balay.anl at fastmail.org. Learn why this is important at https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!d017XOnXyQPKwAOyowyrrMSKBNTMI5wjlOdK84apvit3ZqHGOFCysvlQkl5EWnaiBbOwHM90BQfZDHh7GhvZg-yk$ ] >>> Executing: mpicc --version stdout: x86_64-conda-linux-gnu-cc (conda-forge gcc 13.3.0-2) 13.3.0 Executing: mpicc -o /tmp/petsc-q3as9pdp/config.libraries/conftest -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 /tmp/petsc-q3as9pdp/config.libraries/conftest.o -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -L/usr/local/cuda-11.8/lib64/stubs -lcuda -lquadmath stdout: /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libstdc++.so.6, needed by /usr/local/cuda-11.8/lib64/libnvToolsExt.so, not found (try using -rpath or -rpath-link) /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::condition_variable::notify_all()@GLIBCXX_3.4.11' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `__cxa_guard_acquire at CXXABI_1.3' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: 
/home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `operator delete(void*)@GLIBCXX_3.4' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::__once_call at GLIBCXX_3.4.11' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `vtable for __cxxabiv1::__si_class_type_info at CXXABI_1.3' <<<< Likely you need gcc/g++-11 for this version of cuda. [or install/use a newer version of cuda]. And best if you can use latest petsc release. Satish On Wed, 23 Jul 2025, Matthew Knepley wrote: > On Wed, Jul 23, 2025 at 11:49?AM Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi, > > > > I am trying to install PETSc (3.22.2) with Kokkos and cuda support on an > > Ubuntu laptop with dependencies loaded with Conda. > > > > You are likely to have to turn off Conda before configuring. It messes up > paths for Python and other things. > > Thanks, > > Matt > > > > The configure line is: ./configure > > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > > PETSC_ARCH=rel-cuda > > --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install > > --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 > > --with-shared-libraries --with-64-bit-indices --download-kokkos > > --download-kokkos-kernels --download-hdf5 > > --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 > > --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 > > > > And the configure.log is attached. Let me know if I can provide additional > > information. > > > > Best, > > > > Sophie > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From andrea.irwin at vuw.ac.nz Thu Jul 24 00:46:03 2025
From: andrea.irwin at vuw.ac.nz (Andrea Irwin)
Date: Thu, 24 Jul 2025 05:46:03 +0000
Subject: [petsc-users] Getting vectors of field variables of a DMPlex
Message-ID: <3136a960-18a7-4b3b-909d-fd750be026da@vuw.ac.nz>

Hi,

I'm new to PETSc, and I'm trying to learn how to use the DMPlex object. I've learned that you can assign labeled "fields" to a PetscSection/DMPlex, but I'm a bit lost as to how to access the data corresponding to each field. The documentation suggests that data from different fields is stored in the same big vector, and you can access it with PetscSectionGetFieldOffset(). I've tried this in the snippet below (using petsc4py):

    # get points from the layout
    pstart, pend = section.getChart()

    # set vector values according to field
    arr = vec.getArray()
    for point in range(pstart, pend):
        print(point)
        offset_u = section.getFieldOffset(point, 0)
        offset_v = section.getFieldOffset(point, 1)
        offset_w = section.getFieldOffset(point, 2)
        print(offset_u, offset_v, offset_w)
        arr[offset_u] = 1
        for d in range(num_comp[1]):
            arr[offset_v + d] = 2
        for d in range(num_comp[2]):
            arr[offset_w + d] = 3

Unfortunately, either this doesn't work or I'm not using it right. The offsets are often equal when they should be different for different variables, and they get larger than the number of points in the vector. Despite being recommended directly in the documentation, there aren't any examples of it being used in context either.

A previous entry in the mailing list suggests using DMCreateSubDM() to instead split the DM (and the vectors, in the process) up by field. There's also DMCreateFieldDecomposition(), which appears to do something similar.

Is there a "canonical" way to access different fields? Does anyone have some simple examples of different field variables being declared, allocated, set, and accessed?
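[For intuition, the storage layout the snippet above is probing can be mimicked in plain Python. This is a toy model, not the petsc4py API; num_comp and the field sizes are made up for illustration. Fields are interleaved point by point: all dofs of a point sit contiguously, ordered by field, so a field's offset is the point's base offset plus the component counts of the earlier fields at that point.]

```python
# Toy model of an interleaved, point-major field layout (illustrative only,
# not the petsc4py API). Fields u, v, w have 1, 2, 3 components per point.
num_comp = [1, 2, 3]
npoints = 4

def field_offset(point, field):
    """Offset of (point, field): the point's base offset plus the
    component counts of the fields stored before it at that point."""
    return point * sum(num_comp) + sum(num_comp[:field])

arr = [0.0] * (npoints * sum(num_comp))
for p in range(npoints):
    arr[field_offset(p, 0)] = 1              # field u (1 component)
    for d in range(num_comp[1]):
        arr[field_offset(p, 1) + d] = 2      # field v (2 components)
    for d in range(num_comp[2]):
        arr[field_offset(p, 2) + d] = 3      # field w (3 components)

print(arr[:6])  # one point's block: [1, 2, 2, 3, 3, 3]
```

[In PETSc terms, which vector the offsets index depends on which section they come from (a local section indexes the local vector, a global section the global one), so a section/vector mismatch is one plausible source of out-of-range offsets like those described above.]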
I'm still new enough that I'm not even sure what questions to ask at this point, but any help would be appreciated.

Thank you,
Andrea Irwin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yangzongze at gmail.com Thu Jul 24 01:21:02 2025
From: yangzongze at gmail.com (Zongze Yang)
Date: Thu, 24 Jul 2025 06:21:02 +0000
Subject: [petsc-users] problem with nested logging, standalone example
In-Reply-To: 
References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev>
Message-ID: 

Thank you for the quick fix, it works well on my end.

Best wishes,
Zongze

From: Junchao Zhang
Date: Thursday, July 24, 2025 at 02:55
To: Barry Smith
Cc: Zongze Yang , Klaij, Christiaan , PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

I think I have a fix at https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8583__;!!G_uCfscf7eWS!fxzaDHQxd3uHn2ASrZmv-IW42m1OeVvMXd0xo20hK2CZsZ_Mp8c7krPPe-rwleQvMo-ZGwDbRXPknH8Iv3wiy85a$

Chris and Zongze, could you try it? Thanks!

--Junchao Zhang

On Tue, Jul 22, 2025 at 4:16 PM Barry Smith > wrote:
Yippee! (maybe)

On Jul 22, 2025, at 4:18 PM, Junchao Zhang > wrote:
With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it. Thanks.

--Junchao Zhang

On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang > wrote:
Hi,

I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.
``` [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: General MPI error [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!fxzaDHQxd3uHn2ASrZmv-IW42m1OeVvMXd0xo20hK2CZsZ_Mp8c7krPPe-rwleQvMo-ZGwDbRXPknH8IvwAziGaX$ for trouble shooting. [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025 [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289 [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383 [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420 [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443 [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405 [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342 [0]PETSC ERROR: #17 PetscLogView() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043 [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084 [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552 PetscFinalize() failed [error code: 98] -------------------------------------------------------------------------- prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting improperly. There are three reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" 3. this process called "MPI_Abort" or "prte_abort" and the mca parameter prte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one. This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here). You can avoid this message by specifying -quiet on the prterun command line. -------------------------------------------------------------------------- ``` Best wishes, Zongze From: petsc-users > on behalf of Klaij, Christiaan via petsc-users > Date: Monday, July 14, 2025 at 15:58 To: Barry Smith > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example @Junchao: yes, all with my ex2f.F90 variation on two or three cores @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? 
And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event:

VecAXPY 0.5 0. 1. 1 0 self

This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster:
1) download petsc-3.23.4.tar.gz from the petsc website
2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2
3) adjust my example to this version of petsc (file is attached)
4) make ex2f-cklaij-dbg-v2
5) mpirun -n 2 ./ex2f-cklaij-dbg-v2

So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0

________________________________________
From: Barry Smith >
Sent: Friday, July 11, 2025 11:22 PM
To: Klaij, Christiaan
Cc: Junchao Zhang; PETSc users list
Subject: Re: [petsc-users] problem with nested logging, standalone example

And yet we cannot reproduce. Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it. Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop.

Barry

If I could reproduce the problem, here is how I would debug: I would use -start_in_debugger and then put break points in places which seem problematic. Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened.

> On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan > wrote:
>
> In summary for future reference:
> - tested 3 different machines, two at Marin, one at the national HPC
> - tested 3 different mpi implementations (intelmpi, openmpi and mpich)
> - tested openmpi in both release and debug
> - tested 2 different compilers (intel and gnu), both older and very recent versions
> - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>
> All of these tests either segfault, hang, or error out at the call to PetscLogView.
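[The suspected failure mode in this thread, different ranks holding different nested event trees while the XML viewer issues one collective reduction per tree node, can be caricatured without MPI. This is a plain-Python illustration, not PETSc's actual implementation, and the event names are made up.]

```python
def reduce_calls(tree):
    """Count the collective reductions a per-node, depth-first
    traversal would issue (a caricature of nested-log printing)."""
    return 1 + sum(reduce_calls(c) for c in tree.get("children", []))

# Rank 0 logged an event (the small rank-0-only solve) that rank 1 never saw:
rank0 = {"name": "Main", "children": [{"name": "KSPSolve"},
                                      {"name": "RankZeroSolve"}]}
rank1 = {"name": "Main", "children": [{"name": "KSPSolve"}]}

# Mismatched counts: in real MPI every Allreduce must be matched on all
# ranks of the communicator, so a skew like this hangs or corrupts buffers.
print(reduce_calls(rank0), reduce_calls(rank1))  # 3 2
```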
> > Chris > > ________________________________________ > From: Klaij, Christiaan > > Sent: Friday, July 11, 2025 10:10 AM > To: Barry Smith; Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging. > @Barry: both stack traces aren't exactly the same, see a sample with MPICH below. > > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help. > > Chris > > (gdb) bt > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x000015555033e70f in MPIR_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x000015555033f22e in PMPI_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, > inbuf=0x7fffffffac60) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', > value=, minthreshold=minthreshold at entry=0, > maxthreshold=maxthreshold at entry=0.01, > minmaxtreshold=minmaxtreshold at entry=1.05) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 > > > (gdb) bt > #0 0x000015554fed3b17 in clock_gettime at 
GLIBC_2.2.5 () from /lib64/libc.so.6 > #1 0x0000155550b0de71 in ofi_gettime_ns () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x0000155550b0dec9 in ofi_gettime_ms () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #5 0x0000155550591fe9 in progress_test () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #6 0x00001555505924a3 in MPID_Progress_wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #7 0x000015555043463e in MPIR_Wait_state () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #8 0x000015555052ec49 in MPIC_Wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #9 0x000015555053093e in MPIC_Sendrecv () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > ________________________________________ > From: Barry Smith > > Sent: Thursday, July 10, 2025 11:10 PM > To: Junchao Zhang > Cc: Klaij, Christiaan; PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > > I cannot reproduce > > On Jul 10, 2025, at 3:46?PM, Junchao Zhang > wrote: > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. > > --Junchao Zhang > > > On Thu, Jul 10, 2025 at 3:39?AM Klaij, Christiaan >> wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
> > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > > > > > > From: Klaij, Christiaan >> > Sent: Thursday, July 10, 2025 10:15 AM > To: Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi Junchao, > > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... > > Chris > > ________________________________________ > From: Junchao Zhang >> > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. 
> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could run the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > > Could you fix the error and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >>>> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. 
> > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in 
PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Jul 24 05:59:51 2025 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Jul 2025 06:59:51 -0400 Subject: [petsc-users] Getting vectors of field variables of a DMPlex In-Reply-To: <3136a960-18a7-4b3b-909d-fd750be026da@vuw.ac.nz> References: <3136a960-18a7-4b3b-909d-fd750be026da@vuw.ac.nz> Message-ID: On Thu, Jul 24, 2025 at 1:46 AM Andrea Irwin via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I'm new to PETSc, and I'm trying to learn how to use the DMPlex object. > I've learned that you can assign labeled "fields" to a PetscSection/DMPlex, > but I'm a bit lost as to how to access the data corresponding to each field. > > The documentation suggests that data from different fields is stored in > the same big vector, and you can access it with > PetscSectionGetFieldOffset(). I've tried this in the snippet below (using > petsc4py) > > # get points from the layout > pstart, pend = section.getChart() > > # set vector values according to field > arr = vec.getArray() > > for point in range(pstart, pend): > print(point) > offset_u = section.getFieldOffset(point, 0) > offset_v = section.getFieldOffset(point, 1) > offset_w = section.getFieldOffset(point, 2) > > print(offset_u, offset_v, offset_w) > > arr[offset_u] = 1 > > for d in range(num_comp[1]): > arr[offset_v + d] = 2 > > for d in range(num_comp[2]): > arr[offset_w + d] = 3 > > Unfortunately, either this doesn't work or I'm not using it right. > The offsets are often equal when they should be different for different > variables, and they get larger than the number of points in the vector. > Despite being recommended directly in the documentation, there aren't any > examples of it being used in context either. > 1. The above looks like it should work. Feel free to send a small example we can run. 2. It would be a good check to also call section.getFieldDof() to see if that field has any unknowns on the point 3. 
You would expect dof offsets to be larger than the number of points. Even if each field only has 1 dof, the offsets will be 3 * Npoints. 4. We do not normally use this method in examples because for the situations we show, it is easier to use DMCreateSubDM() to pull out the entire subvector, or to use DMPlexPoint*() to get a pointer directly to the data. > A previous entry in the mailing list suggests using DMCreateSubDM() to > instead split the DM (and the vectors, in the process) up by field. There's > also DMCreateFieldDecomposition(), which appears to do something similar. > > Is there a "canonical" way to access different fields? Does anyone have > some simple examples of different field variables being declared, > allocated, set, and accessed? I'm still new enough that I'm not even sure > what questions to ask at this point, but any help would be appreciated. > There is no canonical way because there are many different reasons for dof access. Most of the examples, say in KSP, SNES, TS, and TAO are using the FE assembly interface and not setting values by hand. Most of the test/tutorials under Plex are directly setting coordinates, but not fields. However, Plex tutorial ex6.c: Uses DMPlexPoint to look at spectral element access patterns Plex test ex26: Uses CreateSubDM() extensively Thanks, Matt > Thank you, > > Andrea Irwin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cn7IrTrgZMR2vAc2HvctUIDWZOfBdYI7H7njJ98SzhIKVtz8LCZqfDmhvC4PycTNu4HoKzTXpRyf04eSnfYU$ -------------- next part -------------- An HTML attachment was scrubbed... 
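[Editor's note: Matt's point 3 above, that field offsets legitimately exceed the number of points, can be illustrated without PETSc. The following is a hypothetical pure-Python sketch of a point-major layout with three single-dof fields; the names and sizes are made up for illustration and this is not the actual PetscSection implementation.]

```python
# Illustrative sketch of the layout described in point 3: with 3 fields of
# 1 dof each on every point, offsets run up to 3 * npoints - 1, so they
# are expected to be larger than the number of points. PetscSection
# computes the real offsets internally; this only mimics the arithmetic.
npoints = 4   # hypothetical chart size (pend - pstart)
nfields = 3   # fields u, v, w, each with 1 dof per point

def field_offset(point, field):
    # point-major ordering: all dofs of a point are contiguous,
    # ordered by field number within the point
    return point * nfields + field

offsets = [field_offset(p, f) for p in range(npoints) for f in range(nfields)]
print(max(offsets))  # 11, i.e. 3 * npoints - 1, larger than npoints
```

So offsets beyond the number of points are a consequence of the layout, not a bug in the snippet above.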
URL: From C.Klaij at marin.nl Thu Jul 24 07:02:03 2025 From: C.Klaij at marin.nl (Klaij, Christiaan) Date: Thu, 24 Jul 2025 12:02:03 +0000 Subject: [petsc-users] problem with nested logging, standalone example In-Reply-To: References: <7EBA5795-308C-423C-A6B5-919F0DB8E76A@petsc.dev> Message-ID: Hi Junchao, Your fix works here too, not only for ex2f but more importantly also for our simulation code. What a relief, thanks a lot! Out of curiosity, do you understand why it failed to reproduce initially? Chris _____ dr. ir. Christiaan Klaij | senior researcher Research & Development | CFD Development T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!chUVMpNpdFW_9GdVtoUeUbibFyK85qjxNrQXMnMmnMd8EM89uGGMImRDqorLBEfZlQvUW9oFNeAY2-BEkw4v_Lc$ ___________________________________ From: Zongze Yang Sent: Thursday, July 24, 2025 8:21 AM To: Junchao Zhang; Barry Smith Cc: Klaij, Christiaan; PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example Thank you for the quick fix, it works well on my end. Best wishes, Zongze From: Junchao Zhang Date: Thursday, July 24, 2025 at 02:55 To: Barry Smith Cc: Zongze Yang , Klaij, Christiaan , PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example I think I have a fix at https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8583__;!!G_uCfscf7eWS!chUVMpNpdFW_9GdVtoUeUbibFyK85qjxNrQXMnMmnMd8EM89uGGMImRDqorLBEfZlQvUW9oFNeAY2-BEG6w6w7Y$ Chris and Zongze, could you try it? Thanks! --Junchao Zhang On Tue, Jul 22, 2025 at 4:16 PM Barry Smith > wrote: Yippee! (maybe) On Jul 22, 2025, at 4:18 PM, Junchao Zhang > wrote: With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it. Thanks. 
--Junchao Zhang On Tue, Jul 22, 2025 at 9:51?AM Zongze Yang > wrote: Hi, I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached. ``` [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: General MPI error [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!chUVMpNpdFW_9GdVtoUeUbibFyK85qjxNrQXMnMmnMd8EM89uGGMImRDqorLBEfZlQvUW9oFNeAY2-BEtbvLIZk$ for trouble shooting. [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025 [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289 [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383 [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384 [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420 [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443 [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at 
/Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405 [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342 [0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043 [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084 [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552 PetscFinalize() failed [error code: 98] -------------------------------------------------------------------------- prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting improperly. There are three reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" 3. this process called "MPI_Abort" or "prte_abort" and the mca parameter prte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one. This may have caused other processes in the application to be terminated by signals sent by prterun (as reported here). You can avoid this message by specifying -quiet on the prterun command line. 
-------------------------------------------------------------------------- ``` Best wishes, Zongze From: petsc-users > on behalf of Klaij, Christiaan via petsc-users > Date: Monday, July 14, 2025 at 15:58 To: Barry Smith > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example @Junchao: yes, all with my ex2f.F90 variation on two or three cores @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some nan's, for instance here in the VecAXPY event: VecAXPY 0.5 0. 1. 1 0 self This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster: 1) download petsc-3.23.4.tar.gz from the petsc website 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2 3) adjust my example to this version of petsc (file is attached) 4) make ex2f-cklaij-dbg-v2 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2 So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0 ________________________________________ From: Barry Smith > Sent: Friday, July 11, 2025 11:22 PM To: Klaij, Christiaan Cc: Junchao Zhang; PETSc users list Subject: Re: [petsc-users] problem with nested logging, standalone example And yet we cannot reproduce. Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it. Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop. Barry If I could reproduce the problem here is how I would debug. I put use -start_in_debugger and then put break points in places which it seem problematic. Presumably I would end up with a hang with each MPI process in a "different place" and from that I may be able to determine how that happened. 
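[Editor's note: the failure mode discussed in this thread, each process stuck in a "different place" around an MPIU_Allreduce, is the classic symptom of a collective reached by only a subset of ranks. A hypothetical stand-alone sketch, using a Python thread barrier as a stand-in for the MPI collective (this is not PETSc or MPI code), shows why the mismatch manifests as a hang:]

```python
# Illustrative sketch only: a Barrier plays the role of a collective call
# (e.g. MPI_Allreduce) on a 2-"rank" communicator. "Rank 1" never enters
# the collective, so "rank 0" would wait forever; the timeout here only
# makes the hang observable instead of blocking the script.
import threading

nranks = 2
collective = threading.Barrier(nranks)  # stand-in for a 2-rank collective
result = {}

def rank(r):
    if r == 0:
        try:
            # only rank 0 enters the collective, mimicking
            # "if (rank == 0) PetscLogEventBegin(...)"
            collective.wait(timeout=0.2)
            result[r] = "done"
        except threading.BrokenBarrierError:
            result[r] = "hang"     # without the timeout, a real deadlock
    else:
        result[r] = "skipped"      # rank 1 never calls the collective

threads = [threading.Thread(target=rank, args=(r,)) for r in range(nranks)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)  # rank 0 times out because rank 1 skipped the call
```

The same mismatch inside the nested-log tree traversal would leave each MPI process blocked at a different Allreduce, which is consistent with the differing backtraces reported below.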
> On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan > wrote: > > In summary for future reference: > - tested 3 different machines, two at Marin, one at the national HPC > - tested 3 different MPI implementations (intelmpi, openmpi and mpich) > - tested openmpi in both release and debug > - tested 2 different compilers (intel and gnu), both older and very recent versions > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich) > > All of these tests either segfault, hang, or error out at the call to PetscLogView. > > Chris > > ________________________________________ > From: Klaij, Christiaan > > Sent: Friday, July 11, 2025 10:10 AM > To: Barry Smith; Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging. > @Barry: both stack traces aren't exactly the same, see a sample with MPICH below. > > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help. 
> > Chris > > (gdb) bt > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x000015555033e70f in MPIR_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x000015555033f22e in PMPI_Allreduce () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782, > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70, > inbuf=0x7fffffffac60) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839 > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60, > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1, > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869 > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults ( > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000', > value=, minthreshold=minthreshold at entry=0, > maxthreshold=maxthreshold at entry=0.01, > minmaxtreshold=minmaxtreshold at entry=1.05) > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255 > > > (gdb) bt > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6 > #1 0x0000155550b0de71 in ofi_gettime_ns () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #2 0x0000155550b0dec9 in ofi_gettime_ms () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #3 0x0000155550b2fab5 in sock_cq_sreadfrom () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #4 0x00001555505ca6f7 in MPIDI_OFI_progress () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #5 0x0000155550591fe9 in progress_test () > from 
/cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #6 0x00001555505924a3 in MPID_Progress_wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #7 0x000015555043463e in MPIR_Wait_state () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #8 0x000015555052ec49 in MPIC_Wait () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #9 0x000015555053093e in MPIC_Sendrecv () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook () > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12 > > ________________________________________ > From: Barry Smith > > Sent: Thursday, July 10, 2025 11:10 PM > To: Junchao Zhang > Cc: Klaij, Christiaan; PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > > I cannot reproduce > > On Jul 10, 2025, at 3:46?PM, Junchao Zhang > wrote: > > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange. > > --Junchao Zhang > > > On Thu, Jul 10, 2025 at 3:39?AM Klaij, Christiaan >> wrote: > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below. 
> > Chris > > > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: General MPI error > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer > [1]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK43J9p4SM$ for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025 > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025 > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4VVy6P4U$ --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4-9b1K84$ --download-metis=https://urldefense.us/v3/__https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4Y9uaqiQ$ --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild 
--with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289 > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377 > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384 > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420 > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443 > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405 > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342 > [1]PETSC ERROR: #8 PetscLogView() at 
/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040 > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > Proc: [[55228,1],1] > Errorcode: 98 > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > prterun has exited due to process rank 1 with PID 0 on node login1 calling > "abort". This may have caused other processes in the application to be > terminated by signals sent by prterun (as reported here). > -------------------------------------------------------------------------- > > ________________________________________ > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ > > > > > > From: Klaij, Christiaan >> > Sent: Thursday, July 10, 2025 10:15 AM > To: Junchao Zhang > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi Junchao, > > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace... > > Chris > > ________________________________________ > From: Junchao Zhang >> > Sent: Tuesday, July 8, 2025 10:58 PM > To: Klaij, Christiaan > Cc: PETSc users list > Subject: Re: [petsc-users] problem with nested logging, standalone example > > Hi, Chris, > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254. 
> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Mat object's type is not set: Argument # 1 > ... > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503 > [0]PETSC ERROR: #2 ex2f.F90:258 > > Then I could run the test without problems > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always > 0 KSP Residual norm 1.11803 > 1 KSP Residual norm 0.591608 > 2 KSP Residual norm 0.316228 > 3 KSP Residual norm < 1.e-11 > 0 KSP Residual norm 0.707107 > 1 KSP Residual norm 0.408248 > 2 KSP Residual norm < 1.e-11 > Norm of error < 1.e-12 iterations 3 > > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" > > Could you fix the error and retry? > > --Junchao Zhang > > > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users >>>> wrote: > Attached is a standalone example of the issue described in the > earlier thread "problem with nested logging". The issue appeared > somewhere between petsc 3.19.4 and 3.23.4. 
> > The example is a variation of ../ksp/tutorials/ex2f.F90, where > I've added the nested log viewer with one event as well as the > solution of a small system on rank zero. > > When running on multiple procs the example hangs during > PetscLogView with the backtrace below. The configure.log is also > attached in the hope that you can replicate the issue. > > Chris > > > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1, > datatype=0x15554c9ef900 , src=1, tag=-12, > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700 > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at base/coll_base_allreduce.c:247 > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630, > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142 > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed ( > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1, > dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaec630) > at coll_tuned_decision_fixed.c:216 > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20, > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 , > op=0x15554ca28980 , comm=0x7f1e30, module=0xaecb80) > at coll_hcoll_ops.c:217 > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20, > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 , op=0x15554ca28980 , comm=0x7f1e30) at allreduce.c:123 > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #8 0x0000155553e5123e in 
PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22 > #18 0x0000000000402c8b in MAIN__ () > #19 0x00000000004023df in main () > > dr. ir. Christiaan Klaij | senior researcher > Research & Development | CFD Development > T +31 317 49 33 44 | https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!dcT9AzbxDJMLIie0NhYIw4YU2TObPM3WHhzR-HlzrpfbjPd6sgsPX009yFy1lw_eLLu2WprNwYRABMK4BUEn1h8$ 
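The failure mode hypothesized in this thread (an event begun/ended on only a subset of ranks, after which the nested viewer's tree traversal issues collective reductions in a different order on different processes) can be pictured with a small model. The sketch below is plain Python with invented event names, not PETSc code; it only illustrates why per-rank event trees that differ cause the paired reductions to mismatch, which in a real run surfaces as the hang inside MPIU_Allreduce seen in the backtrace above.

```python
# Hypothetical model (not PETSc code): each rank records the nested events it
# actually entered. A tree-printing pass then performs one collective
# reduction per tree node. If an event was begun/ended only on a subset of
# ranks, the per-rank walks disagree and the reductions pair up wrongly --
# which in real MPI shows up as a deadlock in Allreduce.

def reduction_schedule(event_tree):
    """Flatten a nested event tree into the ordered list of collective
    reductions a rank would issue while printing it (pre-order walk)."""
    schedule = []
    def walk(node):
        name, children = node
        schedule.append(name)  # one reduction per printed line
        for child in children:
            walk(child)
    walk(event_tree)
    return schedule

# Rank 0 logged a "SmallSolve" event that rank 1 never entered.
rank0_tree = ("Main", [("Assembly", []), ("SmallSolve", []), ("KSPSolve", [])])
rank1_tree = ("Main", [("Assembly", []), ("KSPSolve", [])])

s0 = reduction_schedule(rank0_tree)
s1 = reduction_schedule(rank1_tree)
print(s0)  # ['Main', 'Assembly', 'SmallSolve', 'KSPSolve']
print(s1)  # ['Main', 'Assembly', 'KSPSolve']

# The first mismatched pair is where a real run would hang:
first_mismatch = next(i for i, (a, b) in enumerate(zip(s0, s1)) if a != b)
print("schedules diverge at step", first_mismatch)  # step 2
```

This also suggests the minimal reproducer discussed above: register the event on all ranks (PetscLogEventRegister is documented as collective) but call PetscLogEventBegin/End inside an `if (rank == 0)` branch, then call PetscLogView with the nested handler.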
From alexis.salzman at ec-nantes.fr Fri Jul 25 04:15:48 2025 From: alexis.salzman at ec-nantes.fr (Alexis SALZMAN) Date: Fri, 25 Jul 2025 11:15:48 +0200 Subject: [petsc-users] MatCreateSubMatricesMPI strange behavior Message-ID: <9d648a1d-72d2-44d0-8a3b-a9d64b01604f@ec-nantes.fr> Hi, As I am relatively new to PETSc, I may have misunderstood how to use the MatCreateSubMatricesMPI function. The attached code is tuned for three processes and extracts one matrix for each colour of a subcommunicator that has been created using the MPI_Comm_split function from an MPIAIJ matrix. The following error message appears when the code is set to its default configuration (i.e. when a rectangular matrix is extracted with more rows than columns for colour 0): [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: Column too large: col 4 max 3 [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ZqH097BZ0G0O3WI7RWrwIKFNpyk0czSWEqfusAeTlgEygAffwpgBUzsLw1TIoGkjZ3mYG-NRQxxFoxU4y8EyY0ofiz9I43Qwe0w$ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown ... petsc git hash 2a89477b25f compiled on a dell i9 computer with Gcc 14.3, mkl 2025.2, ..... 
[0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at ...petsc/src/mat/impls/aij/seq/aij.c:426 [0]PETSC ERROR: #2 MatSetValues() at ...petsc/src/mat/interface/matrix.c:1543 [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at .../petsc/src/mat/impls/aij/mpi/mpiov.c:2965 [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at .../petsc/src/mat/impls/aij/mpi/mpiov.c:3163 [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at .../petsc/src/mat/impls/aij/mpi/mpiov.c:3196 [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at .../petsc/src/mat/interface/matrix.c:7293 [0]PETSC ERROR: #7 main() at sub.c:169 When the '-ok' option is selected, the code extracts a square matrix for colour 0, which runs smoothly in this case. Selecting the '-trans' option swaps the row and column selection indices, providing a transposed submatrix smoothly. For colour 1, which uses only one process and is therefore sequential, rectangular extraction is OK regardless of the shape. Is this dependency on the shape expected? Have I missed an important tuning step somewhere? Thank you in advance for any clarification. Regards A.S. P.S.: I'm sorry, but as I'm leaving my office for the following weeks this evening, I won't be very responsive during this period. -------------- next part -------------- A non-text attachment was scrubbed... Name: sub.c Type: text/x-csrc Size: 6037 bytes Desc: not available URL: From sblondel at utk.edu Fri Jul 25 09:26:02 2025 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 25 Jul 2025 14:26:02 +0000 Subject: [petsc-users] PETSc unable to find cuda In-Reply-To: References: Message-ID: Hi everyone, I'm now testing PETSc v3.23.4 with my code, with the serial backend this time, and I'm getting a segfault in DMCreateMatrix. "-log_view" is not providing any information. Should I create a new email thread for this issue? 
Best, Sophie ________________________________ From: Blondel, Sophie Sent: Wednesday, July 23, 2025 15:36 To: Matthew Knepley ; petsc-users ; balay.anl at fastmail.org Subject: Re: [petsc-users] PETSc unable to find cuda Thank you both for your reply, Changing the gcc version fixed this issue. I will try with PETSc latest release now. Best, Sophie ________________________________ From: Satish Balay Sent: Wednesday, July 23, 2025 12:09 To: Matthew Knepley Cc: Blondel, Sophie ; PETSc users list Subject: Re: [petsc-users] PETSc unable to find cuda [You don't often get email from balay.anl at fastmail.org. Learn why this is important at https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!YdOBIx1u7kj6Vj5P1B9HYULC4E4LJvOYoi4GpL_FrXb_T-z-iA7GKZPkniVsoDXuoimsHRrFEKSkL1uMYlntdEKo$ ] >>> Executing: mpicc --version stdout: x86_64-conda-linux-gnu-cc (conda-forge gcc 13.3.0-2) 13.3.0 Executing: mpicc -o /tmp/petsc-q3as9pdp/config.libraries/conftest -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 /tmp/petsc-q3as9pdp/config.libraries/conftest.o -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -L/usr/local/cuda-11.8/lib64/stubs -lcuda -lquadmath stdout: /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libstdc++.so.6, needed by /usr/local/cuda-11.8/lib64/libnvToolsExt.so, not found (try using -rpath or -rpath-link) /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::condition_variable::notify_all()@GLIBCXX_3.4.11' 
/home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `__cxa_guard_acquire at CXXABI_1.3' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `operator delete(void*)@GLIBCXX_3.4' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined reference to `std::__once_call at GLIBCXX_3.4.11' /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined reference to `vtable for __cxxabiv1::__si_class_type_info at CXXABI_1.3' <<<< Likely you need gcc/g++-11 for this version of cuda. [or install/use a newer version of cuda]. And best if you can use latest petsc release. Satish On Wed, 23 Jul 2025, Matthew Knepley wrote: > On Wed, Jul 23, 2025 at 11:49?AM Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi, > > > > I am trying to install PETSc (3.22.2) with Kokkos and cuda support on an > > Ubuntu laptop with dependencies loaded with Conda. > > > > You are likely to have to turn off Conda before configuring. It messes up > paths for Python and other things. 
> > Thanks, > > Matt > > > > The configure line is: ./configure > > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > > PETSC_ARCH=rel-cuda > > --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install > > --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 > > --with-shared-libraries --with-64-bit-indices --download-kokkos > > --download-kokkos-kernels --download-hdf5 > > --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 > > --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 > > > > And the configure.log is attached. Let me know if I can provide additional > > information. > > > > Best, > > > > Sophie > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Jul 25 12:27:50 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 25 Jul 2025 12:27:50 -0500 Subject: [petsc-users] PETSc unable to find cuda In-Reply-To: References: Message-ID: Yes. --Junchao Zhang On Fri, Jul 25, 2025 at 9:34?AM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi everyone, > > I'm now testing PETSc v3.23.4 with my code, with the serial backend this > time, and I'm getting a segfault in DMCreateMatrix. "-log_view" is not > providing any information. Should I create a new email thread for this > issue? > > Best, > > Sophie > ------------------------------ > *From:* Blondel, Sophie > *Sent:* Wednesday, July 23, 2025 15:36 > *To:* Matthew Knepley ; petsc-users < > petsc-users at mcs.anl.gov>; balay.anl at fastmail.org > *Subject:* Re: [petsc-users] PETSc unable to find cuda > > Thank you both for your reply, > > Changing the gcc version fixed this issue. I will try with PETSc latest > release now. 
> > Best, > > Sophie > ------------------------------ > *From:* Satish Balay > *Sent:* Wednesday, July 23, 2025 12:09 > *To:* Matthew Knepley > *Cc:* Blondel, Sophie ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] PETSc unable to find cuda > > [You don't often get email from balay.anl at fastmail.org. Learn why this is > important at https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!bxyzod8dN-P0zLdIr1eeuNiSOmM35RtT9GWtJwafue2TQFgaNorEbFAE6CDQO0E90Z3pREv6cFV5gY0E-hiDAPwd9JXy$ > > ] > > >>> > Executing: mpicc --version > stdout: > x86_64-conda-linux-gnu-cc (conda-forge gcc 13.3.0-2) 13.3.0 > > Executing: mpicc -o /tmp/petsc-q3as9pdp/config.libraries/conftest > -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch > -fstack-protector -fvisibility=hidden -Wall -Wwrite-strings > -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector > -fvisibility=hidden -O3 /tmp/petsc-q3as9pdp/config.libraries/conftest.o > -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -lcudart > -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand > -L/usr/local/cuda-11.8/lib64/stubs -lcuda -lquadmath > stdout: > /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > warning: libstdc++.so.6, needed by > /usr/local/cuda-11.8/lib64/libnvToolsExt.so, not found (try using -rpath or > -rpath-link) > /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined > reference to `std::condition_variable::notify_all()@GLIBCXX_3.4.11' > /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined > reference to `__cxa_guard_acquire at CXXABI_1.3' > 
/home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined > reference to `operator delete(void*)@GLIBCXX_3.4' > /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > /home/sophie/miniforge/envs/release/lib/./././libicuuc.so.73: undefined > reference to `std::__once_call at GLIBCXX_3.4.11' > /home/sophie/miniforge/envs/release/bin/../lib/gcc/x86_64-conda-linux-gnu/13.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: > /home/sophie/miniforge/envs/release/lib/./././libicui18n.so.73: undefined > reference to `vtable for __cxxabiv1::__si_class_type_info at CXXABI_1.3' > <<<< > > Likely you need gcc/g++-11 for this version of cuda. [or install/use a > newer version of cuda]. And best if you can use latest petsc release. > > Satish > > On Wed, 23 Jul 2025, Matthew Knepley wrote: > > > On Wed, Jul 23, 2025 at 11:49?AM Blondel, Sophie via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > > > Hi, > > > > > > I am trying to install PETSc (3.22.2) with Kokkos and cuda support on > an > > > Ubuntu laptop with dependencies loaded with Conda. > > > > > > > You are likely to have to turn off Conda before configuring. It messes up > > paths for Python and other things. 
> > > > Thanks, > > > > Matt > > > > > > > The configure line is: ./configure > > > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > > > PETSC_ARCH=rel-cuda > > > > --prefix=/home/sophie/Workspace/xolotl-develop-cuda/external/petsc_install > > > --with-fc=0 --with-cuda=1 --with-mpi --with-openmp=0 --with-debugging=0 > > > --with-shared-libraries --with-64-bit-indices --download-kokkos > > > --download-kokkos-kernels --download-hdf5 > > > --download-hdf5-configure-arguments=--enable-parallel --COPTFLAGS=-O3 > > > --CXXOPTFLAGS=-O3 --with-cuda-arch=86 --CUDAOPTFLAGS=-O3 > > > > > > And the configure.log is attached. Let me know if I can provide > additional > > > information. > > > > > > Best, > > > > > > Sophie > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Fri Jul 25 14:00:19 2025 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 25 Jul 2025 19:00:19 +0000 Subject: [petsc-users] Segfault in DMCreateMatrix Message-ID: Hi, I am using PETSc version v3.23.4 for my code https://urldefense.us/v3/__https://github.com/ORNL-Fusion/xolotl/blob/develop/xolotl/solver/src/PetscSolver.cpp__;!!G_uCfscf7eWS!dqsvk_x8eZNzest4mbTcVDQ6NcJLDJOQGX0QxmrWxJeM8s8Hi2altiu6F4EFTeZLDmw5F1rDvE-jHw3VgvjKx_kg$ and I am getting a segfault in DMCreateMatrix, more specifically from gdb: #0 0x0000000000000651 in ?? 
() #1 0x00007ffff58f0119 in DMCreateMatrix (dm=0x55555571bb90, mat=0x7fffffffc680) at /home/sophie/Workspace/xolotl-develop-source/external/petsc/src/dm/interface/dm.c:1519 #2 0x00007ffff7cafc9d in xolotl::solver::PetscSolver::initialize(int, double, _p_DM*, _p_Vec*) () from /home/sophie/Workspace/xolotl-develop-build/xolotl/solver/libxolotlSolver.so PETSc was built with: ./configure PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc PETSC_ARCH=dbg --prefix=/home/sophie/Workspace/xolotl-develop-build/external/petsc_install --with-fc=0 --with-cuda=0 --with-mpi --with-openmp=0 --with-debugging=1 --with-shared-libraries --with-64-bit-indices --download-kokkos --download-kokkos-kernels --with-cudac=0 Let me know what additional information I can provide to resolve the issue. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jul 25 14:48:24 2025 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 25 Jul 2025 15:48:24 -0400 Subject: [petsc-users] Segfault in DMCreateMatrix In-Reply-To: References: Message-ID: On Fri, Jul 25, 2025 at 3:00?PM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am using PETSc version v3.23.4 for my code > https://urldefense.us/v3/__https://github.com/ORNL-Fusion/xolotl/blob/develop/xolotl/solver/src/PetscSolver.cpp__;!!G_uCfscf7eWS!ZP6ogmTTYFdFBmqAralMBdfthDbAfo3gRRaJxI4kJX0cKTmrR9rkgiYJQs6MvFSUs1NWo4VjX1oVeBGyBkX8$ > and > I am getting a segfault in DMCreateMatrix, more specifically from gdb: > #0 0x0000000000000651 in ?? 
() > #1 0x00007ffff58f0119 in DMCreateMatrix (dm=0x55555571bb90, > mat=0x7fffffffc680) > at > /home/sophie/Workspace/xolotl-develop-source/external/petsc/src/dm/interface/dm.c:1519 > #2 0x00007ffff7cafc9d in xolotl::solver::PetscSolver::initialize(int, > double, _p_DM*, _p_Vec*) () > from > /home/sophie/Workspace/xolotl-develop-build/xolotl/solver/libxolotlSolver.so > > PETSc was built with: > ./configure > PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc > PETSC_ARCH=dbg > --prefix=/home/sophie/Workspace/xolotl-develop-build/external/petsc_install > --with-fc=0 --with-cuda=0 --with-mpi --with-openmp=0 --with-debugging=1 > --with-shared-libraries --with-64-bit-indices --download-kokkos > --download-kokkos-kernels --with-cudac=0 > > Let me know what additional information I can provide to resolve the issue. > This was fixed here: https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/commit/4758e3ce3f4cc9adc2a49a02e2f05c2a0f943969__;!!G_uCfscf7eWS!ZP6ogmTTYFdFBmqAralMBdfthDbAfo3gRRaJxI4kJX0cKTmrR9rkgiYJQs6MvFSUs1NWo4VjX1oVeHqfyjIi$ so it will make it to the point release Sept 28, or it is in main. Thanks, Matt > Best, > > Sophie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZP6ogmTTYFdFBmqAralMBdfthDbAfo3gRRaJxI4kJX0cKTmrR9rkgiYJQs6MvFSUs1NWo4VjX1oVeBC8AX31$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Fri Jul 25 16:02:11 2025 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 25 Jul 2025 21:02:11 +0000 Subject: [petsc-users] Segfault in DMCreateMatrix In-Reply-To: References: Message-ID: Great, Thank you very much! 
Sophie ________________________________ From: Matthew Knepley Sent: Friday, July 25, 2025 15:48 To: Blondel, Sophie Cc: PETSc users list Subject: Re: [petsc-users] Segfault in DMCreateMatrix On Fri, Jul 25, 2025 at 3:00?PM Blondel, Sophie via petsc-users > wrote: Hi, I am using PETSc version v3.23.4 for my code https://urldefense.us/v3/__https://github.com/ORNL-Fusion/xolotl/blob/develop/xolotl/solver/src/PetscSolver.cpp__;!!G_uCfscf7eWS!fIWnEqjFirZwazUqGfkHLeuaz7w9ooew4360JV437QI5gTXUmvknS6lHCTh_5DDxxYfJSS163XB6VWf_YINWq-x7$ and I am getting a segfault in DMCreateMatrix, more specifically from gdb: #0 0x0000000000000651 in ?? () #1 0x00007ffff58f0119 in DMCreateMatrix (dm=0x55555571bb90, mat=0x7fffffffc680) at /home/sophie/Workspace/xolotl-develop-source/external/petsc/src/dm/interface/dm.c:1519 #2 0x00007ffff7cafc9d in xolotl::solver::PetscSolver::initialize(int, double, _p_DM*, _p_Vec*) () from /home/sophie/Workspace/xolotl-develop-build/xolotl/solver/libxolotlSolver.so PETSc was built with: ./configure PETSC_DIR=/home/sophie/Workspace/xolotl-develop-source/external/petsc PETSC_ARCH=dbg --prefix=/home/sophie/Workspace/xolotl-develop-build/external/petsc_install --with-fc=0 --with-cuda=0 --with-mpi --with-openmp=0 --with-debugging=1 --with-shared-libraries --with-64-bit-indices --download-kokkos --download-kokkos-kernels --with-cudac=0 Let me know what additional information I can provide to resolve the issue. This was fixed here: https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/commit/4758e3ce3f4cc9adc2a49a02e2f05c2a0f943969__;!!G_uCfscf7eWS!fIWnEqjFirZwazUqGfkHLeuaz7w9ooew4360JV437QI5gTXUmvknS6lHCTh_5DDxxYfJSS163XB6VWf_YFpDlGYb$ so it will make it to the point release Sept 28, or it is in main. Thanks, Matt Best, Sophie -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fIWnEqjFirZwazUqGfkHLeuaz7w9ooew4360JV437QI5gTXUmvknS6lHCTh_5DDxxYfJSS163XB6VWf_YE059MBY$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Jul 26 17:15:21 2025 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 26 Jul 2025 18:15:21 -0400 Subject: [petsc-users] MatCreateSubMatricesMPI strange behavior In-Reply-To: <9d648a1d-72d2-44d0-8a3b-a9d64b01604f@ec-nantes.fr> References: <9d648a1d-72d2-44d0-8a3b-a9d64b01604f@ec-nantes.fr> Message-ID: First, you can not mix communicators in PETSc calls in general (ever?), but this error looks like you might be asking for a row from the matrix that does not exist. You should start with a PETSc example code. Test it and modify it to suit your needs. Good luck, Mark On Fri, Jul 25, 2025 at 9:31?AM Alexis SALZMAN wrote: > Hi, > > As I am relatively new to Petsc, I may have misunderstood how to use the > MatCreateSubMatricesMPI function. The attached code is tuned for three > processes and extracts one matrix for each colour of a subcommunicator > that has been created using the MPI_Comm_split function from an MPIAij > matrix. The following error message appears when the code is set to its > default configuration (i.e. when a rectangular matrix is extracted with > more rows than columns for colour 0): > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: Column too large: col 4 max 3 > [0]PETSC ERROR: See > https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ZqH097BZ0G0O3WI7RWrwIKFNpyk0czSWEqfusAeTlgEygAffwpgBUzsLw1TIoGkjZ3mYG-NRQxxFoxU4y8EyY0ofiz9I43Qwe0w$ > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown > > ... 
petsc git hash 2a89477b25f compiled on a dell i9 computer with Gcc > 14.3, mkl 2025.2, ..... > [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at > ...petsc/src/mat/impls/aij/seq/aij.c:426 > [0]PETSC ERROR: #2 MatSetValues() at > ...petsc/src/mat/interface/matrix.c:1543 > [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at > .../petsc/src/mat/impls/aij/mpi/mpiov.c:2965 > [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at > .../petsc/src/mat/impls/aij/mpi/mpiov.c:3163 > [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at > .../petsc/src/mat/impls/aij/mpi/mpiov.c:3196 > [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at > .../petsc/src/mat/interface/matrix.c:7293 > [0]PETSC ERROR: #7 main() at sub.c:169 > > When the '-ok' option is selected, the code extracts a square matrix for > colour 0, which runs smoothly in this case. Selecting the '-trans' > option swaps the row and column selection indices, providing a > transposed submatrix smoothly. For colour 1, which uses only one process > and is therefore sequential, rectangular extraction is OK regardless of > the shape. > > Is this dependency on the shape expected? Have I missed an important > tuning step somewhere? > > Thank you in advance for any clarification. > > Regards > > A.S. > > P.S.: I'm sorry, but as I'm leaving my office for the following weeks > this evening, I won't be very responsive during this period. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Pierre.LEDAC at cea.fr Mon Jul 28 02:50:53 2025 From: Pierre.LEDAC at cea.fr (LEDAC Pierre) Date: Mon, 28 Jul 2025 07:50:53 +0000 Subject: [petsc-users] [GPU] Jacobi preconditioner Message-ID: <386853b1efae4269919b977b88c7e679@cea.fr> Hello all, We are solving with PETSc a linear system updated every time step (constant stencil but coefficients changing). 
The matrix is preallocated once with MatSetPreallocationCOO() then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values. It is working fine with a GMRES KSP solver and PC Jacobi, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called whereas the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() but there is a MatGetDiagonal_SeqAIJKOKKOS(). Does it mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on device? Or am I doing something wrong? NB: GMRES is running well on device. I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated each solve on host, but it increases significantly the number of iterations. Thanks, Pierre LEDAC Commissariat à l'énergie atomique et aux énergies alternatives Centre de SACLAY DES/ISAS/DM2S/SGLS/LCAN Bâtiment 451 - point courrier n°41 F-91191 Gif-sur-Yvette +33 1 69 08 04 03 +33 6 83 42 05 79 From ali.ali_ahmad at utt.fr Mon Jul 28 10:00:11 2025 From: ali.ali_ahmad at utt.fr (Ali ALI AHMAD) Date: Mon, 28 Jul 2025 17:00:11 +0200 (CEST) Subject: [petsc-users] [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. 
L2 Lebesgue) In-Reply-To: <86F87C6A-DEB3-4125-AF51-2B2E577EBFDD@petsc.dev> References: <414475981.6714047.1749631527145.JavaMail.zimbra@utt.fr> <1703896473.7853283.1749734882144.JavaMail.zimbra@utt.fr> <461035026.7868511.1749735776853.JavaMail.zimbra@utt.fr> <323745907.8383516.1749804945465.JavaMail.zimbra@utt.fr> <85CD2CA9-7B77-4288-87BA-9E108D40C7E8@petsc.dev> <82133477.13270009.1750410624370.JavaMail.zimbra@utt.fr> <86F87C6A-DEB3-4125-AF51-2B2E577EBFDD@petsc.dev> Message-ID: <36326459.7563988.1753714811037.JavaMail.zimbra@utt.fr> I'm sorry for getting back to you so late. Thank you for your patience and understanding. * For example, when using L2 algorithms where a different norm is applied in the line search, see Line_search_L2.png as an example from this reference: https://urldefense.us/v3/__https://arxiv.org/abs/1607.04254__;!!G_uCfscf7eWS!foyayF7qjdgruME_ERnNpzgxCXcAQdMKzZwZ_YYvujvBhUanj58QBYDvgRLjJcQJaSKIm0g8PhcP2pO29_j2D5o_WJgD5Q$ . Here, we can change the norm from the L2 Euclidean to the L2 Lebesgue norm. * For GMRES, we need to replace NORM_2 (L2 Euclidean) in your code with weighted_NormL2 (L2 Lebesgue) everywhere, including all the details such as in the Arnoldi algorithm... * For the convergence test as well, in the linear system for finding the direction d (A · d = b), and also when we search for a good step using the line search formula: x_{k+1} = x_k + step · d. I hope this explanation is clear for you. Best regards, Ali ALI AHMAD De: "Barry Smith" À: "Ali ALI AHMAD" Cc: "petsc-users" , "petsc-maint" Envoyé: Vendredi 20 Juin 2025 15:50:17 Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. 
L2 Lebesgue) On Jun 20, 2025, at 5:10 AM, Ali ALI AHMAD wrote: * Yes, I am indeed using an inexact Newton method in my code. The descent direction is computed by solving a linear system involving the Jacobian, so the update direction satisfies the classical relation J(u_n) d(u_n) = -F(u_n), i.e. d(u_n) = -J(u_n)^{-1} F(u_n). I'm also trying to use a line search strategy based on a weighted L2 norm (in the Lebesgue sense), which a priori should lead to better accuracy and faster convergence in anisotropic settings. Ok, could you point to sample code (any language) or written algorithms where a different norm is used in the line search? * During the subsequent iterations, I apply the Eisenstat-Walker method to adapt the tolerance, which should also involve modifying the norm used in the algorithm. * The current implementation still uses the standard Euclidean L2 norm in PETSc's linear solver and in GMRES. I believe this should ideally be replaced by a weighted L2 norm consistent with the discretization. However, I haven't yet succeeded in modifying the norm used internally by the linear solver in PETSc, so I'm not yet sure how much impact this change would have on the overall convergence, but I suspect it could improve robustness, especially for highly anisotropic problems. I would greatly appreciate any guidance on how to implement this properly in PETSc. Norms are used in multiple ways in GMRES. 1) defining convergence 2) as part of preconditioning Again, can you point to sample code (any language) or written algorithms that describe exactly what you would like to accomplish. Barry Do not hesitate to contact me again if anything remains unclear or if you need further information. Best regards, Ali ALI AHMAD De: "Barry Smith" À: "Ali ALI AHMAD" Cc: "petsc-users" , "petsc-maint" Envoyé: Samedi 14 Juin 2025 01:06:52 Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue) I appreciate the clarification. 
I would call 3) preconditioning. To increase my understanding, you are already using Newton's method? That is, you compute the Jacobian of the function and use -J^{-1}(u^n) F(u^n) as your update direction? When you switch the inner product (or precondition), how will the search direction be different?

Thanks

Barry

The case you need support for is becoming important to PETSc, so we need to understand it well and support it well, which is why I am asking these (perhaps to you) trivial questions.

On Jun 13, 2025, at 4:55 AM, Ali ALI AHMAD wrote:

Thank you for your message.

To answer your question: I would like to use the L2 norm in the sense of Lebesgue for all three purposes, especially the third one.

1- For displaying residuals during the nonlinear iterations, I would like to observe the convergence behavior using a norm that better reflects the physical properties of the problem.

2- For convergence testing, I would like the stopping criterion to be based on a weighted L2 norm that accounts for the geometry of the mesh (since I am working with unstructured, anisotropic triangular meshes).

3- Most importantly, I would like to modify the inner product used in the algorithm so that it aligns with the weighted L2 norm (since I am working with unstructured, anisotropic triangular meshes).

Best regards,
Ali ALI AHMAD

De: "Barry Smith"
À: "Ali ALI AHMAD"
Cc: "petsc-users" , "petsc-maint"
Envoyé: Vendredi 13 Juin 2025 03:14:06
Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue)

You haven't answered my question. Where (conceptually) and for what purpose do you want to use the L2 norm?

1) displaying norms to observe the convergence behavior
2) in the convergence testing to determine when to stop
3) changing the "inner product" in the algorithm, which amounts to preconditioning.

Barry

On Jun 12, 2025, at 9:42 AM, Ali ALI AHMAD wrote:

Thank you for your answer.
I am currently working with the nonlinear solvers newtonls (with bt, l2, etc.) and newtontr (using newton, cauchy, and dogleg strategies) combined with the linear solver gmres and the ILU preconditioner, since my Jacobian matrix is nonsymmetric. I also use the Eisenstat-Walker method for newtonls, as my initial guess is often very far from the exact solution.

What I would like to do now is to replace the standard Euclidean L2 norm with the L2 norm in the Lebesgue sense in the above numerical algorithm, because my problem is defined on an unstructured, anisotropic triangular mesh where a weighted norm would be more physically appropriate.

Would you be able to advise me on how to implement this change properly? I would deeply appreciate any guidance or suggestions you could provide. Thank you in advance for your help.

Best regards,
Ali ALI AHMAD

De: "Ali ALI AHMAD"
À: "Barry Smith"
Cc: "petsc-users" , "petsc-maint"
Envoyé: Jeudi 12 Juin 2025 15:28:02
Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue)

Thank you for your answer.

I am currently working with the nonlinear solvers newtonls (with bt, l2, etc.) and newtontr (using newton, cauchy, and dogleg strategies) combined with the linear solver gmres and the ILU preconditioner, since my Jacobian matrix is nonsymmetric. I also use the Eisenstat-Walker method for newtonls, as my initial guess is often very far from the exact solution.

What I would like to do now is to replace the standard Euclidean L2 norm with the L2 norm in the Lebesgue sense, because my problem is defined on an unstructured, anisotropic triangular mesh where a weighted norm would be more physically appropriate.

Would you be able to advise me on how to implement this change properly? I would deeply appreciate any guidance or suggestions you could provide. Thank you in advance for your help.
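For reference, the solver combination described above can be selected at run time with PETSc options along these lines (a sketch; the executable name `./myapp` is a placeholder, not from this thread):

```shell
# Newton with line search, GMRES + ILU, Eisenstat-Walker adaptive tolerances
./myapp -snes_type newtonls -snes_linesearch_type bt \
        -ksp_type gmres -pc_type ilu -snes_ksp_ew \
        -snes_monitor -ksp_converged_reason

# Trust-region variant
./myapp -snes_type newtontr -snes_monitor
```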
Best regards,
Ali ALI AHMAD

De: "Barry Smith"
À: "Ali ALI AHMAD"
Cc: "petsc-users" , "petsc-maint"
Envoyé: Jeudi 12 Juin 2025 14:57:40
Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue)

Do you wish to use a different norm

1) ONLY for displaying (printing out) the residual norms to track progress
2) in the convergence testing
3) to change the numerical algorithm (for example using the L2 inner product instead of the usual linear algebra R^N l2 inner product).

For 1) use SNESMonitorSet() and in your monitor function use SNESGetSolution() to grab the solution and then VecGetArray(). Now you can compute any weighted norm you want on the solution.

For 2) similar, but you need to use SNESSetConvergenceTest().

For 3) yes, but you need to ask us specifically.

Barry

On Jun 11, 2025, at 4:45 AM, Ali ALI AHMAD wrote:

Dear PETSc team,

I hope this message finds you well.

I am currently using PETSc in a C++ code, where I rely on the nonlinear solvers `SNES` with either `newtonls` or `newtontr` methods. I would like to ask if it is possible to change the default norm used (typically the L2 Euclidean norm) to a custom norm, specifically the L2 norm in the sense of Lebesgue (e.g., involving cell-wise weighted integrals over the domain).

My main goal is to define a custom residual norm that better reflects the physical quantities of interest in my simulation.

Would this be feasible within the PETSc framework? If so, could you point me to the recommended approach (e.g., redefining the norm manually, using specific PETSc hooks or options)?

Thank you very much in advance for your help and for the great work on PETSc!
Best regards,

Ali ALI AHMAD
PhD Student
University of Technology of Troyes - UTT - France
GAMMA3 Project - Office H008 - Phone No: +33 7 67 44 68 18
12 rue Marie Curie - CS 42060 10004 TROYES Cedex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Line_search_L2.png
Type: image/png
Size: 36659 bytes
Desc: not available

From junchao.zhang at gmail.com Mon Jul 28 10:43:56 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 28 Jul 2025 10:43:56 -0500
Subject: [petsc-users] [GPU] Jacobi preconditioner
In-Reply-To: <386853b1efae4269919b977b88c7e679@cea.fr>
References: <386853b1efae4269919b977b88c7e679@cea.fr>
Message-ID:

Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda and petsc/kokkos backends are separate code. If petsc/kokkos meets your needs, then just use it. For petsc users, we hope it will be just a difference of extra --download-kokkos --download-kokkos-kernels in configuration.

--Junchao Zhang

On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre wrote:

> Hello all,
>
> We are solving with PETSc a linear system updated every time step (constant stencil but coefficients changing).
>
> The matrix is preallocated once with MatSetPreallocationCOO() then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values.
>
> It is working fine with a GMRES KSP solver and PC Jacobi, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called whereas the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() but a MatGetDiagonal_SeqAIJKOKKOS().
>
> Does it mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on the device? Or am I doing something wrong?
>
> NB: GMRES is running well on device.
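The Kokkos configuration Junchao mentions amounts to adding the two download options at configure time, e.g. (an illustrative sketch; the --with-cuda flag is an assumption for an NVIDIA machine and is not stated in the thread):

```shell
./configure --with-cuda --download-kokkos --download-kokkos-kernels
```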
> I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated each solve on host, but it increases significantly the number of iterations.
>
> Thanks,
>
> Pierre LEDAC
> Commissariat à l'énergie atomique et aux énergies alternatives
> Centre de SACLAY
> DES/ISAS/DM2S/SGLS/LCAN
> Bâtiment 451 - point courrier n°41
> F-91191 Gif-sur-Yvette
> +33 1 69 08 04 03
> +33 6 83 42 05 79

From Pierre.LEDAC at cea.fr Mon Jul 28 11:45:51 2025
From: Pierre.LEDAC at cea.fr (LEDAC Pierre)
Date: Mon, 28 Jul 2025 16:45:51 +0000
Subject: [petsc-users] [GPU] Jacobi preconditioner
In-Reply-To:
References: <386853b1efae4269919b977b88c7e679@cea.fr>
Message-ID:

Thanks, I will give a try with the kokkos backend.

I have just seen now that even if we use MatSetPreallocationCOO() with device pointers, it seems that the matrix is preallocated also on the host? Am I wrong, or is the strategy to have the matrix on host and device even if only the latter is needed?

Thanks

________________________________
De : Junchao Zhang
Envoyé : lundi 28 juillet 2025 17:43:56
À : LEDAC Pierre
Cc : petsc-users at mcs.anl.gov
Objet : Re: [petsc-users] [GPU] Jacobi preconditioner

Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda and petsc/kokkos backends are separate code. If petsc/kokkos meets your needs, then just use it. For petsc users, we hope it will be just a difference of extra --download-kokkos --download-kokkos-kernels in configuration.

--Junchao Zhang

On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre wrote:

Hello all,

We are solving with PETSc a linear system updated every time step (constant stencil but coefficients changing).
The matrix is preallocated once with MatSetPreallocationCOO() then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values.

It is working fine with a GMRES KSP solver and PC Jacobi, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called whereas the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() but a MatGetDiagonal_SeqAIJKOKKOS().

Does it mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on the device? Or am I doing something wrong?

NB: GMRES is running well on device.

I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated each solve on host, but it increases significantly the number of iterations.

Thanks,

Pierre LEDAC
Commissariat à l'énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 - point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79

From ali.ali_ahmad at utt.fr Mon Jul 28 11:52:08 2025
From: ali.ali_ahmad at utt.fr (Ali ALI AHMAD)
Date: Mon, 28 Jul 2025 18:52:08 +0200 (CEST)
Subject: [petsc-users] SNESSetFunctionDomainError with NEWTONTR method
Message-ID: <389832210.7609606.1753721528136.JavaMail.zimbra@utt.fr>

Hi,

I am currently using the NEWTONTR method for compressible flow simulations. In some cases, I obtain non-physical solutions (for example, negative pressure). Can I use SNESSetFunctionDomainError in this case? I tried, but I don't fully understand how it works with NEWTONTR.

Thank you in advance

best regards,
Ali ALI AHMAD
From knepley at gmail.com Mon Jul 28 11:55:04 2025
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 28 Jul 2025 12:55:04 -0400
Subject: [petsc-users] [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue)
In-Reply-To: <36326459.7563988.1753714811037.JavaMail.zimbra@utt.fr>
References: <414475981.6714047.1749631527145.JavaMail.zimbra@utt.fr> <1703896473.7853283.1749734882144.JavaMail.zimbra@utt.fr> <461035026.7868511.1749735776853.JavaMail.zimbra@utt.fr> <323745907.8383516.1749804945465.JavaMail.zimbra@utt.fr> <85CD2CA9-7B77-4288-87BA-9E108D40C7E8@petsc.dev> <82133477.13270009.1750410624370.JavaMail.zimbra@utt.fr> <86F87C6A-DEB3-4125-AF51-2B2E577EBFDD@petsc.dev> <36326459.7563988.1753714811037.JavaMail.zimbra@utt.fr>
Message-ID:

On Mon, Jul 28, 2025 at 11:00 AM Ali ALI AHMAD wrote:

> I'm sorry for getting back to you so late. Thank you for your patience and understanding.
>
> For example, when using L2 algorithms where a different norm is applied in the line search, see Line_search_L2.png as an example from this reference: https://arxiv.org/abs/1607.04254 . Here, we can change the norm from the L2 Euclidean to the L2 Lebesgue norm.

What does "L2 Lebesgue" mean here? I know what I mean by L_2. It is

  ||f||^2_2 = \int_Omega |f(x)|^2 dx \approx \sum_q |f(x_q)|^2 w_q

Oh, from below it seems you want a weight function wt(x) in the norm. So we would have

  \sum_q |f(x_q)|^2 wt(x_q) w_q

> For GMRES, we need to replace NORM_2 (L2 Euclidean) in your code with weighted_NormL2 (L2 Lebesgue) everywhere, including all the details such as in the Arnoldi algorithm...

I do not quite understand here, because GMRES does not use the L_2 norm.
It uses the l_2 norm, which is

  ||v||^2_2 = \sum |v_i|^2

The things in the vector are coefficients of basis functions. I guess, if you had an interpolatory element, you could interpret this as a quadrature rule, meaning you would have

  \sum wt(x_i) |v_i|^2

where x_i were the coordinates of the dual basis evaluation functionals.

> For the convergence test as well, in the linear system for finding the direction d (A d = b),

You want to use the inner product that generates your weighted L_2 norm?

> and also when we search for a good step using the line search formula: x_{k+1} = x_k + step * d.

You want to minimize your weighted L_2 norm. That should work in the same way.

Thanks,

Matt

> I hope this explanation is clear for you.
>
> Best regards,
> Ali ALI AHMAD
>
> De: "Barry Smith"
> À: "Ali ALI AHMAD"
> Cc: "petsc-users" , "petsc-maint"
> Envoyé: Vendredi 20 Juin 2025 15:50:17
> Objet: Re: [petsc-maint] norm L2 problemQuestion about changing the norm used in nonlinear solvers (L2 Euclidean vs. L2 Lebesgue)
>
> On Jun 20, 2025, at 5:10 AM, Ali ALI AHMAD wrote:
>
> * Yes, I am indeed using an inexact Newton method in my code. The descent direction is computed by solving a linear system involving the Jacobian, so the update follows the classical formula J(u_n) d_n = -F(u_n). I'm also trying to use a line search strategy based on a weighted L2 norm (in the Lebesgue sense), which a priori should lead to better accuracy and faster convergence in anisotropic settings.
>
> Ok, could you point to sample code (any language) or written algorithms where a different norm is used in the line search?
>
> * During the subsequent iterations, I apply the Eisenstat-Walker method to adapt the tolerance, which should also involve modifying the norm used in the algorithm.
> Best regards,
>
> Ali ALI AHMAD
> PhD Student
> University of Technology of Troyes - UTT - France
> GAMMA3 Project - Office H008 - Phone No: +33 7 67 44 68 18
> 12 rue Marie Curie - CS 42060 10004 TROYES Cedex

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/

From junchao.zhang at gmail.com Mon Jul 28 12:25:20 2025
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 28 Jul 2025 12:25:20 -0500
Subject: [petsc-users] [GPU] Jacobi preconditioner
In-Reply-To:
References: <386853b1efae4269919b977b88c7e679@cea.fr>
Message-ID:

Currently we always allocate matrices on the host, so that when you call some operations not implemented on the device yet, we have a backup. The memory on the host won't be written until needed, so it won't affect performance if all operations are done on the device.

--Junchao Zhang

On Mon, Jul 28, 2025 at 11:45 AM LEDAC Pierre wrote:

> Thanks, I will give a try with the kokkos backend.
>
> I have just seen now that even if we use MatSetPreallocationCOO() with device pointers, it seems that the matrix is preallocated also on the host? Am I wrong, or is the strategy to have the matrix on host and device even if only the latter is needed?
>
> Thanks
>
> De : Junchao Zhang
> Envoyé : lundi 28 juillet 2025 17:43:56
> À : LEDAC Pierre
> Cc : petsc-users at mcs.anl.gov
> Objet : Re: [petsc-users] [GPU] Jacobi preconditioner
>
> Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda and petsc/kokkos backends are separate code.
From bsmith at petsc.dev Mon Jul 28 22:20:08 2025
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 28 Jul 2025 23:20:08 -0400
Subject: [petsc-users] SNESSetFunctionDomainError with NEWTONTR method
In-Reply-To: <389832210.7609606.1753721528136.JavaMail.zimbra@utt.fr>
References: <389832210.7609606.1753721528136.JavaMail.zimbra@utt.fr>
Message-ID: <9EE1BE4D-9F7E-427D-9840-95BEDD1B3BC0@petsc.dev>

Stefano will need to provide the full explanation with trust regions; I will just try to provide some background based on looking at the code.

Something that is not made clear in the manual page is that for SNESSetFunctionDomainError() to work properly (in line search codes), you must also set some Infinity or NaN into the computed vector. I have attempted to make this clear in https://gitlab.com/petsc/petsc/-/merge_requests/8603

In looking at SNESSolve_NEWTONTR(), if the initial "guess" is not in the domain (that is, you called SNESSetFunctionDomainError() and set some vector entry to NaN or infinity), then it hits the lines

  if (!snes->vec_func_init_set) {
    PetscCall(SNESComputeFunction(snes, X, F)); /* F(X) */
  } else snes->vec_func_init_set = PETSC_FALSE;
  PetscCall(VecNorm(F, NORM_2, &fnorm)); /* fnorm <- || F || */
  SNESCheckFunctionNorm(snes, fnorm);

and will return from SNESSolve() with SNES_DIVERGED_FUNCTION_DOMAIN (I think this is likely what we want to happen).

Later in the algorithm, the nonlinear function gets called with

  /* Compute new objective function */
  PetscCall(SNESNewtonTRObjective(snes, has_objective, X, Y, W, G, &gnorm, &fkp1));
  if (PetscIsInfOrNanReal(fkp1)) rho = neP->eta1;

So if part of the function vector has an Inf or NaN it sets rho to neP->eta1.
But rho must be > neP->eta1 so the step is rejected, and presumably the trust region will be shrunk or the algorithm will give up. Note that any value you gave for SNESSetFunctionDomainError() is ignored.

Barry

> On Jul 28, 2025, at 12:52 PM, Ali ALI AHMAD wrote:
>
> Hi,
>
> I am currently using the NEWTONTR method for compressible flow simulations. In some cases, I obtain non-physical solutions (for example, negative pressure). Can I use SNESSetFunctionDomainError in this case? I tried, but I don't fully understand how it works with NEWTONTR.
>
> Thank you in advance
>
> best regards,
> Ali ALI AHMAD

From Pierre.LEDAC at cea.fr Tue Jul 29 02:23:45 2025
From: Pierre.LEDAC at cea.fr (LEDAC Pierre)
Date: Tue, 29 Jul 2025 07:23:45 +0000
Subject: [petsc-users] [GPU] Jacobi preconditioner
In-Reply-To:
References: <386853b1efae4269919b977b88c7e679@cea.fr>
Message-ID:

Thanks for your confirmation. If I read you carefully:

  The memory on the host won't be written until needed, so it won't affect performance if all operations are done on the device.

That means I am doing an operation that forces the matrix to be written on the host. Probably the Jacobi preconditioner operation.

And many thanks for the work done on the new API with COO format!

Pierre LEDAC
Commissariat à l'énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 - point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79

________________________________
De : Junchao Zhang
Envoyé : lundi 28 juillet 2025 19:25:20
À : LEDAC Pierre
Cc : petsc-users at mcs.anl.gov
Objet : Re: [petsc-users] [GPU] Jacobi preconditioner

Currently we always allocate matrices on the host, so that when you call some operations not implemented on the device yet, we have a backup.
Thanks, Pierre LEDAC Commissariat à l'énergie atomique et aux énergies alternatives Centre de SACLAY DES/ISAS/DM2S/SGLS/LCAN Bâtiment 451 - point courrier n°41 F-91191 Gif-sur-Yvette +33 1 69 08 04 03 +33 6 83 42 05 79 From junchao.zhang at gmail.com Tue Jul 29 10:36:45 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 29 Jul 2025 10:36:45 -0500 Subject: [petsc-users] [GPU] Jacobi preconditioner In-Reply-To: References: <386853b1efae4269919b977b88c7e679@cea.fr> Message-ID: On Tue, Jul 29, 2025 at 2:23 AM LEDAC Pierre wrote: > Thanks for your confirmation. If I read you correctly: > > *The memory on the host won't be written until needed, so it won't affect > performance if all operations are done on the device.* > > > That means I am doing an operation that forces the matrix to be written on > the host. Probably the Jacobi preconditioner operation. > Yes, with the petsc/cuda backend, since MatGetDiagonal is not implemented on device, petsc will copy the matrix from device to host and do the work there. > And many thanks for the work done on the new API with the COO format! > > > Pierre LEDAC > Commissariat à l'énergie atomique et aux énergies alternatives > Centre de SACLAY > DES/ISAS/DM2S/SGLS/LCAN > Bâtiment 451 - point courrier n°41 > F-91191 Gif-sur-Yvette > +33 1 69 08 04 03 > +33 6 83 42 05 79 > ------------------------------ > *De :* Junchao Zhang > *Envoyé :* lundi 28 juillet 2025 19:25:20 > *À :* LEDAC Pierre > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] [GPU] Jacobi preconditioner > > Currently we always allocate matrices on the host, so that when you call > some operations not implemented on the device yet, we have a backup. 
The > memory on the host won't be written until needed, so it won't affect > performance if all operations are done on the device. > > --Junchao Zhang > > > On Mon, Jul 28, 2025 at 11:45 AM LEDAC Pierre wrote: > >> Thanks, I will give it a try with the kokkos backend. >> >> I have just seen now that even if we use MatSetPreallocationCOO() with >> device pointers, it seems that the matrix is preallocated also on host? Am >> I wrong, or is the strategy to have the matrix on host and device even if only >> the latter is needed? >> >> Thanks >> ------------------------------ >> *De :* Junchao Zhang >> *Envoyé :* lundi 28 juillet 2025 17:43:56 >> *À :* LEDAC Pierre >> *Cc :* petsc-users at mcs.anl.gov >> *Objet :* Re: [petsc-users] [GPU] Jacobi preconditioner >> >> Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda >> and petsc/kokkos backends are separate code. >> If petsc/kokkos meets your needs, then just use it. For petsc users, we >> hope it will be just a difference of extra --download-kokkos >> --download-kokkos-kernels in configuration. >> >> --Junchao Zhang >> >> >> On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre wrote: >> >>> Hello all, >>> >>> >>> We are solving with PETSc a linear system updated every time step >>> (constant stencil but coefficients changing). >>> >>> >>> The matrix is preallocated once with MatSetPreallocationCOO() then >>> filled each time step with MatSetValuesCOO(), and we use device pointers >>> for coo_i, coo_j, and the coefficient values. >>> >>> >>> It is working fine with a GMRES KSP solver and PC Jacobi, but we are >>> surprised to see that every time step, during PCSetUp, >>> MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. >>> Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() >>> but there is a MatGetDiagonal_SeqAIJKOKKOS(). >>> >>> >>> Does it mean we should use the Kokkos backend in PETSc to have the Jacobi >>> preconditioner built directly on device? Or am I doing something wrong? 
>>> >>> NB: GMRES is running well on device. >>> >>> >>> I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated >>> on the host each solve, but it significantly increases the number of iterations. >>> >>> >>> Thanks, >>> >>> >>> Pierre LEDAC >>> Commissariat à l'énergie atomique et aux énergies alternatives >>> Centre de SACLAY >>> DES/ISAS/DM2S/SGLS/LCAN >>> Bâtiment 451 - point courrier n°41 >>> F-91191 Gif-sur-Yvette >>> +33 1 69 08 04 03 >>> +33 6 83 42 05 79 >>> >> From bsmith at petsc.dev Wed Jul 30 13:34:26 2025 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Jul 2025 14:34:26 -0400 Subject: [petsc-users] [GPU] Jacobi preconditioner In-Reply-To: References: <386853b1efae4269919b977b88c7e679@cea.fr> Message-ID: <49396000-D752-4C95-AF1B-524EC68BC5BC@petsc.dev> We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). It's somewhat embarrassing that we don't provide this. I have found some potential code at https://urldefense.us/v3/__https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse__;!!G_uCfscf7eWS!cOVqDE53HIAwLx-ILxDMFmg5MAzIZUHjcusb9V2nHf3y_wMh6ZB0ApPcybv2hEBDJlag4Q1oiMX3d-f6nEAtnSI$ Barry > On Jul 28, 2025, at 11:43 AM, Junchao Zhang wrote: > > Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda and petsc/kokkos backends are separate code. > If petsc/kokkos meets your needs, then just use it. For petsc users, we hope it will be just a difference of extra --download-kokkos --download-kokkos-kernels in configuration. 
> > --Junchao Zhang > > > On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre > wrote: >> Hello all, >> >> >> >> We are solving with PETSc a linear system updated every time step (constant stencil but coefficients changing). >> >> >> >> The matrix is preallocated once with MatSetPreallocationCOO() then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values. >> >> >> >> It is working fine with a GMRES KSP solver and PC Jacobi, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() but there is a MatGetDiagonal_SeqAIJKOKKOS(). >> >> >> >> Does it mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on device? Or am I doing something wrong? >> >> NB: GMRES is running well on device. >> >> >> >> I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated on the host each solve, but it significantly increases the number of iterations. >> >> >> >> Thanks, >> >> >> >> Pierre LEDAC >> Commissariat à l'énergie atomique et aux énergies alternatives >> Centre de SACLAY >> DES/ISAS/DM2S/SGLS/LCAN >> Bâtiment 451 - point courrier n°41 >> F-91191 Gif-sur-Yvette >> +33 1 69 08 04 03 >> +33 6 83 42 05 79 From Pierre.LEDAC at cea.fr Thu Jul 31 05:46:41 2025 From: Pierre.LEDAC at cea.fr (LEDAC Pierre) Date: Thu, 31 Jul 2025 10:46:41 +0000 Subject: [petsc-users] [GPU] Jacobi preconditioner In-Reply-To: <49396000-D752-4C95-AF1B-524EC68BC5BC@petsc.dev> References: <386853b1efae4269919b977b88c7e679@cea.fr> , <49396000-D752-4C95-AF1B-524EC68BC5BC@petsc.dev> Message-ID: <99f1b933bd7a40c0ab8b946b99f8c944@cea.fr> Thanks Barry, I agree but didn't dare ask for that. Pierre LEDAC Commissariat à 
l'énergie atomique et aux énergies alternatives Centre de SACLAY DES/ISAS/DM2S/SGLS/LCAN Bâtiment 451 - point courrier n°41 F-91191 Gif-sur-Yvette +33 1 69 08 04 03 +33 6 83 42 05 79 ________________________________ De : Barry Smith Envoyé : mercredi 30 juillet 2025 20:34:26 À : Junchao Zhang Cc : LEDAC Pierre; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] [GPU] Jacobi preconditioner We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). It's somewhat embarrassing that we don't provide this. I have found some potential code at https://urldefense.us/v3/__https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse__;!!G_uCfscf7eWS!flO1UCfj-bia4eeLdSw3qZ5b15r6I7UIktvoIFPaqwGfdbGlABa_9JjiwW6xy6Gan0s-kA6hRXDz3jjsoCZnkf_SlbiT$ Barry On Jul 28, 2025, at 11:43 AM, Junchao Zhang wrote: Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda and petsc/kokkos backends are separate code. If petsc/kokkos meets your needs, then just use it. For petsc users, we hope it will be just a difference of extra --download-kokkos --download-kokkos-kernels in configuration. --Junchao Zhang On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre > wrote: Hello all, We are solving with PETSc a linear system updated every time step (constant stencil but coefficients changing). The matrix is preallocated once with MatSetPreallocationCOO() then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values. It is working fine with a GMRES KSP solver and PC Jacobi, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() but there is a MatGetDiagonal_SeqAIJKOKKOS(). Does it mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on device? Or am I doing something wrong? NB: GMRES is running well on device. 
I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated on the host each solve, but it significantly increases the number of iterations. Thanks, Pierre LEDAC Commissariat à l'énergie atomique et aux énergies alternatives Centre de SACLAY DES/ISAS/DM2S/SGLS/LCAN Bâtiment 451 - point courrier n°41 F-91191 Gif-sur-Yvette +33 1 69 08 04 03 +33 6 83 42 05 79 From junchao.zhang at gmail.com Thu Jul 31 10:05:10 2025 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 31 Jul 2025 10:05:10 -0500 Subject: [petsc-users] [GPU] Jacobi preconditioner In-Reply-To: <49396000-D752-4C95-AF1B-524EC68BC5BC@petsc.dev> References: <386853b1efae4269919b977b88c7e679@cea.fr> <49396000-D752-4C95-AF1B-524EC68BC5BC@petsc.dev> Message-ID: What would embarrass me more is to copy the same code to MatGetDiagonal_SeqAIJHIPSPARSE. --Junchao Zhang On Wed, Jul 30, 2025 at 1:34 PM Barry Smith wrote: > > We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). > It's somewhat embarrassing that we don't provide this. > > I have found some potential code at > https://urldefense.us/v3/__https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse__;!!G_uCfscf7eWS!eFR7ZzxJzlmh7BCszleLr5XhdKGWWrJhH0s5z1UvCfSHo3N0wJBFRvxofvWDbFPYNGW2tSRU8lzHC4w2RMCL0W4wbMZv$ > > Barry > > > > > On Jul 28, 2025, at 11:43 AM, Junchao Zhang > wrote: > > Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. petsc/cuda > and petsc/kokkos backends are separate code. > If petsc/kokkos meets your needs, then just use it. For petsc users, we > hope it will be just a difference of extra --download-kokkos > --download-kokkos-kernels in configuration. > > --Junchao Zhang > > > On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre wrote: > >> Hello all, >> >> >> We are solving with PETSc a linear system updated every time step >> (constant stencil but coefficients changing). 
>> >> The matrix is preallocated once with MatSetPreallocationCOO() then >> filled each time step with MatSetValuesCOO(), and we use device pointers >> for coo_i, coo_j, and the coefficient values. >> >> >> It is working fine with a GMRES KSP solver and PC Jacobi, but we are >> surprised to see that every time step, during PCSetUp, >> MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. >> Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE() >> but there is a MatGetDiagonal_SeqAIJKOKKOS(). >> >> >> Does it mean we should use the Kokkos backend in PETSc to have the Jacobi >> preconditioner built directly on device? Or am I doing something wrong? >> >> NB: GMRES is running well on device. >> >> >> I could use -ksp_reuse_preconditioner to avoid Jacobi being recreated >> on the host each solve, but it significantly increases the number of iterations. >> >> >> Thanks, >> >> >> Pierre LEDAC >> Commissariat à l'énergie atomique et aux énergies alternatives >> Centre de SACLAY >> DES/ISAS/DM2S/SGLS/LCAN >> Bâtiment 451 - point courrier n°41 >> F-91191 Gif-sur-Yvette >> +33 1 69 08 04 03 >> +33 6 83 42 05 79 >> > > From franz.fischer at uni-hamburg.de Thu Jul 31 05:07:52 2025 From: franz.fischer at uni-hamburg.de (Fischer, Franz) Date: Thu, 31 Jul 2025 10:07:52 +0000 Subject: [petsc-users] Spectrum Slicing with MATMPISBAIJ Message-ID: Dear all, I am dealing with large, sparse symmetric (and Hermitian) matrices (N ~ 1e5) and I am using slepc in order to solve them. For now, I was using the matrix type MATMPISBAIJ to fill my matrix in parallel and I was solving the EVP for the EPS_SMALLEST_REAL eigenvalues just fine. Now I am dealing with spectra where I am not interested in the smallest real part of the eigenvalue spectrum, but rather in some energy interval - thus, making me want to use spectrum slicing. 
For that I have browsed your documentation and now I am a little unsure about what to do. I saw that my matrix type is not supported for LU decomposition, so I have converted my matrix to type MATMPIAIJ. There is this table here (https://urldefense.us/v3/__https://petsc.org/release/overview/linear_solve_table/__;!!G_uCfscf7eWS!e_EQDlcM6bSB8ayYFeVL6q8O-g91z5wOtB-pL91VevkcE-S7h_vfKV-Tx7PRpJRVsRJuYkvple6k1IZvvAPGGWPJsYRnSAPKf3SZ$ ), where I found which solvers are available for which matrix type. What I cannot fully understand are the columns Parallel and Complex. What does it mean to have an X in there: does it mean it is possible or not? Do I need to compile PETSc with external LU-solver packages or not? Thanks in advance for your reply! Best, Franz --------------------------------------------------------- MSc. Franz Fischer Universität Hamburg HARBOR, Geb. 610 Luruper Chaussee 149 D-22761 Hamburg --------------------------------------------------------- From jroman at dsic.upv.es Thu Jul 31 10:17:29 2025 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 31 Jul 2025 15:17:29 +0000 Subject: [petsc-users] Spectrum Slicing with MATMPISBAIJ In-Reply-To: References: Message-ID: <4F707ADA-3761-48B7-9471-CA40B3939D29@dsic.upv.es> If you look at section 3.4.5 (Spectrum slicing) of the SLEPc users manual, you will see it is CHOLESKY that you need, not LU, so both AIJ and SBAIJ should work. PETSc's LU and Cholesky are sequential, i.e., they can only be used in sequential runs, or as a local preconditioner in parallel runs. If you plan to run your code with several MPI processes, you should configure PETSc with an external package such as MUMPS (those with the X in the Parallel column). The Complex column indicates if the factorization is available in complex scalar builds. 
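[Editor's illustration] Jose's answer maps to a concrete option set. As a hedged sketch (the executable name `myeps` and the interval endpoints 0.5,1.5 are placeholders, and a MUMPS-enabled PETSc build is assumed), spectrum slicing over an interval is typically requested with:

```shell
# Compute all eigenvalues in the interval [0.5,1.5] via spectrum slicing
# (see SLEPc users manual, sec. 3.4.5). Needs a parallel Cholesky
# factorization, e.g. PETSc configured with --download-mumps.
# -st_type sinvert selects the shift-and-invert spectral transform.
mpiexec -n 4 ./myeps -eps_type krylovschur -eps_interval 0.5,1.5 \
        -st_type sinvert -st_ksp_type preonly -st_pc_type cholesky \
        -st_pc_factor_mat_solver_type mumps
```

With SBAIJ this keeps the factorization symmetric; with AIJ the same options apply as long as the matrix is numerically symmetric.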
Jose > El 31 jul 2025, a las 12:07, Fischer, Franz via petsc-users escribió: > > Dear all, > > I am dealing with large, sparse symmetric (and Hermitian) matrices (N ~ 1e5) and I am using slepc in order to solve them. > For now, I was using the matrix type MATMPISBAIJ to fill my matrix in parallel and I was solving the EVP for the EPS_SMALLEST_REAL eigenvalues just fine. > > Now I am dealing with spectra where I am not interested in the smallest real part of the eigenvalue spectrum, but rather in some energy interval - thus, making me want to use spectrum slicing. > > For that I have browsed your documentation and now I am a little unsure about what to do. > I saw that my matrix type is not supported for LU decomposition, so I have converted my matrix to type MATMPIAIJ. > There is this table here (https://urldefense.us/v3/__https://petsc.org/release/overview/linear_solve_table/__;!!G_uCfscf7eWS!fe4ysln5TfzUVOsGUnRA53e7CXOZAr8euf_lpIw-8QBTEN4ZFAk_iwlgW4f9EM_Ymn_tCdWH1roT2mD1PnyqDu8G$ ), where I found which solvers are available for which matrix type. > > What I cannot fully understand are the columns Parallel and Complex. What does it mean to have an X in there: does it mean it is possible or not? > > Do I need to compile PETSc with external LU-solver packages or not? > > Thanks in advance for your reply! > > Best, > Franz > > --------------------------------------------------------- > MSc. Franz Fischer > Universität Hamburg > HARBOR, Geb. 610 > Luruper Chaussee 149 > D-22761 Hamburg > --------------------------------------------------------- > From bourdin at mcmaster.ca Thu Jul 31 15:14:31 2025 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Thu, 31 Jul 2025 20:14:31 +0000 Subject: [petsc-users] Printing _some_ options In-Reply-To: References: <94ABA8B4-8703-4A87-82A4-5F0B632D777E@mcmaster.ca> Message-ID: An HTML attachment was scrubbed... URL: