[petsc-users] problem with nested logging, standalone example

Barry Smith bsmith at petsc.dev
Tue Jul 22 16:16:20 CDT 2025


  Yippee! (maybe)

> On Jul 22, 2025, at 4:18 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> 
> With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine.  I am looking into it.
> 
> Thanks.
> --Junchao Zhang
> 
> 
> On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang <yangzongze at gmail.com> wrote:
>> Hi,
>> I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.
>> 
>> ```
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: General MPI error
>> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown
>> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025
>> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic
>> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289
>> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383
>> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420
>> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443
>> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405
>> [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342
>> [0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043
>> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084
>> [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552
>> PetscFinalize() failed [error code: 98]
>> --------------------------------------------------------------------------
>> prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting
>> improperly. There are three reasons this could occur:
>> 
>> 1. this process did not call "init" before exiting, but others in the
>> job did. This can cause a job to hang indefinitely while it waits for
>> all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>> 
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>> 
>> 3. this process called "MPI_Abort" or "prte_abort" and the mca
>> parameter prte_create_session_dirs is set to false. In this case, the
>> run-time cannot detect that the abort call was an abnormal
>> termination. Hence, the only error message you will receive is this
>> one.
>> 
>> This may have caused other processes in the application to be
>> terminated by signals sent by prterun (as reported here).
>> 
>> You can avoid this message by specifying -quiet on the prterun command
>> line.
>> --------------------------------------------------------------------------
>> ```
>> 
>> Best wishes,
>> Zongze
>> 
>> From: petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov>
>> Date: Monday, July 14, 2025 at 15:58
>> To: Barry Smith <bsmith at petsc.dev>
>> Cc: PETSc users list <petsc-users at mcs.anl.gov>
>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>> 
>> @Junchao: yes, all with my ex2f.F90 variation on two or three cores
>> 
>> @Barry: it's really puzzling that you cannot reproduce it. Can you try running it a dozen times in a row and then look at the report_performance.xml file? When it hangs I see some NaNs, for instance here, in the "self" entry right after the VecAXPY event:
>> 
>>                <events>
>>                     <event>
>>                         <name>VecAXPY</name>
>>                         <time>
>>                             <avgvalue>0.00610203</avgvalue>
>>                             <minvalue>0.</minvalue>
>>                             <maxvalue>0.0122041</maxvalue>
>>                             <minloc>1</minloc>
>>                             <maxloc>0</maxloc>
>>                         </time>
>>                         <ncalls>
>>                             <avgvalue>0.5</avgvalue>
>>                             <minvalue>0.</minvalue>
>>                             <maxvalue>1.</maxvalue>
>>                             <minloc>1</minloc>
>>                             <maxloc>0</maxloc>
>>                         </ncalls>
>>                     </event>
>>                     <event>
>>                         <name>self</name>
>>                         <time>
>>                             <value>-nan.</value>
>>                         </time>
>> 
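
A report left behind by a hung run may be truncated, like the excerpt above, so an XML parser may reject it. A line-by-line text scan is one way to list the NaN entries anyway. This is a minimal sketch: `find_nan_values` is a hypothetical helper, and the pattern assumes each value sits on one line in the `<tag>value</tag>` form shown above.

```python
import re

# Matches a one-line element whose text is a NaN, e.g. <value>-nan.</value>.
# The closing tag must repeat the opening tag name (backreference \1).
NAN_ELEMENT = re.compile(r"<(\w+)>\s*([-+]?nan\.?)\s*</\1>", re.IGNORECASE)


def find_nan_values(text):
    """Return (line_number, tag, value) for every NaN-valued element in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in NAN_ELEMENT.finditer(line):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits
```

For the file mentioned above, the call would be `find_nan_values(open("report_performance.xml").read())`; an empty list means no NaN entries of this form were found.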
>> This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster:
>> 1) download petsc-3.23.4.tar.gz from the petsc website
>> 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2
>> 3) adjust my example to this version of petsc (file is attached)
>> 4) make ex2f-cklaij-dbg-v2
>> 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2
>> 
>> So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0
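
Running the example "a dozen times in a row" and watching for hangs by hand is tedious, so the loop can be scripted with a timeout. The sketch below is only an illustration: `run_repeatedly` is a hypothetical helper, the 60-second timeout is an arbitrary assumption, and killing the launcher on timeout may not clean up every rank with every MPI implementation.

```python
import subprocess


def run_repeatedly(cmd, runs=12, timeout=60.0):
    """Run cmd `runs` times and count clean exits, nonzero exits, and timeouts."""
    counts = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,  # a run that exceeds this is counted as a hang
            )
        except subprocess.TimeoutExpired:
            counts["hung"] += 1
            continue
        counts["ok" if proc.returncode == 0 else "failed"] += 1
    return counts
```

With the command from step 5, the call would be `run_repeatedly(["mpirun", "-n", "2", "./ex2f-cklaij-dbg-v2"])`. Any nonzero "hung" or "failed" count marks a run worth inspecting.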
>> 
>> ________________________________________
>> From: Barry Smith <bsmith at petsc.dev>
>> Sent: Friday, July 11, 2025 11:22 PM
>> To: Klaij, Christiaan
>> Cc: Junchao Zhang; PETSc users list
>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>> 
>> 
>>   And yet we cannot reproduce.
>> 
>>   Please tell us the exact PETSc version and the MPI implementation and its version, reattach your reproducing example, and tell us exactly how you run it.
>> 
>> 
>>   Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop?
>> 
>>   Barry
>> 
>>   If I could reproduce the problem, here is how I would debug it: I would use -start_in_debugger and then put breakpoints in the places that seem problematic. Presumably I would end up with a hang with each MPI process in a "different place", and from that I may be able to determine how that happened.
>> 
>> 
>> 
>> > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan <C.Klaij at marin.nl> wrote:
>> >
>> > In summary for future reference:
>> > - tested 3 different machines, two at Marin, one at the national HPC
>> > - tested 3 different MPI implementations (intelmpi, openmpi and mpich)
>> > - tested openmpi in both release and debug
>> > - tested 2 different compilers (intel and gnu), both older and very recent versions
>> > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>> >
>> > All of these tests either segfault, hang, or error out at the call to PetscLogView.
>> >
>> > Chris
>> >
>> > ________________________________________
>> > From: Klaij, Christiaan <C.Klaij at marin.nl>
>> > Sent: Friday, July 11, 2025 10:10 AM
>> > To: Barry Smith; Junchao Zhang
>> > Cc: PETSc users list
>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>> >
>> > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
>> > @Barry: the two stack traces aren't exactly the same; see a sample with MPICH below.
>> >
>> > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
>> >
>> > Chris
>> >
>> > (gdb) bt
>> > #0  0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #1  0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #2  0x000015555033e70f in MPIR_Allreduce ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #3  0x000015555033f22e in PMPI_Allreduce ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #4  0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782,
>> >    op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70,
>> >    inbuf=0x7fffffffac60)
>> >    at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839
>> > #5  MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60,
>> >    outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1,
>> >    dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782)
>> >    at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869
>> > #6  0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults (
>> >    viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000',
>> >    value=<optimized out>, minthreshold=minthreshold at entry=0,
>> >    maxthreshold=maxthreshold at entry=0.01,
>> >    minmaxtreshold=minmaxtreshold at entry=1.05)
>> >    at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255
>> >
>> >
>> > (gdb) bt
>> > #0  0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6
>> > #1  0x0000155550b0de71 in ofi_gettime_ns ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #2  0x0000155550b0dec9 in ofi_gettime_ms ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #3  0x0000155550b2fab5 in sock_cq_sreadfrom ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #4  0x00001555505ca6f7 in MPIDI_OFI_progress ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #5  0x0000155550591fe9 in progress_test ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #6  0x00001555505924a3 in MPID_Progress_wait ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #7  0x000015555043463e in MPIR_Wait_state ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #8  0x000015555052ec49 in MPIC_Wait ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #9  0x000015555053093e in MPIC_Sendrecv ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook ()
>> >   from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>> >
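
Read together, the two backtraces above show one rank waiting in an allreduce issued from PetscPrintXMLNestedLinePerfResults while the other is already inside an MPI finalize hook. One way such a state can arise (an assumption here, not something established in this thread) is that the ranks issue a different number of collective calls. The toy sketch below emulates that with Python threads, using a barrier to stand in for the allreduce. `run_ranks` and its arguments are hypothetical names, and no MPI is involved.

```python
import threading


def run_ranks(calls_per_rank, timeout=0.5):
    """Emulate ranks where rank r enters a collective calls_per_rank[r] times.

    Returns {rank: "ok"} for ranks that finish and
    {rank: "hung at call k"} for ranks left waiting alone.
    """
    barrier = threading.Barrier(len(calls_per_rank))
    result = {}

    def rank_main(rank, ncalls):
        for k in range(ncalls):
            try:
                # Stand-in for a collective: returns only when all ranks arrive.
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                # Nobody else arrived in time; a real MPI call would block forever.
                result[rank] = f"hung at call {k}"
                return
        result[rank] = "ok"

    threads = [
        threading.Thread(target=rank_main, args=(rank, ncalls))
        for rank, ncalls in enumerate(calls_per_rank)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

`run_ranks([3, 2])` leaves rank 0 stuck in its third call after rank 1 has moved on, whereas `run_ranks([2, 2])` completes on both ranks. In a debugger session, comparing how often each process has passed through the suspect routine is one way to test this hypothesis.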
>> > ________________________________________
>> > From: Barry Smith <bsmith at petsc.dev>
>> > Sent: Thursday, July 10, 2025 11:10 PM
>> > To: Junchao Zhang
>> > Cc: Klaij, Christiaan; PETSc users list
>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>> >
>> >
>> >  I cannot reproduce
>> >
>> > On Jul 10, 2025, at 3:46 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>> >
>> > Adding -mca coll_hcoll_enable 0 didn't change anything at my end.  Strange.
>> >
>> > --Junchao Zhang
>> >
>> >
>> > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan <C.Klaij at marin.nl> wrote:
>> > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
>> >
>> > Chris
>> >
>> >
>> > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>> > 0 KSP Residual norm 1.11803
>> > 1 KSP Residual norm 0.591608
>> > 2 KSP Residual norm 0.316228
>> > 3 KSP Residual norm < 1.e-11
>> > 0 KSP Residual norm 0.707107
>> > 1 KSP Residual norm 0.408248
>> > 2 KSP Residual norm < 1.e-11
>> > Norm of error < 1.e-12 iterations 3
>> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [1]PETSC ERROR: General MPI error
>> > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
>> > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
>> > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
>> > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>> > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
>> > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
>> > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>> > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
>> > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
>> > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
>> > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
>> > [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
>> > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
>> > --------------------------------------------------------------------------
>> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>> > Proc: [[55228,1],1]
>> > Errorcode: 98
>> >
>> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> > You may or may not see output from other processes, depending on
>> > exactly when Open MPI kills them.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > prterun has exited due to process rank 1 with PID 0 on node login1 calling
>> > "abort". This may have caused other processes in the application to be
>> > terminated by signals sent by prterun (as reported here).
>> > --------------------------------------------------------------------------
>> >
>> > ________________________________________
>> >
>> >
>> > From: Klaij, Christiaan <C.Klaij at marin.nl>
>> > Sent: Thursday, July 10, 2025 10:15 AM
>> > To: Junchao Zhang
>> > Cc: PETSc users list
>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>> >
>> > Hi Junchao,
>> >
>> > Thanks for testing. I've fixed the error, but unfortunately that doesn't change the behavior: the code still hangs as before, with the same stack trace...
>> >
>> > Chris
>> >
>> > ________________________________________
>> > From: Junchao Zhang <junchao.zhang at gmail.com>
>> > Sent: Tuesday, July 8, 2025 10:58 PM
>> > To: Klaij, Christiaan
>> > Cc: PETSc users list
>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>> >
>> > Hi, Chris,
>> > First, I had to fix an error in your test by adding " PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> > [0]PETSC ERROR: Object is in wrong state
>> > [0]PETSC ERROR: Mat object's type is not set: Argument # 1
>> > ...
>> > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
>> > [0]PETSC ERROR: #2 ex2f.F90:258
>> >
>> > Then I could run the test without problems:
>> > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>> > 0 KSP Residual norm 1.11803
>> > 1 KSP Residual norm 0.591608
>> > 2 KSP Residual norm 0.316228
>> > 3 KSP Residual norm < 1.e-11
>> > 0 KSP Residual norm 0.707107
>> > 1 KSP Residual norm 0.408248
>> > 2 KSP Residual norm < 1.e-11
>> > Norm of error < 1.e-12 iterations 3
>> >
>> > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
>> > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>> >
>> > Could you fix the error and retry?
>> >
>> > --Junchao Zhang
>> >
>> >
>> > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > Attached is a standalone example of the issue described in the
>> > earlier thread "problem with nested logging". The issue appeared
>> > somewhere between petsc 3.19.4 and 3.23.4.
>> >
>> > The example is a variation of ../ksp/tutorials/ex2f.F90, where
>> > I've added the nested log viewer with one event as well as the
>> > solution of a small system on rank zero.
>> >
>> > When running on multiple procs the example hangs during
>> > PetscLogView with the backtrace below. The configure.log is also
>> > attached in the hope that you can replicate the issue.
>> >
>> > Chris
>> >
>> >
>> > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
>> > datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
>> > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
>> > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>> > at base/coll_base_allreduce.c:247
>> > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
>> > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
>> > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>> > at coll_tuned_decision_fixed.c:216
>> > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
>> > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
>> > at coll_hcoll_ops.c:217
>> > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
>> > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) at allreduce.c:123
>> > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>> > #18 0x0000000000402c8b in MAIN__ ()
>> > #19 0x00000000004023df in main ()
>> >
>> >
>> 
