[petsc-users] problem with nested logging, standalone example
Barry Smith
bsmith at petsc.dev
Wed Jul 23 14:02:28 CDT 2025
Yippee!
> On Jul 23, 2025, at 2:55 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>
> I think I have a fix at https://gitlab.com/petsc/petsc/-/merge_requests/8583
>
> Chris and Zongze, could you try it?
>
> Thanks!
> --Junchao Zhang
>
>
> On Tue, Jul 22, 2025 at 4:16 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>> Yippee! (maybe)
>>
>>> On Jul 22, 2025, at 4:18 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>>
>>> With Chris's example, I did reproduce the "MPI_ERR_BUFFER: invalid buffer pointer" on a machine. I am looking into it.
>>>
>>> Thanks.
>>> --Junchao Zhang
>>>
>>>
>>> On Tue, Jul 22, 2025 at 9:51 AM Zongze Yang <yangzongze at gmail.com> wrote:
>>>> Hi,
>>>> I encountered a similar issue with Firedrake when using the -log_view option with XML format on macOS. Below is the error message. The Firedrake code and the shell script used to run it are attached.
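>>>>
>>>> For reference, the run amounts to something like the following (a sketch, assuming the attached script just sets up and solves a system; the output file name is illustrative and matches the report_performance.xml discussed later in this thread):
>>>>
>>>> ```
>>>> mpiexec -n 2 python test.py -log_view :report_performance.xml:ascii_xml
>>>> ```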
>>>>
>>>> ```
>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [0]PETSC ERROR: General MPI error
>>>> [0]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
>>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> [0]PETSC ERROR: PETSc Release Version 3.23.4, unknown
>>>> [0]PETSC ERROR: test.py with 2 MPI process(es) and PETSC_ARCH arch-firedrake-default on 192.168.10.51 by zzyang Tue Jul 22 22:24:05 2025
>>>> [0]PETSC ERROR: Configure options: PETSC_ARCH=arch-firedrake-default --COPTFLAGS="-O3 -march=native -mtune=native" --CXXOPTFLAGS="-O3 -march=native -mtune=native" --FOPTFLAGS="-O3 -mtune=native" --with-c2html=0 --with-debugging=0 --with-fortran-bindings=0 --with-shared-libraries=1 --with-strict-petscerrorcode --download-cmake --download-bison --download-fftw --download-mumps-avoid-mpi-in-place --with-hdf5-dir=/opt/homebrew --with-hwloc-dir=/opt/homebrew --download-metis --download-mumps --download-netcdf --download-pnetcdf --download-ptscotch --download-scalapack --download-suitesparse --download-superlu_dist --download-slepc --with-zlib --download-hpddm --download-libpng --download-ctetgen --download-tetgen --download-triangle --download-mmg --download-parmmg --download-p4est --download-eigen --download-hypre --download-pragmatic
>>>> [0]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:289
>>>> [0]PETSC ERROR: #2 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:383
>>>> [0]PETSC ERROR: #3 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #4 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #5 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #6 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #7 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #8 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #9 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #10 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #11 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #12 PetscLogNestedTreePrint() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> [0]PETSC ERROR: #13 PetscLogNestedTreePrintTop() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:420
>>>> [0]PETSC ERROR: #14 PetscLogHandlerView_Nested_XML() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/xmlviewer.c:443
>>>> [0]PETSC ERROR: #15 PetscLogHandlerView_Nested() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/impls/nested/lognested.c:405
>>>> [0]PETSC ERROR: #16 PetscLogHandlerView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/handler/interface/loghandler.c:342
>>>> [0]PETSC ERROR: #17 PetscLogView() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2043
>>>> [0]PETSC ERROR: #18 PetscLogViewFromOptions() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/logging/plog.c:2084
>>>> [0]PETSC ERROR: #19 PetscFinalize() at /Users/zzyang/opt/firedrake/firedrake-pip/petsc/src/sys/objects/pinit.c:1552
>>>> PetscFinalize() failed [error code: 98]
>>>> --------------------------------------------------------------------------
>>>> prterun has exited due to process rank 0 with PID 28986 on node 192.168.10.51 exiting
>>>> improperly. There are three reasons this could occur:
>>>>
>>>> 1. this process did not call "init" before exiting, but others in the
>>>> job did. This can cause a job to hang indefinitely while it waits for
>>>> all processes to call "init". By rule, if one process calls "init",
>>>> then ALL processes must call "init" prior to termination.
>>>>
>>>> 2. this process called "init", but exited without calling "finalize".
>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>> exiting or it will be considered an "abnormal termination"
>>>>
>>>> 3. this process called "MPI_Abort" or "prte_abort" and the mca
>>>> parameter prte_create_session_dirs is set to false. In this case, the
>>>> run-time cannot detect that the abort call was an abnormal
>>>> termination. Hence, the only error message you will receive is this
>>>> one.
>>>>
>>>> This may have caused other processes in the application to be
>>>> terminated by signals sent by prterun (as reported here).
>>>>
>>>> You can avoid this message by specifying -quiet on the prterun command
>>>> line.
>>>> --------------------------------------------------------------------------
>>>> ```
>>>>
>>>> Best wishes,
>>>> Zongze
>>>>
>>>> From: petsc-users <petsc-users-bounces at mcs.anl.gov> on behalf of Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov>
>>>> Date: Monday, July 14, 2025 at 15:58
>>>> To: Barry Smith <bsmith at petsc.dev>
>>>> Cc: PETSc users list <petsc-users at mcs.anl.gov>
>>>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>>
>>>> @Junchao: yes, all with my ex2f.F90 variation on two or three cores
>>>>
>>>> @Barry: it's really puzzling that you cannot reproduce. Can you try running it a dozen times in a row? And look at the report_performance.xml file? When it hangs I see some NaNs, for instance here in the VecAXPY event:
>>>>
>>>> <events>
>>>> <event>
>>>> <name>VecAXPY</name>
>>>> <time>
>>>> <avgvalue>0.00610203</avgvalue>
>>>> <minvalue>0.</minvalue>
>>>> <maxvalue>0.0122041</maxvalue>
>>>> <minloc>1</minloc>
>>>> <maxloc>0</maxloc>
>>>> </time>
>>>> <ncalls>
>>>> <avgvalue>0.5</avgvalue>
>>>> <minvalue>0.</minvalue>
>>>> <maxvalue>1.</maxvalue>
>>>> <minloc>1</minloc>
>>>> <maxloc>0</maxloc>
>>>> </ncalls>
>>>> </event>
>>>> <event>
>>>> <name>self</name>
>>>> <time>
>>>> <value>-nan.</value>
>>>> </time>
>>>>
>>>> This is what I did in my latest attempt on the login node of our Rocky Linux 9 cluster:
>>>> 1) download petsc-3.23.4.tar.gz from the petsc website
>>>> 2) ./configure -prefix=~/petsc/install --with-cxx=0 --with-debugging=0 --with-mpi-dir=/cm/shared/apps/mpich/ge/gcc/64/3.4.2
>>>> 3) adjust my example to this version of petsc (file is attached)
>>>> 4) make ex2f-cklaij-dbg-v2
>>>> 5) mpirun -n 2 ./ex2f-cklaij-dbg-v2
>>>>
>>>> So the exact versions are: petsc-3.23.4, system mpich 3.4.2, system gcc 11.5.0
>>>>
>>>> ________________________________________
>>>> From: Barry Smith <bsmith at petsc.dev>
>>>> Sent: Friday, July 11, 2025 11:22 PM
>>>> To: Klaij, Christiaan
>>>> Cc: Junchao Zhang; PETSc users list
>>>> Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>>
>>>>
>>>> And yet we cannot reproduce.
>>>>
>>>> Please tell us the exact PETSc version and MPI implementation versions. And reattach your reproducing example. And exactly how you run it.
>>>>
>>>>
>>>> Can you reproduce it on an "ordinary" machine, say a Mac or Linux laptop?
>>>>
>>>> Barry
>>>>
>>>> If I could reproduce the problem, here is how I would debug it: I would use -start_in_debugger and then put breakpoints in the places that seem problematic. Presumably I would end up with a hang with each MPI process in a "different place", and from that I may be able to determine how it happened.
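>>>>
>>>> Concretely, something along these lines (a sketch; the breakpoint functions are taken from the stack traces in this thread, and the noxterm option keeps gdb in the current terminal instead of opening xterms):
>>>>
>>>>   mpirun -n 2 ./ex2f-cklaij-dbg-v2 -start_in_debugger noxterm
>>>>   # in each gdb session:
>>>>   (gdb) break PetscLogNestedTreePrintLine
>>>>   (gdb) break PetscPrintXMLNestedLinePerfResults
>>>>   (gdb) continue
>>>>   # once it hangs, interrupt with Ctrl-C and compare "bt" across the ranks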
>>>>
>>>>
>>>>
>>>> > On Jul 11, 2025, at 7:58 AM, Klaij, Christiaan <C.Klaij at marin.nl> wrote:
>>>> >
>>>> > In summary for future reference:
>>>> > - tested 3 different machines, two at Marin, one at the national HPC
>>>> > - tested 3 different MPI implementations (Intel MPI, Open MPI and MPICH)
>>>> > - tested openmpi in both release and debug
>>>> > - tested 2 different compilers (intel and gnu), both older and very recent versions
>>>> > - tested with the most basic config (./configure --with-cxx=0 --with-debugging=0 --download-mpich)
>>>> >
>>>> > All of these tests either segfault, hang, or error out at the call to PetscLogView.
>>>> >
>>>> > Chris
>>>> >
>>>> > ________________________________________
>>>> > From: Klaij, Christiaan <C.Klaij at marin.nl>
>>>> > Sent: Friday, July 11, 2025 10:10 AM
>>>> > To: Barry Smith; Junchao Zhang
>>>> > Cc: PETSc users list
>>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>> >
>>>> > @Matt: no MPI errors indeed. I've tried with MPICH and I get the same hanging.
>>>> > @Barry: the stack traces on the two ranks aren't exactly the same, see a sample with MPICH below.
>>>> >
>>>> > If it cannot be reproduced at your side, I'm afraid this is another dead end. Thanks anyway, I really appreciate all your help.
>>>> >
>>>> > Chris
>>>> >
>>>> > (gdb) bt
>>>> > #0 0x000015555033bc2e in MPIDI_POSIX_mpi_release_gather_gather.constprop.0 ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #1 0x000015555033db8a in MPIDI_POSIX_mpi_allreduce_release_gather ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #2 0x000015555033e70f in MPIR_Allreduce ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #3 0x000015555033f22e in PMPI_Allreduce ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #4 0x0000155553f85d69 in MPIU_Allreduce_Count (comm=-2080374782,
>>>> > op=1476395020, dtype=1275072547, count=1, outbuf=0x7fffffffac70,
>>>> > inbuf=0x7fffffffac60)
>>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1839
>>>> > #5 MPIU_Allreduce_Private (inbuf=inbuf at entry=0x7fffffffac60,
>>>> > outbuf=outbuf at entry=0x7fffffffac70, count=count at entry=1,
>>>> > dtype=dtype at entry=1275072547, op=op at entry=1476395020, comm=-2080374782)
>>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/objects/pinit.c:1869
>>>> > #6 0x0000155553f33dbe in PetscPrintXMLNestedLinePerfResults (
>>>> > viewer=viewer at entry=0x458890, name=name at entry=0x155554ef6a0d 'mbps\000',
>>>> > value=<optimized out>, minthreshold=minthreshold at entry=0,
>>>> > maxthreshold=maxthreshold at entry=0.01,
>>>> > minmaxtreshold=minmaxtreshold at entry=1.05)
>>>> > at /home/cklaij/petsc/petsc-3.23.4/src/sys/logging/handler/impls/nested/xmlviewer.c:255
>>>> >
>>>> >
>>>> > (gdb) bt
>>>> > #0 0x000015554fed3b17 in clock_gettime at GLIBC_2.2.5 () from /lib64/libc.so.6
>>>> > #1 0x0000155550b0de71 in ofi_gettime_ns ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #2 0x0000155550b0dec9 in ofi_gettime_ms ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #3 0x0000155550b2fab5 in sock_cq_sreadfrom ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #4 0x00001555505ca6f7 in MPIDI_OFI_progress ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #5 0x0000155550591fe9 in progress_test ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #6 0x00001555505924a3 in MPID_Progress_wait ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #7 0x000015555043463e in MPIR_Wait_state ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #8 0x000015555052ec49 in MPIC_Wait ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #9 0x000015555053093e in MPIC_Sendrecv ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #10 0x00001555504bf674 in MPIR_Allreduce_intra_recursive_doubling ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> > #11 0x00001555505b61de in MPIDI_OFI_mpi_finalize_hook ()
>>>> > from /cm/shared/apps/mpich/ge/gcc/64/3.4.2/lib/libmpi.so.12
>>>> >
>>>> > ________________________________________
>>>> > From: Barry Smith <bsmith at petsc.dev>
>>>> > Sent: Thursday, July 10, 2025 11:10 PM
>>>> > To: Junchao Zhang
>>>> > Cc: Klaij, Christiaan; PETSc users list
>>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>> >
>>>> >
>>>> > I cannot reproduce
>>>> >
>>>> > On Jul 10, 2025, at 3:46 PM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>>> >
>>>> > Adding -mca coll_hcoll_enable 0 didn't change anything at my end. Strange.
>>>> >
>>>> > --Junchao Zhang
>>>> >
>>>> >
>>>> > On Thu, Jul 10, 2025 at 3:39 AM Klaij, Christiaan <C.Klaij at marin.nl> wrote:
>>>> > An additional clue perhaps: with the option OMPI_MCA_coll_hcoll_enable=0, the code does not hang but gives the error below.
>>>> >
>>>> > Chris
>>>> >
>>>> >
>>>> > $ mpirun -mca coll_hcoll_enable 0 -n 2 ./ex2f-cklaij-dbg -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>>>> > 0 KSP Residual norm 1.11803
>>>> > 1 KSP Residual norm 0.591608
>>>> > 2 KSP Residual norm 0.316228
>>>> > 3 KSP Residual norm < 1.e-11
>>>> > 0 KSP Residual norm 0.707107
>>>> > 1 KSP Residual norm 0.408248
>>>> > 2 KSP Residual norm < 1.e-11
>>>> > Norm of error < 1.e-12 iterations 3
>>>> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [1]PETSC ERROR: General MPI error
>>>> > [1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer
>>>> > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> > [1]PETSC ERROR: Petsc Release Version 3.22.4, Mar 01, 2025
>>>> > [1]PETSC ERROR: ./ex2f-cklaij-dbg with 2 MPI process(es) and PETSC_ARCH on login1 by cklaij Thu Jul 10 10:33:33 2025
>>>> > [1]PETSC ERROR: Configure options: --prefix=/home/cklaij/ReFRESCO/trunk/install/extLibs --with-mpi-dir=/cm/shared/apps/openmpi/gcc/5.0.6-debug --with-x=0 --with-mpe=0 --with-debugging=0 --download-superlu_dist=https://updates.marin.nl/refresco/libs/superlu_dist-8.1.2.tar.gz --with-blaslapack-dir=/cm/shared/apps/oneapi/2024.2.1/mkl/2024.2 --download-parmetis=https://updates.marin.nl/refresco/libs/parmetis-4.0.3-p9.tar.gz --download-metis=https://updates.marin.nl/refresco/libs/metis-5.1.0-p11.tar.gz --with-packages-build-dir=/home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>>>> > [1]PETSC ERROR: #1 PetscLogNestedTreePrintLine() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:289
>>>> > [1]PETSC ERROR: #2 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:377
>>>> > [1]PETSC ERROR: #3 PetscLogNestedTreePrint() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:384
>>>> > [1]PETSC ERROR: #4 PetscLogNestedTreePrintTop() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:420
>>>> > [1]PETSC ERROR: #5 PetscLogHandlerView_Nested_XML() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/xmlviewer.c:443
>>>> > [1]PETSC ERROR: #6 PetscLogHandlerView_Nested() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/impls/nested/lognested.c:405
>>>> > [1]PETSC ERROR: #7 PetscLogHandlerView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/handler/interface/loghandler.c:342
>>>> > [1]PETSC ERROR: #8 PetscLogView() at /home/cklaij/ReFRESCO/trunk/build-extlibs/superbuild/petsc/src/src/sys/logging/plog.c:2040
>>>> > [1]PETSC ERROR: #9 ex2f-cklaij-dbg.F90:301
>>>> > --------------------------------------------------------------------------
>>>> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>>>> > Proc: [[55228,1],1]
>>>> > Errorcode: 98
>>>> >
>>>> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>> > You may or may not see output from other processes, depending on
>>>> > exactly when Open MPI kills them.
>>>> > --------------------------------------------------------------------------
>>>> > --------------------------------------------------------------------------
>>>> > prterun has exited due to process rank 1 with PID 0 on node login1 calling
>>>> > "abort". This may have caused other processes in the application to be
>>>> > terminated by signals sent by prterun (as reported here).
>>>> > --------------------------------------------------------------------------
>>>> >
>>>> > ________________________________________
>>>> > dr. ir. Christiaan Klaij | senior researcher
>>>> > Research & Development | CFD Development
>>>> > T +31 317 49 33 44 | https://www.marin.nl
>>>> >
>>>> >
>>>> > From: Klaij, Christiaan <C.Klaij at marin.nl>
>>>> > Sent: Thursday, July 10, 2025 10:15 AM
>>>> > To: Junchao Zhang
>>>> > Cc: PETSc users list
>>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>> >
>>>> > Hi Junchao,
>>>> >
>>>> > Thanks for testing. I've fixed the error but unfortunately that doesn't change the behavior, the code still hangs as before, with the same stack trace...
>>>> >
>>>> > Chris
>>>> >
>>>> > ________________________________________
>>>> > From: Junchao Zhang <junchao.zhang at gmail.com>
>>>> > Sent: Tuesday, July 8, 2025 10:58 PM
>>>> > To: Klaij, Christiaan
>>>> > Cc: PETSc users list
>>>> > Subject: Re: [petsc-users] problem with nested logging, standalone example
>>>> >
>>>> > Hi, Chris,
>>>> > First, I had to fix an error in your test by adding "PetscCallA(MatSetFromOptions(AA,ierr))" at line 254.
>>>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [0]PETSC ERROR: Object is in wrong state
>>>> > [0]PETSC ERROR: Mat object's type is not set: Argument # 1
>>>> > ...
>>>> > [0]PETSC ERROR: #1 MatSetValues() at /scratch/jczhang/petsc/src/mat/interface/matrix.c:1503
>>>> > [0]PETSC ERROR: #2 ex2f.F90:258
>>>> >
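>>>> > For anyone hitting the same "type is not set" message, the creation sequence then looks roughly like this (a sketch of the usual PETSc Fortran idiom, not the attached file verbatim; AA is the matrix from the example, m and n are its grid dimensions):
>>>> >
>>>> >   PetscCallA(MatCreate(PETSC_COMM_WORLD, AA, ierr))
>>>> >   PetscCallA(MatSetSizes(AA, PETSC_DECIDE, PETSC_DECIDE, m*n, m*n, ierr))
>>>> >   PetscCallA(MatSetFromOptions(AA, ierr))  ! the missing call: sets the Mat type
>>>> >   PetscCallA(MatSetUp(AA, ierr))
>>>> >   ! only after this may MatSetValues() be called
>>>> >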
>>>> > Then I could run the test without problems:
>>>> > mpirun -n 2 ./ex2f -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
>>>> > 0 KSP Residual norm 1.11803
>>>> > 1 KSP Residual norm 0.591608
>>>> > 2 KSP Residual norm 0.316228
>>>> > 3 KSP Residual norm < 1.e-11
>>>> > 0 KSP Residual norm 0.707107
>>>> > 1 KSP Residual norm 0.408248
>>>> > 2 KSP Residual norm < 1.e-11
>>>> > Norm of error < 1.e-12 iterations 3
>>>> >
>>>> > I used petsc-3.22.4, gcc-11.3, openmpi-5.0.6 and configured with
>>>> > ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-openmpi --with-ssl=0 --with-shared-libraries=1 CFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " COPTFLAGS="-std=gnu11 -Wall -funroll-all-loops -O3 -DNDEBUG" CXXOPTFLAGS="-std=gnu++14 -Wall -funroll-all-loops -O3 -DNDEBUG " FCFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" F90FLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG" FOPTFLAGS="-Wall -funroll-all-loops -ffree-line-length-0 -Wno-maybe-uninitialized -Wno-target-lifetime -Wno-unused-function -O3 -DNDEBUG"
>>>> >
>>>> > Could you fix the error and retry?
>>>> >
>>>> > --Junchao Zhang
>>>> >
>>>> >
>>>> > On Sun, Jul 6, 2025 at 12:57 PM Klaij, Christiaan via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>> > Attached is a standalone example of the issue described in the
>>>> > earlier thread "problem with nested logging". The issue appeared
>>>> > somewhere between petsc 3.19.4 and 3.23.4.
>>>> >
>>>> > The example is a variation of ../ksp/tutorials/ex2f.F90, where
>>>> > I've added the nested log viewer with one event as well as the
>>>> > solution of a small system on rank zero.
>>>> >
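>>>> > The added event follows the standard pattern, roughly (a sketch, not the attached file verbatim; the class and event names are illustrative):
>>>> >
>>>> >   PetscClassId :: classid
>>>> >   PetscLogEvent :: my_event
>>>> >   PetscCallA(PetscClassIdRegister('MyApp', classid, ierr))
>>>> >   PetscCallA(PetscLogEventRegister('MySolve', classid, my_event, ierr))
>>>> >   PetscCallA(PetscLogEventBegin(my_event, ierr))
>>>> >   ! ... e.g. the rank-zero solve of the small system on PETSC_COMM_SELF ...
>>>> >   PetscCallA(PetscLogEventEnd(my_event, ierr))
>>>> >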
>>>> > When running on multiple procs the example hangs during
>>>> > PetscLogView with the backtrace below. The configure.log is also
>>>> > attached in the hope that you can replicate the issue.
>>>> >
>>>> > Chris
>>>> >
>>>> >
>>>> > #0 0x000015554c84ea9e in mca_pml_ucx_recv (buf=0x7fffffff9e30, count=1,
>>>> > datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, src=1, tag=-12,
>>>> > comm=0x7f1e30, mpi_status=0x0) at pml_ucx.c:700
>>>> > #1 0x000015554c65baff in ompi_coll_base_allreduce_intra_recursivedoubling (
>>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>>>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>>>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>>>> > at base/coll_base_allreduce.c:247
>>>> > #2 0x000015554c6a7e40 in ompi_coll_tuned_allreduce_intra_do_this (
>>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>>>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>>>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630,
>>>> > algorithm=3, faninout=0, segsize=0) at coll_tuned_allreduce_decision.c:142
>>>> > #3 0x000015554c6a054f in ompi_coll_tuned_allreduce_intra_dec_fixed (
>>>> > sbuf=0x7fffffff9e20, rbuf=0x7fffffff9e30, count=1,
>>>> > dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>>>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaec630)
>>>> > at coll_tuned_decision_fixed.c:216
>>>> > #4 0x000015554c68e160 in mca_coll_hcoll_allreduce (sbuf=0x7fffffff9e20,
>>>> > rbuf=0x7fffffff9e30, count=1, dtype=0x15554c9ef900 <ompi_mpi_2dblprec>,
>>>> > op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30, module=0xaecb80)
>>>> > at coll_hcoll_ops.c:217
>>>> > #5 0x000015554c59811a in PMPI_Allreduce (sendbuf=0x7fffffff9e20,
>>>> > recvbuf=0x7fffffff9e30, count=1, datatype=0x15554c9ef900 <ompi_mpi_2dblprec>, op=0x15554ca28980 <ompi_mpi_op_maxloc>, comm=0x7f1e30) at allreduce.c:123
>>>> > #6 0x0000155553eabede in MPIU_Allreduce_Private () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #7 0x0000155553e50d08 in PetscPrintXMLNestedLinePerfResults () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #8 0x0000155553e5123e in PetscLogNestedTreePrintLine () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #9 0x0000155553e51f3a in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #10 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #11 0x0000155553e51e96 in PetscLogNestedTreePrint () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #12 0x0000155553e52142 in PetscLogNestedTreePrintTop () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #13 0x0000155553e5257b in PetscLogHandlerView_Nested_XML () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #14 0x0000155553e4e5a0 in PetscLogHandlerView_Nested () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #15 0x0000155553e56232 in PetscLogHandlerView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #16 0x0000155553e588c3 in PetscLogView () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #17 0x0000155553e40eb5 in petsclogview_ () from /home/cklaij/ReFRESCO/trunk/install/extLibs/lib/libpetsc.so.3.22
>>>> > #18 0x0000000000402c8b in MAIN__ ()
>>>> > #19 0x00000000004023df in main ()
>>>> > dr. ir. Christiaan Klaij | senior researcher
>>>> > Research & Development | CFD Development
>>>> > T +31 317 49 33 44 | https://www.marin.nl
>>>> >
>>>> >
>>>>
>>