[petsc-users] Petsc + nvhpc

Matthew Knepley knepley at gmail.com
Thu Nov 13 11:23:20 CST 2025


On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Dear Junchao,
>
> Thank you for your response, and sorry for taking so long to answer back.
> I cannot avoid using the NVIDIA tools. Gfortran is not mature for OpenACC
> and gives us problems when compiling our code.
> What I have done to enable using the latest PETSc is to write my own C
> code to call PETSc.
> I have little experience with C and it took me some time, but I can now
> use PETSc 3.24.1 ;)
>
> The behaviour remains the same as in my original email.
> Parallel + GPU gives bad results. CPU (serial and parallel) and serial GPU
> all work OK and give the same result.
>
> I have dug a bit into PETSc, comparing the CPU and GPU versions with 2 MPI
> ranks. I see that the difference starts in src/ksp/ksp/impls/cg/cg.c, L170:
>     PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */
> I have printed the vectors R and Z and the norm dp.
> R is identical on both CPU and GPU, but Z differs.
> The correct value of dp (the first time it enters) is 14.3014, while
> running on the GPU with 2 MPI ranks gives 14.7493.
> If you wish, I can send you the prints I introduced in cg.c.
>
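
A quick note on the cg.c prints described just above: a similar per-iteration
comparison can often be made without editing cg.c, using the standard runtime
monitor options. A minimal sketch, where <your sod2d command> is just a
placeholder for the full run line from your submit script:

  mpirun -np 2 <your sod2d command> -ksp_monitor_true_residual -ksp_converged_reason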

Thank you for all the detail in this report. However, since you see a
problem in KSPCG, I believe we can reduce the complexity. You can use

  -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin

and send us those files. Then we can run your system directly using KSP
ex10 (and so can you).
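
For reference, a rough sketch of that workflow; <your sod2d command> is a
placeholder for your full run line, the second ex10 run assumes a --with-cuda
build of PETSc, and the exact way ex10 picks up (or substitutes) a right-hand
side is best checked against its -help output:

  # dump the linear system from your normal run
  mpirun -np 2 <your sod2d command> -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin

  # re-solve the dumped matrix with KSP ex10, once with host types and once with CUDA types
  cd $PETSC_DIR/src/ksp/ksp/tutorials && make ex10
  mpirun -np 2 ./ex10 -f0 A.bin -ksp_type cg -ksp_monitor_true_residual
  mpirun -np 2 ./ex10 -f0 A.bin -ksp_type cg -mat_type aijcusparse -vec_type cuda -ksp_monitor_true_residual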

  Thanks,

      Matt


> The folder with the input files to run the case can be downloaded from
> https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ
>
> To submit the GPU run I use:
> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh \
>   /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d \
>   ChannelFlowSolverIncomp.json
>
> For the CPU run:
> mpirun -np 2 \
>   /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d \
>   ChannelFlowSolverIncomp.json
>
> Our code can be downloaded with:
> git clone --recursive https://gitlab.com/bsc_sod2d/sod2d_gitlab.git
>
> and the branch I am using with
> git checkout 140-add-petsc
>
> To use exactly the same commit I am using
> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>
>
> I am currently using the following modules:
>   1) nvidia-hpc-sdk/25.1   2) hdf5/1.14.1-2-nvidia-nvhpcx   3) cmake/3.25.1
> I guess/hope similar modules are available on any supercomputer.
>
> To build the CPU version:
> mkdir build_cpu
> cd build_cpu
>
> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>
> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF ..
> make -j 80
>
> I have built PETSc myself as follows:
>
> git clone -b release https://gitlab.com/petsc/petsc.git petsc
> cd petsc
> git checkout v3.24.1
> module purge
> module load nvidia-hpc-sdk/25.1   hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1
> ./configure \
>   --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc \
>   --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal \
>   --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt \
>   --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 \
>   --with-precision=single --download-hypre \
>   CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= \
>   --with-shared-libraries=1 --with-mpi=1 \
>   --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a \
>   --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include \
>   --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ \
>   --download-ptscotch=yes --download-metis --download-parmetis
> make all check
> make install
>
> -------------------
> For the GPU version, when configuring PETSc I add: --with-cuda
>
> I then change the PETSC_INSTALL export to
> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
> and repeat all the other exports.
>
> mkdir build_gpu
> cd build_gpu
> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF ..
> make -j 80
>
> As you can see from the submit instructions, the executable is found in
> sod2d_gitlab/build_gpu/src/app_sod2d/sod2d.
>
> I hope I have not forgotten anything and that my instructions are 'easy' to
> follow. If you have any issues, do not hesitate to contact me.
> The wiki for our code can be found at
> https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home
>
> Best,
>
> Herbert Owen
>
> Herbert Owen
> Senior Researcher, Dpt. Computer Applications in Science and Engineering
> Barcelona Supercomputing Center (BSC-CNS)
> Tel: +34 93 413 4038
> Skype: herbert.owen
>
> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>
> On 16 Oct 2025, at 18:30, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>
> Hi, Herbert,
>    I don't have much experience with OpenACC, and the PETSc CI doesn't have
> such tests. Could you avoid using nvfortran and instead use gfortran to
> compile your Fortran + OpenACC code? If you can, then you could use the
> latest PETSc code, which would make our debugging easier.
>    Also, could you provide us with a test case and instructions to reproduce
> the problem?
>
>    Thanks!
> --Junchao Zhang
>
>
> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Dear All,
>>
>> I am interfacing our CFD code (Fortran + OpenACC) to PETSc.
>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc
>> compiler. The GNU compiler does not work well for us, and we do not have
>> access to the Cray compiler.
>>
>> I already know that the latest version of PETSc does not compile with
>> nvhpc, so I am using version 3.21.
>> I get good results on the CPU, both in serial and in parallel (MPI). However,
>> the GPU implementation, which is what we are interested in, only works
>> correctly in serial. In parallel, the results are different, even for a CG
>> solve.
>>
>> I would like to know if you have experience with the NVIDIA compiler. I am
>> particularly interested in whether you have already observed issues with it.
>> Your opinion on whether to put further effort into trying to find a bug I
>> may have introduced during the interfacing would be highly appreciated.
>>
>> Best,
>>
>> Herbert Owen
>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>> Barcelona Supercomputing Center (BSC-CNS)
>> Tel: +34 93 413 4038
>> Skype: herbert.owen
>>
>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

